NCBIminer: sequences harvest from GenBank

NCBIminer is freely available, cross-platform, and user-friendly software for mining nucleotide sequence data from GenBank. It has several features that enable users to download sequences with specific attributes from the GenBank database accurately and efficiently: 1) it uses a novel search strategy and can download sequences for distantly related taxonomic groups with high accuracy; 2) it handles genes, CDS, rRNA, and other GenBank-defined feature types; 3) it can filter sequences by length and by similarity to a reference sequence using user-defined parameters; 4) it can download information on DNA sample collections, e.g. voucher specimen, country, latitude and longitude, and collector; 5) it takes advantage of parallelization for a high-efficiency workflow. We demonstrate the use and performance of NCBIminer by downloading sequences for the plant family Campanulaceae. Compared to other methods, NCBIminer harvests more and longer sequences, and is less sensitive to the choice of query sequences.
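
As an illustration of feature 3, the following minimal Python sketch filters candidate sequences by length and by similarity to a reference sequence. It is not NCBIminer's implementation: the function names (`similarity`, `filter_sequences`), the parameter names (`min_len`, `max_len`, `min_similarity`), and the toy sequences are hypothetical, and the standard-library `difflib.SequenceMatcher` ratio is used here only as a rough stand-in for an alignment-based similarity score.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Rough similarity in [0, 1]; a stand-in for alignment-based identity."""
    # autojunk=False disables difflib's heuristic that ignores very frequent
    # characters, which would otherwise distort scores for long DNA strings.
    return SequenceMatcher(None, a.upper(), b.upper(), autojunk=False).ratio()


def filter_sequences(records, reference, min_len, max_len, min_similarity):
    """Keep records whose length and similarity to the reference pass the thresholds."""
    kept = {}
    for name, seq in records.items():
        if not (min_len <= len(seq) <= max_len):
            continue  # outside the user-defined length window
        if similarity(seq, reference) < min_similarity:
            continue  # too dissimilar to the reference sequence
        kept[name] = seq
    return kept


# Toy data (hypothetical, not taken from GenBank).
reference = "ATGGCGTACGTTAGCCTAGGCTAACGT"
records = {
    "seq_close": "ATGGCGTACGTTAGCCTAGGCTAACGA",      # single mismatch at the end
    "seq_short": "ATGGCG",                           # fails the length filter
    "seq_unrelated": "TTTTTTTTTTTTTTTTTTTTTTTTTTT",  # fails the similarity filter
}

kept = filter_sequences(records, reference, min_len=20, max_len=40, min_similarity=0.8)
print(sorted(kept))  # prints ['seq_close']
```

In this sketch the thresholds are plain function arguments, mirroring the idea of user-defined parameters; a real pipeline would more likely compute similarity from a pairwise alignment than from a string-matching ratio.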