aws-indexes

Missing plants in PlusPFP

Open bioreactordan opened this issue 2 years ago • 2 comments

Hi,

I'm using PlusPFP to identify algae species (plants). However, the species I'm interested in, Chlorella sorokiniana, is not represented in the index, even though RefSeq listed it with a full genome as of October 2022. Did you filter the RefSeq plant genomes in any way that would remove certain genomes? I cross-checked the RefSeq plant list against the index's .txt file, and some of the 2,663 organisms listed under the plant category in RefSeq are not in the index.

Thanks

bioreactordan · Jan 12 '23 17:01

We noticed the same thing: Humulus lupulus (hops) is missing, along with everything else in the Humulus genus. Curious how this was skipped while Cannabis sativa, a close relative in the same family (Cannabaceae), is present?

mclaugsf · Apr 12 '23 14:04

I can unravel at least one level of the mystery. We use this file to determine which genomes to download: https://ftp.ncbi.nlm.nih.gov/genomes/refseq/plant/assembly_summary.txt

The genome you mentioned does not appear to be in that file.
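To check whether a particular species is listed before digging into the index build, you can scan the organism_name column of that file. Below is a minimal Python sketch. It assumes the file keeps NCBI's tab-separated layout, with '#' comment lines and a header line that names an organism_name column. If no such header is found, it falls back to the eighth column. The function names are hypothetical. A hit only shows that the genome is listed in assembly_summary.txt, not that the index build kept it.

```python
import urllib.request
from typing import Iterable, List

# The URL quoted above; the file is large, so it is streamed line by line.
SUMMARY_URL = "https://ftp.ncbi.nlm.nih.gov/genomes/refseq/plant/assembly_summary.txt"


def find_organisms(lines: Iterable[str], query: str) -> List[str]:
    """Return distinct organism_name values starting with `query` (case-insensitive)."""
    query = query.strip().lower()
    col = 7  # assumed default position of organism_name if no header is seen
    hits = set()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            # Comment lines; the one listing assembly_accession holds the column names.
            fields = line.lstrip("#").strip().split("\t")
            if "assembly_accession" in fields and "organism_name" in fields:
                col = fields.index("organism_name")
            continue
        fields = line.split("\t")
        if len(fields) > col and fields[col].lower().startswith(query):
            hits.add(fields[col])
    return sorted(hits)


def fetch_summary_lines(url: str = SUMMARY_URL) -> Iterable[str]:
    """Stream the summary file from NCBI as decoded text lines (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        for raw in resp:
            yield raw.decode("utf-8", errors="replace")
```

For example, `find_organisms(fetch_summary_lines(), "Humulus")` returns every Humulus entry in the file, and an empty list means the genus is not listed. Because the match is a prefix match, a genus name also matches all species in that genus.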

BenLangmead · May 30 '23 16:05