kg-microbe

filter out viruses from NCBI Taxonomy 'unclassified'

Open · realmarcin opened this issue 1 year ago · 1 comment

These virus entries are found in ncbitaxon_removed_subset.json:

"val" : "Cotton leaf curl Rajasthan virus betasatellite defective interfering DNA"
"lbl" : "Cotton leaf curl Rajasthan virus defective interfering DNA"
"lbl" : "Cotton leaf curl virus betasatellite defective interfering DNA"
"lbl" : "Hygrophorus parvirussula"
"lbl" : "unidentified Cotton leaf curl Rajasthan virus-associated DNA"
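The "lbl" and "val" keys suggest the subset file uses an OBO Graphs-style JSON layout (node labels in "lbl", synonyms under "meta" → "synonyms" → "val"); that layout is an assumption here, not something the issue confirms. The sketch below collects every label and synonym that matches a pattern. Note that a plain substring search for "virus" also flags "Hygrophorus parvirussula", and Hygrophorus is a genus of fungi, so that entry looks like a false positive. The hypothetical `NON_CELLULAR_NAME` pattern only matches words that end in "virus(es)" or "phage(s)", which skips it.

```python
import json
import re
from typing import Any, Iterator

# Hypothetical pattern: words ending in "virus", "viruses", "phage" or "phages".
# Catches "Alphavirus", "virus-associated", "bacteriophage"; skips "parvirussula".
NON_CELLULAR_NAME = re.compile(r"(?:virus(?:es)?|phages?)\b", re.IGNORECASE)


def iter_node_strings(doc: dict[str, Any]) -> Iterator[tuple[str, str]]:
    """Yield (node id, text) for each node label ("lbl") and synonym value ("val").

    Assumes an OBO Graphs-style layout:
    {"graphs": [{"nodes": [{"id": ..., "lbl": ..., "meta": {"synonyms": [{"val": ...}]}}]}]}
    """
    for graph in doc.get("graphs", []):
        for node in graph.get("nodes", []):
            node_id = node.get("id", "")
            if "lbl" in node:
                yield node_id, node["lbl"]
            for synonym in node.get("meta", {}).get("synonyms", []):
                if "val" in synonym:
                    yield node_id, synonym["val"]


def find_matches(doc: dict[str, Any], pattern: re.Pattern) -> list[str]:
    """Return every label or synonym matching the pattern, in file order."""
    return [text for _, text in iter_node_strings(doc) if pattern.search(text)]


def load_subset(path: str) -> dict[str, Any]:
    """Load the JSON subset file from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

For example, `find_matches(load_subset("ncbitaxon_removed_subset.json"), NON_CELLULAR_NAME)` would list the candidate virus entries, assuming the file sits in the working directory and follows the layout above.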

realmarcin · Feb 26 '24 20:02

@bsantan let's think about how to add this to the transform code. I believe dropping any reference proteome whose name contains 'virus' or 'phage' will work. We can assume that no multicellular organism is 'unclassified', though this may not be entirely true. The first-pass test transform could simply exclude everything under 'unclassified'.
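A minimal sketch of the two rules proposed above, assuming a hypothetical `ProteomeRecord` that carries a proteome name and a list of ancestor taxon names; the real kg-microbe transform may represent proteomes and lineages differently. `drop_all_unclassified=True` corresponds to the first-pass option (exclude anything under 'unclassified'), and the name check drops entries whose names mention a virus or phage.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ProteomeRecord:
    """Hypothetical record shape used only for this illustration."""
    name: str
    lineage: list[str] = field(default_factory=list)  # ancestor taxon names


# Words ending in "virus(es)" or "phage(s)", case-insensitive.
# A raw substring test would also match the fungus "Hygrophorus parvirussula".
NON_CELLULAR_NAME = re.compile(r"(?:virus(?:es)?|phages?)\b", re.IGNORECASE)


def is_non_cellular(record: ProteomeRecord) -> bool:
    """Heuristic: the proteome name mentions a virus or phage."""
    return NON_CELLULAR_NAME.search(record.name) is not None


def is_unclassified(record: ProteomeRecord) -> bool:
    """First-pass rule: any ancestor name contains 'unclassified'."""
    return any("unclassified" in taxon.lower() for taxon in record.lineage)


def keep_record(record: ProteomeRecord, drop_all_unclassified: bool = True) -> bool:
    """Return True if the record should stay in the transform output."""
    if drop_all_unclassified and is_unclassified(record):
        return False
    return not is_non_cellular(record)


if __name__ == "__main__":
    records = [
        ProteomeRecord("Escherichia coli", ["Bacteria"]),
        ProteomeRecord("Escherichia phage T4", ["Viruses"]),
    ]
    print([r.name for r in records if keep_record(r)])  # ['Escherichia coli']
```

Matching on the end of a word rather than on any substring keeps genus-style names such as "Alphavirus" and "bacteriophage" in scope while avoiding accidental hits inside unrelated words. Any non-cellular entry whose name lacks both words would still pass the name check, which is one reason the blanket 'unclassified' exclusion is a simpler first pass.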

realmarcin · Feb 27 '24 01:02