Error after making Diamond db of IMG/VR dataset

Open poursalavati opened this issue 2 years ago • 3 comments

Hello, and thanks for developing Diamond!

I am preparing a Diamond database from IMG/VR data. I have prepared the necessary input files and successfully created the names and nodes files, as well as the taxid.map file.

I have attached the names and nodes files for reference. The taxid.map file is quite large, so I have included only its head: names.dmp.txt nodes.dmp.txt head_id2tax.txt

The makedb command:

diamond makedb --in ../IMGVR_all_proteins.faa --db IMGVR_Taxoned --taxonmap example/taxdump/taxid-h.map --taxonnodes example/taxdump/nodes.dmp --taxonnames example/taxdump/names.dmp -p 40

The end of Diamond's stdout:

Loading taxonomy names...  [0.021s]
Loaded taxonomy names for 0 taxon ids.
Loading taxonomy mapping file...  [355.924s]
Joining accession mapping...  [82.928s]
Writing taxon id list...  [2.292s]
Building taxonomy nodes...  [11.996s]
2147258865 taxonomy nodes processed.
Number of nodes assigned to rank:
no rank           2147251888
superkingdom      6
kingdom           10
subkingdom        0
superphylum       0
phylum            17
subphylum         0
superclass        39
class             0
subclass          0
infraclass        0
cohort            0
subcohort         0
superorder        0
order             64
suborder          0
infraorder        0
parvorder         0
superfamily       0
family            206
subfamily         0
tribe             0
subtribe          0
genus             2116
subgenus          0
section           0
subsection        0
series            0
species group     0
species subgroup  0
species           4519
subspecies        0
varietas          0
forma             0
strain            0
biotype           0
clade             0
forma specialis   0
genotype          0
isolate           0
morph             0
pathogroup        0
serogroup         0
serotype          0
subvariety        0

Closing the input file...  [0s]
Closing the database file...  [0.015s]

Database sequences                   220799163
Database letters                     49459660621
Accessions in database               220799163
Entries in accession to taxid file   216984561
Database accessions mapped to taxid  0
Database sequences mapped to taxid   0
Database hash                        58748cfe915c91e69a43a88c27aa3e8b
Total time                           2155s

I think the issue could be that I have a large number of sequences for each taxonomy ID, and it seems that only one of each has been identified (or indexed) by Diamond. For example:

accession.version	taxid
IMGVR_UViG_638276111_000001|638276111|638297712 541518477
IMGVR_UViG_638276111_000001|638276111|638297713 541518477
IMGVR_UViG_638276111_000001|638276111|638297714 541518477
IMGVR_UViG_638276111_000001|638276111|638297715 541518477
IMGVR_UViG_638276111_000001|638276111|638297716 541518477
IMGVR_UViG_638276111_000001|638276111|638297717 541518477
IMGVR_UViG_638276111_000001|638276111|638297718 541518477
IMGVR_UViG_638276111_000001|638276111|638297719 541518477
IMGVR_UViG_638276111_000001|638276111|638297720 541518477
IMGVR_UViG_638276111_000001|638276111|638297721 541518477
IMGVR_UViG_638276111_000001|638276111|638297722 541518477
IMGVR_UViG_638276111_000001|638276111|638297723 541518477
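The summary above reports 216984561 entries in the mapping file but 0 database accessions mapped to a taxid, which suggests the identifiers Diamond reads from the FASTA headers do not match the accession column as Diamond reads it. A minimal sketch for checking the raw overlap on a sample of both files follows. The function names are hypothetical, and it assumes the accession is the first whitespace-delimited field of each mapping line and the first token of each FASTA header. It compares the strings as written in the files, not as Diamond parses them internally.

```python
from typing import Dict, Iterable, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect the first whitespace-delimited token of every FASTA header."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def map_ids(lines: Iterable[str], has_header: bool = True) -> Set[str]:
    """Collect the first field of every mapping line (assumed to be the accession)."""
    ids = set()
    for i, line in enumerate(lines):
        if has_header and i == 0:
            continue  # skip the "accession.version taxid" header line
        fields = line.split()
        if fields:
            ids.add(fields[0])
    return ids


def overlap_report(fasta_lines: Iterable[str], map_lines: Iterable[str]) -> Dict[str, int]:
    """Count identifiers in each input and how many appear in both."""
    f = fasta_ids(fasta_lines)
    m = map_ids(map_lines)
    return {"fasta": len(f), "map": len(m), "shared": len(f & m)}
```

If the shared count is high on a sample while Diamond still maps 0 accessions, the files agree with each other and the mismatch is more likely in how Diamond interprets the identifiers. Holding all identifiers of a dataset this size in memory is costly, so running it on an excerpt of each file that covers the same sequences is the safer choice.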

I hope there is some way to fix this! I would greatly appreciate any suggestions or solutions.

poursalavati • Mar 31 '23 21:03

Update: I've explained the reason and solution here, in case it might solve someone else's issue.

Good luck, NP

poursalavati • Apr 01 '23 14:04

Sorry for this unfortunate issue. This was implemented to handle old NCBI headers. I think at least a warning message should be given in these cases.

bbuchfink • Apr 12 '23 09:04
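For context, old-style NCBI headers are pipe-delimited (for example `gi|<number>|ref|<accession>|`), and the IMG/VR accessions in this thread also contain `|`. The thread does not spell out the exact parsing rule, so the following is only a sketch of one possible workaround, assuming the `|` character is what triggers the legacy handling: rewrite the identifiers without pipes in both the FASTA headers and the mapping file, so that both sides still match. The function names and the `_` replacement are illustrative choices, not part of Diamond.

```python
import re
from typing import Iterable, Iterator

# First whitespace-delimited token, followed by the untouched rest of the line.
_FIRST_TOKEN = re.compile(r"(\S+)(.*)", flags=re.DOTALL)


def sanitize_id(identifier: str, replacement: str = "_") -> str:
    """Replace every pipe in an identifier (assumed trigger of legacy parsing)."""
    return identifier.replace("|", replacement)


def sanitize_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Rewrite only the identifier token of each FASTA header line."""
    for line in lines:
        match = _FIRST_TOKEN.match(line[1:]) if line.startswith(">") else None
        if match:
            yield ">" + sanitize_id(match.group(1)) + match.group(2)
        else:
            yield line  # sequence lines and empty headers pass through unchanged


def sanitize_map(lines: Iterable[str], has_header: bool = True) -> Iterator[str]:
    """Rewrite only the first field (accession) of each mapping line."""
    for i, line in enumerate(lines):
        match = None if (has_header and i == 0) else _FIRST_TOKEN.match(line)
        if match:
            yield sanitize_id(match.group(1)) + match.group(2)
        else:
            yield line  # header and blank lines pass through unchanged
```

Both files need the same transformation, otherwise the accessions stop matching for a different reason. Replacing `|` with `_` could in principle merge two distinct identifiers, so it is worth confirming that the number of unique identifiers is unchanged before rebuilding the database.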

The latest release now prints a warning message about this when you run makedb.

bbuchfink • May 31 '23 12:05