diamond
Failed to build databases using GTDB-taxdump taxonomy files
I was trying to build a database using the taxonomy files from gtdb-taxdump; however, it failed while reading names.dmp with the following message:
zcat gtdb_proteomes/* | diamond makedb --db gtdb --taxonnames gtdb-taxdump/R207/names.dmp --taxonnodes gtdb_data/gtdb-taxdump/R207/nodes.dmp --taxonmap gtdb.protein.taxid.map
diamond v2.0.15.153 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)
#CPU threads: 32
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Input file parameter (--in) is missing. Input will be read from stdin.
Opening the database file... [0s]
Loading sequences... [0.772s]
Masking sequences... [0.157s]
Writing sequences... [0.035s]
Writing accessions... [0.072s]
Hashing sequences... [0.013s]
Loading sequences... [0s]
Writing trailer... [0.003s]
Loading taxonomy nodes... [28.213s]
Loading taxonomy names... [78.105s]
Failed to allocate sufficient memory. Please refer to the manual for instructions on memory usage.
Here are the first lines of names.dmp from gtdb-taxdump:
head -20 gtdb-taxdump/R207/names.dmp
1 | root | | scientific name |
13926 | 001393675 | | scientific name |
14375 | RUG14239 sp902797145 | | scientific name |
17689 | 001423155 | | scientific name |
20514 | 018334475 | | scientific name |
23859 | 013185635 | | scientific name |
34402 | 002214285 | | scientific name |
38289 | 001509495 | | scientific name |
66445 | 009903045 | | scientific name |
74747 | 000419015 | | scientific name |
78978 | 014222245 | | scientific name |
85313 | 001742655 | | scientific name |
88808 | E44-bin52 sp004375875 | | scientific name |
121310 | 001585965 | | scientific name |
138721 | VXYK01 | | scientific name |
147972 | 007121265 | | scientific name |
151528 | 007830495 | | scientific name |
157756 | 003411905 | | scientific name |
160336 | 002878095 | | scientific name |
173955 | 001247185 | | scientific name |
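For reference, these lines follow the NCBI taxdump layout for names.dmp: `|`-separated columns holding the taxid, name, unique name, and name class. Below is a minimal Python sketch of reading such lines. It assumes that layout and keeps only `scientific name` rows. The function name `parse_names_dmp` is hypothetical and not part of DIAMOND. Note that GTDB names such as `001393675` are kept as strings, not numbers.

```python
"""Minimal sketch: parse names.dmp-style lines into (taxid, name) pairs.

Assumes the NCBI taxdump layout seen in the excerpt: fields separated by '|'
with surrounding whitespace, columns taxid | name | unique name | name class |.
"""
from typing import Iterable, Iterator, Tuple


def parse_names_dmp(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (taxid, name) for every 'scientific name' row."""
    for line in lines:
        # Split on '|' and strip the padding around each field.
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        taxid, name, _unique, name_class = fields[:4]
        if name_class == "scientific name":
            yield int(taxid), name


if __name__ == "__main__":
    sample = [
        "1 | root | | scientific name |",
        "14375 | RUG14239 sp902797145 | | scientific name |",
    ]
    for taxid, name in parse_names_dmp(sample):
        print(taxid, name)  # "1 root", then "14375 RUG14239 sp902797145"
```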
Do you have any idea why this could be happening? I haven't had any problems building databases with the NCBI taxdump files.
Best regards, Emil Hägglund
Some of the taxids used in these files are > 2^31, which is not supported at the moment. I'll see what I can do about this.
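One way to confirm this cause locally is to scan the first column of names.dmp or nodes.dmp for values that do not fit in a signed 32-bit integer. The sketch below is a hypothetical helper, not part of DIAMOND. It assumes the limit described above corresponds to a signed 32-bit taxid, whose maximum is 2^31 - 1 = 2147483647.

```python
"""Sketch: find taxids in a taxdump file that exceed the signed 32-bit range.

Assumption: the unsupported range is taxid > 2**31 - 1 (signed 32-bit int).
This is an illustration only, not DIAMOND's own check.
"""
import sys
from typing import Iterable, List

INT32_MAX = 2**31 - 1  # 2147483647


def oversized_taxids(lines: Iterable[str]) -> List[int]:
    """Return taxids (first '|'-separated column) greater than INT32_MAX."""
    too_big = []
    for line in lines:
        first = line.split("|", 1)[0].strip()
        if first.isdigit() and int(first) > INT32_MAX:
            too_big.append(int(first))
    return too_big


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_taxids.py gtdb-taxdump/R207/nodes.dmp
    with open(sys.argv[1], encoding="utf-8") as handle:
        bad = oversized_taxids(handle)
    print(f"{len(bad)} taxids exceed {INT32_MAX}")
    if bad:
        print("largest:", max(bad))
```

Running it on names.dmp or nodes.dmp and getting a non-zero count would be consistent with the limitation described above.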
Ah, I suspected it was something like this. Now I know the cause of the error. Thanks for the quick reply!