
Taxdatabase error

Open H1889 opened this issue 3 years ago • 13 comments

Hi,

First of all, thank you for the nice MetaEuk pipeline. I installed it with the UniRef100 database, following the instructions on GitHub. However, when I run the taxtocontig command, this error is shown:

"Database UniRef100NoQUE need taxonomical information. The UniRef100NoQUE_nodes.dmp is missing."

However the UniRef100NoQUE_taxonomy file is there.

These are the files in the working directory where I run metaeuk:

tmp UniRef100NoQUE_taxonomy UniRef100NoQUE_mapping UniRef100NoQUE_h.index UniRef100NoQUE_h.dbtype UniRef100NoQUE_h UniRef100NoQUE.source UniRef100NoQUE.lookup UniRef100NoQUE.index UniRef100NoQUE.dbtype UniRef100NoQUE UniRef100_taxonomy UniRef100_mapping UniRef100_h.index UniRef100_h.dbtype UniRef100_h UniRef100.version UniRef100.source UniRef100.lookup UniRef100.index UniRef100.dbtype UniRef100 QUEMETAEUK.headersMap.tsv QUEMETAEUK.fas QUEMETAEUK.codon.fas

My command was: metaeuk taxtocontig tmp/latest/contigs QUEMETAEUK.fas QUEMETAEUK.headersMap.tsv UniRef100NoQUE QUEMETAEUK_TAX tmp --majority 0.5 --tax-lineage 1 --lca-mode 2

Surely I am making some mistake, but I cannot see it. I would be very grateful if you could help me solve it.

H1889 avatar Aug 29 '21 11:08 H1889

Hi,

Thank you for using MetaEuk. From the files in your directory, it seems that the nodes file is indeed missing, as the software complains. How did you obtain that database? Could you please try obtaining it by running metaeuk databases UniRef100 UniRef100DB tmp and then using the downloaded UniRef100DB? It should be accompanied by all required files.

elileka avatar Aug 29 '21 12:08 elileka

Thanks for the answer. These were my commands:

mmseqs databases UniRef100 UniRef100 tmp
mmseqs createtaxdb UniRef100 tmp

But the *_taxonomy file is a binary file containing all the required *.dmp files, isn't it?

H1889 avatar Aug 29 '21 12:08 H1889

From the MMseqs2 help: An MMseqs2 database seqTaxDB is a sequence database augmented with taxonomic information and a mapping file from each database key to its taxon id. Such a database includes the following files: seqTaxDB, seqTaxDB.index, seqTaxDB.dbtype, seqTaxDB.lookup, seqTaxDB_h, seqTaxDB_h.index, seqTaxDB_h.dbtype, seqTaxDB_mapping and either the taxonomy flat file databases seqTaxDB_nodes.dmp, seqTaxDB_names.dmp, seqTaxDB_merged.dmp or seqTaxDB_taxonomy a binary version of the former files (created by createtaxdb which reduces the read-in time of the taxonomy database).
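To compare a database prefix against the file list in that help text, a small check along these lines may help. This is only a sketch: check_taxdb is a hypothetical helper name, and it mirrors the quoted help text (binary _taxonomy file OR the three .dmp files), not necessarily what a given MetaEuk release actually checks.

```shell
#!/bin/sh
# check_taxdb PREFIX  (hypothetical helper, not part of MMseqs2 or MetaEuk)
# Reports which files from the quoted help text are missing for PREFIX.
# Returns 0 when the core files and some form of taxonomy are present.
check_taxdb() {
    db="$1"
    rc=0
    # Core files that the help text lists for every seqTaxDB
    for suffix in "" .index .dbtype .lookup _h _h.index _h.dbtype _mapping; do
        if [ ! -e "${db}${suffix}" ]; then
            echo "missing: ${db}${suffix}"
            rc=1
        fi
    done
    # Taxonomy: either the binary file or all three flat files
    if [ -e "${db}_taxonomy" ]; then
        echo "taxonomy: binary ${db}_taxonomy"
    elif [ -e "${db}_nodes.dmp" ] && [ -e "${db}_names.dmp" ] && [ -e "${db}_merged.dmp" ]; then
        echo "taxonomy: flat .dmp files"
    else
        echo "missing: ${db}_taxonomy or the three .dmp files"
        rc=1
    fi
    return "$rc"
}
```

For example, check_taxdb UniRef100NoQUE in the working directory listed above should report the binary taxonomy file, even though the MetaEuk version in this thread still asked for UniRef100NoQUE_nodes.dmp.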

H1889 avatar Aug 29 '21 12:08 H1889

Are you using the latest release, or did you compile from the latest git code? The issue might have been fixed already in git.

milot-mirdita avatar Aug 29 '21 12:08 milot-mirdita

You can download precompiled binaries from the latest git code here: https://mmseqs.com/metaeuk/ Could you please check if the error message persists with this binary?

milot-mirdita avatar Aug 29 '21 12:08 milot-mirdita

I just checked, we indeed fixed the issue already in git. We will prepare a new release with the fix.

~~In the meantime, you can do:~~

#touch UniRef100NoQUE_nodes.dmp
#touch UniRef100NoQUE_names.dmp
#touch UniRef100NoQUE_merged.dmp

~~This will defeat the check, but it will still prefer the functional _taxonomy file over the dummy .dmp files.~~

Edit: Never mind. I think you used a database with MetaEuk that was created with a newer MMseqs2 version. In general that should work, but it is not guaranteed to. If you recreate the database with the metaeuk databases call, it should also work.

milot-mirdita avatar Aug 29 '21 13:08 milot-mirdita

I installed both mmseqs2 and metaeuk with conda.

H1889 avatar Aug 29 '21 14:08 H1889

My installed versions:

MMseqs2 Version: 13.45111
metaeuk Version: 4.a0f584d

H1889 avatar Aug 29 '21 14:08 H1889

Given that I have already downloaded UniRef100, can I use it? If I run metaeuk databases UniRef100 UniRef100DB tmp, it starts to download it again.

H1889 avatar Aug 29 '21 14:08 H1889

You can manually download the NCBI taxdump file: https://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz Extract the names.dmp, nodes.dmp and merged.dmp files and place them next to the database under the following names:

UniRef100DB_names.dmp
UniRef100DB_nodes.dmp
UniRef100DB_merged.dmp
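A rough sketch of that step as a shell function, assuming taxdump.tar.gz has already been downloaded (for example with curl -O and the URL above) and that the archive stores the three files at its top level. install_taxdump is a hypothetical helper name, not a MetaEuk or MMseqs2 command.

```shell
#!/bin/sh
# install_taxdump TARBALL DB_PREFIX  (hypothetical helper)
# Extracts names.dmp, nodes.dmp and merged.dmp from a local taxdump archive
# and copies them to DB_PREFIX_names.dmp, DB_PREFIX_nodes.dmp, DB_PREFIX_merged.dmp.
install_taxdump() {
    tarball="$1"
    db="$2"
    workdir=$(mktemp -d) || return 1
    # Extract only the three files named in the comment above
    if ! tar -xzf "$tarball" -C "$workdir" names.dmp nodes.dmp merged.dmp; then
        rm -rf "$workdir"
        return 1
    fi
    for f in names nodes merged; do
        if ! cp "$workdir/$f.dmp" "${db}_${f}.dmp"; then
            rm -rf "$workdir"
            return 1
        fi
    done
    rm -rf "$workdir"
}
```

Usage would be something like install_taxdump taxdump.tar.gz UniRef100DB. The prefix presumably has to match the database name passed to metaeuk; in this thread the error message asked for UniRef100NoQUE_nodes.dmp, so the prefix there would be UniRef100NoQUE.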

milot-mirdita avatar Aug 29 '21 14:08 milot-mirdita

Thank you, that worked. A final question: my assembly has 23000 contigs, but only 12000 have been taxonomically annotated by the taxtocontig command. Does that mean that metaeuk cannot assign a taxonomy to 11000 contigs?

H1889 avatar Aug 30 '21 10:08 H1889

The *_tax_per_contig.tsv has exactly 12413 contigs, whereas the input metagenome had 23344.
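For reference, the two counts can be compared with a few lines of shell. This sketch assumes the TSV has one row per contig and no header line, which I have not verified; report_unassigned and the file names in the usage note are hypothetical.

```shell
#!/bin/sh
# report_unassigned CONTIGS_FASTA TAX_PER_CONTIG_TSV  (hypothetical helper)
# Prints the number of FASTA records, the number of TSV rows, and the difference.
# Assumes one TSV row per annotated contig and no header line.
report_unassigned() {
    # Count FASTA header lines (lines starting with '>')
    total=$(grep -c '^>' "$1")
    # Count rows in the TSV; tr strips the padding some wc versions add
    assigned=$(wc -l < "$2" | tr -d ' ')
    echo "total=$total assigned=$assigned missing=$((total - assigned))"
}
```

Called as report_unassigned contigs.fasta QUEMETAEUK_TAX_tax_per_contig.tsv, with the numbers above it would report 23344 - 12413 = 10931 contigs without a row in the TSV.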

H1889 avatar Aug 30 '21 10:08 H1889

> Thank you, that worked. A final question, my assembly has 23000 contigs, however only 12000 have been tax annotated by the taxtocontig command, that means that metaeuk can not assign a tax to 11000 contigs?

Yes, exactly.

elileka avatar Aug 30 '21 14:08 elileka