
easy-taxonomy for contigs

Open · TimothyStephens opened this issue 8 months ago · 5 comments

I'm not sure whether this is a bug or whether I am missing a flag that would make this work as expected.

Expected Behavior

I want to taxonomically annotate contigs using the mmseqs easy-taxonomy workflow. Your documentation (https://github.com/soedinglab/MMseqs2/wiki#taxonomy-output-and-tsv) says it is possible to calculate the LCA of a contig's predicted ORFs, and that the resulting output file lists each contig_name along with the total number of predicted ORFs and the number of those ORFs whose top hits agree with the contig's assigned LCA.
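To make the expected per-contig numbers concrete, here is a minimal sketch of a majority vote over a contig's ORF lineages. It is my own simplified reading of the documented output, not MMseqs2's implementation. It assumes each ORF's taxonomy is available as a root-to-leaf tuple of names (a hypothetical representation), treats every classified ORF as one vote (so any weighting MMseqs2 applies is not modelled), and requires strictly more than `majority` of the classified ORFs to agree. `contig_label` is a hypothetical helper name.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical representation: a root-to-leaf path of taxon names.
Lineage = Tuple[str, ...]


def contig_label(
    orf_lineages: List[Optional[Lineage]], majority: float = 0.5
) -> Tuple[Optional[str], int, int, int]:
    """Summarise one contig from the lineages of its predicted ORFs.

    Returns (label, n_orfs, n_classified, n_agree): label is the deepest taxon
    shared by strictly more than `majority` of the classified ORFs, and n_agree
    is how many classified ORFs fall under that taxon.
    """
    n_orfs = len(orf_lineages)
    classified = [lin for lin in orf_lineages if lin]  # drop unclassified ORFs
    label, n_agree, path = None, 0, ()
    while classified:
        depth = len(path) + 1
        # Extend only the path chosen so far; count whole prefixes so that
        # identical names on different branches are never merged.
        prefixes = Counter(
            lin[:depth]
            for lin in classified
            if len(lin) >= depth and lin[: len(path)] == path
        )
        if not prefixes:
            break  # no ORF is resolved any deeper
        best, support = prefixes.most_common(1)[0]
        if support / len(classified) <= majority:
            break  # not enough agreement at this depth
        path, label, n_agree = best, best[-1], support
    return label, n_orfs, len(classified), n_agree


orfs = [
    ("Bacteria", "Proteobacteria", "Gammaproteobacteria"),
    ("Bacteria", "Proteobacteria", "Alphaproteobacteria"),
    ("Bacteria", "Proteobacteria", "Gammaproteobacteria"),
    ("Bacteria", "Firmicutes"),
    None,  # an ORF with no hit
]
print(contig_label(orfs))  # ('Proteobacteria', 5, 4, 3)
```

Here 3 of the 4 classified ORFs agree on Proteobacteria, so that becomes the contig label, while the class-level vote (2 of 4 for Gammaproteobacteria) does not exceed the 0.5 threshold.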

Current Behavior

When I run the following command:

mmseqs easy-taxonomy contigs.fasta swissprotDB tax tmp

I get the following results files:

tax_lca.tsv
tax_report
tax_tophit_aln
tax_tophit_report

None of these files contains the per-contig output described in the documentation.

I have also tried the aggregatetax commands directly, but then I run into a problem with createtsv: it does not map the results back to the correct contig names.

mmseqs createdb contigs.fasta contigsDb
mmseqs extractorfs contigsDb orfsAaDb --translate 1
mmseqs taxonomy orfsAaDb swissprotDB taxPerOrf tmp --tax-output-mode 2
mmseqs aggregatetaxweights swissprotDB orfsAaDb_h taxPerOrf taxPerOrf_aln taxPerContig --majority 0.5
mmseqs createtsv orfsAaDb contigsDb taxPerContig aggregatetaxResult.tsv
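A toy model of the name lookup I expected createtsv to perform may help pin down the symptom. It rests on an assumption I have not confirmed: that the aggregated taxPerContig entries are keyed by contig keys from contigsDb, while orfsAaDb has its own per-ORF keys. The header tables, names and the `to_tsv_rows` helper below are all hypothetical stand-ins, not MMseqs2 data structures.

```python
from typing import Dict, List

# Hypothetical key -> header tables standing in for contigsDb and orfsAaDb.
contig_headers: Dict[int, str] = {0: "contig_A", 1: "contig_B"}
orf_headers: Dict[int, str] = {0: "contig_A_orf1", 1: "contig_A_orf2", 2: "contig_B_orf1"}

# Hypothetical aggregated result, assumed to be keyed by contig key.
tax_per_contig: Dict[int, str] = {0: "Proteobacteria", 1: "Firmicutes"}


def to_tsv_rows(result: Dict[int, str], headers: Dict[int, str]) -> List[str]:
    """Replace each result key with the header found under that key in `headers`."""
    return [
        f"{headers.get(key, f'<unknown key {key}>')}\t{label}"
        for key, label in sorted(result.items())
    ]


# Looking contig keys up in the contig table gives the intended names.
print(to_tsv_rows(tax_per_contig, contig_headers))
# Looking the same keys up in the ORF table silently attaches labels to ORF names.
print(to_tsv_rows(tax_per_contig, orf_headers))
```

If that assumption holds, a key-space mismatch like this would produce plausible-looking but wrong names rather than an error, which matches what I am seeing.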

Your Environment

MMseqs2 Version: 113e3212c137d026e297c7540e1fcd039f6812b1
Pre-compiled binary

Thanks for your help in advance.

Cheers, Tim.

TimothyStephens · Jun 16 '24 21:06