MMseqs2
easy-taxonomy for contigs
Not sure if this is a bug or if I am missing a flag that would make this all work as expected.
Expected Behavior
I wish to taxonomically annotate contigs using the mmseqs easy-taxonomy workflow.
I see from your documentation (https://github.com/soedinglab/MMseqs2/wiki#taxonomy-output-and-tsv) that it is possible to calculate the LCA of a contig's predicted ORFs. The output file produced lists the contig_name along with the total number of predicted ORFs and the number of those ORFs whose top hits agree with the LCA assigned to the contig.
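To make the expected per-contig summary concrete, here is a minimal sketch of the kind of tally I have in mind. It is not MMseqs2's implementation: it uses a simple majority vote over flat taxon labels instead of a true LCA over a taxonomy tree, and the function name `summarize_contigs` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def summarize_contigs(orf_assignments):
    """Tally per-ORF taxon labels for each contig.

    orf_assignments: iterable of (contig_name, taxon_label) pairs, one per ORF.
    Returns {contig_name: (most_common_taxon, n_orfs_total, n_orfs_agreeing)}.
    Simplification: majority vote on flat labels, not an LCA computation.
    """
    per_contig = defaultdict(list)
    for contig, taxon in orf_assignments:
        per_contig[contig].append(taxon)

    summary = {}
    for contig, taxa in per_contig.items():
        # Most frequent label and how many ORFs carry it
        taxon, agreeing = Counter(taxa).most_common(1)[0]
        summary[contig] = (taxon, len(taxa), agreeing)
    return summary


# Hypothetical toy input: three ORFs on contig_1, one ORF on contig_2
toy = [
    ("contig_1", "Escherichia coli"),
    ("contig_1", "Escherichia coli"),
    ("contig_1", "Salmonella enterica"),
    ("contig_2", "Bacillus subtilis"),
]

for contig, (taxon, total, agreeing) in summarize_contigs(toy).items():
    print(contig, taxon, total, agreeing, sep="\t")
```

Each printed row holds the contig name, the assigned label, the total ORF count, and the number of ORFs agreeing with that label, which is the shape of output I was expecting.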
Current Behavior
When I run the following command:
mmseqs easy-taxonomy contigs.fasta swissprotDB tax tmp
I get the following results files:
tax_lca.tsv
tax_report
tax_tophit_aln
tax_tophit_report
None of these files contains the expected output described in the documentation.
I have had a look at using the aggregatetax command, but I run into a problem with the createtsv command not reassigning the contig names correctly:
mmseqs createdb contigs.fasta contigsDb
mmseqs extractorfs contigsDb orfsAaDb --translate 1
mmseqs taxonomy orfsAaDb swissprotDB taxPerOrf tmp --tax-output-mode 2
mmseqs aggregatetaxweights swissprotDB orfsAaDb_h taxPerOrf taxPerOrf_aln taxPerContig --majority 0.5
mmseqs createtsv orfsAaDb contigsDb taxPerContig aggregatetaxResult.tsv
Your Environment
MMseqs2 Version: 113e3212c137d026e297c7540e1fcd039f6812b1
Pre-compiled binary
Thanks for your help in advance.
Cheers, Tim.