MMseqs2
Pipeline for getting taxonomy for clusters
Hi, I have been using MMseqs2 to cluster the sequences from several FASTA files and then assign a taxonomy to each sequence. This is the pipeline I followed:
mmseqs easy-cluster "${rawfas[@]}" newcluster tmp --min-seq-id 0.3 -c 0.5 --cov-mode 1 --cluster-mode 2 -e 0.001 -s 6
mmseqs createdb "${rawfas[@]}" queryDB
mmseqs taxonomy queryDB $TXDB clusterTax tmp --lca-mode 4 --split-memory-limit 60G \
--lca-ranks superkingdom,phylum,class,order,family,genus
mmseqs createtsv queryDB clusterTax ../clusterTax.tsv
These commands produce a newcluster_cluster.tsv file listing each representative sequence together with its cluster members, and a clusterTax.tsv file with the taxonomy assigned to each individual sequence.
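To check that the two outputs line up, they can be joined into one member-level table. This is a minimal sketch, not an MMseqs2 module: the function name annotate_members is hypothetical, and it assumes the cluster TSV has the columns representative, member and the taxonomy TSV has the columns sequence ID, taxid, rank, name, with IDs matching between the two (both use the first word of the FASTA header).

```shell
# Hypothetical helper (not part of MMseqs2): attach each member's taxonomy
# to the cluster table.
# $1 = cluster TSV (representative<TAB>member), e.g. newcluster_cluster.tsv
# $2 = taxonomy TSV from createtsv, assumed ID<TAB>taxid<TAB>rank<TAB>name...
# Output: representative, member, taxid, rank, name ("NA" if no taxonomy row).
annotate_members() {
  awk -F'\t' -v OFS='\t' '
    # First file read (taxonomy TSV): index taxid, rank, name by sequence ID.
    NR == FNR { tax[$1] = $2 OFS $3 OFS $4; next }
    # Second file read (cluster TSV): append the member taxonomy to each row.
    { print $1, $2, (($2 in tax) ? tax[$2] : "NA" OFS "NA" OFS "NA") }
  ' "$2" "$1"
}
```

Usage: annotate_members newcluster_cluster.tsv ../clusterTax.tsv > members_tax.tsv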
My question is: is there an MMseqs2 module or workflow that computes a common taxonomy for each cluster, for example by applying an LCA to all the sequences in the cluster? If not, is there other software that can do this?
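For reference, this is roughly what I mean by a per-cluster LCA, as a workaround outside MMseqs2. It is only a sketch under assumptions: the function name cluster_lca and the labels "root" and "unclassified" are hypothetical; it assumes the last column of the taxonomy TSV is a ";"-separated root-to-leaf lineage (as requested with --lca-ranks, or --tax-lineage 1 in newer versions), and that taxid 0 marks an unclassified sequence. Because a lineage is a path from the root, the LCA of several members is the longest common prefix of their lineages.

```shell
# Hypothetical helper (not an MMseqs2 module): LCA of member lineages per cluster.
# $1 = cluster TSV (representative<TAB>member), $2 = taxonomy TSV.
# Output: representative, number of classified members, LCA lineage.
cluster_lca() {
  awk -F'\t' -v OFS='\t' '
    # Longest common prefix of two ";"-separated root-to-leaf lineages,
    # i.e. their lowest common ancestor on the lineage path.
    function common_prefix(a, b,    x, y, k, m, i, res) {
      k = split(a, x, ";"); m = split(b, y, ";"); res = ""
      for (i = 1; i <= k && i <= m && x[i] == y[i]; i++)
        res = (i == 1 ? x[i] : res ";" x[i])
      return res
    }
    # First file (taxonomy TSV): assumed col 1 = sequence ID,
    # col 2 = taxid (0 = unclassified), last col = lineage.
    NR == FNR { if ($2 != 0) lin[$1] = $NF; next }
    # Second file (cluster TSV): col 1 = representative, col 2 = member.
    {
      if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }
      if (!($2 in lin)) next                # skip unclassified members
      if ($1 in lca) lca[$1] = common_prefix(lca[$1], lin[$2])
      else lca[$1] = lin[$2]
      count[$1]++
    }
    END {
      for (i = 1; i <= n; i++) {
        r = order[i]
        if (count[r] == 0) label = "unclassified"   # no classified member
        else if (lca[r] == "") label = "root"       # members share no rank
        else label = lca[r]
        print r, count[r] + 0, label
      }
    }
  ' "$2" "$1"
}
```

Usage: cluster_lca newcluster_cluster.tsv ../clusterTax.tsv > clusterLCA.tsv. Unclassified members are ignored rather than forcing the cluster to the root; whether that is the right choice depends on how strict the consensus should be.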
Thanks in advance