
Pipeline for getting taxonomy for clusters

Open alopgar opened this issue 11 months ago • 0 comments

Hi, I have been using MMseqs2 to cluster the sequences from multiple sequence files and then assign a taxonomy to each sequence. I followed this pipeline:

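# Cluster all input FASTA files; writes newcluster_cluster.tsv among other outputs
mmseqs easy-cluster "${rawfas[@]}" newcluster tmp --min-seq-id 0.3 -c 0.5 --cov-mode 1 --cluster-mode 2 -e 0.001 -s 6
# Build one sequence database from the same input files
mmseqs createdb "${rawfas[@]}" queryDB
# Assign a taxonomy to every sequence against the reference database in $TXDB
mmseqs taxonomy queryDB $TXDB clusterTax tmp --lca-mode 4 --split-memory-limit 60G \
     --lca-ranks superkingdom,phylum,class,order,family,genus
# Export the per-sequence taxonomy as a TSV file
mmseqs createtsv queryDB clusterTax ../clusterTax.tsv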

These commands produce two outputs: a newcluster_cluster.tsv file listing each representative sequence with its cluster members, and a clusterTax.tsv file with the taxonomy assigned to each sequence.

My question is: does MMseqs2 provide a way to obtain a common taxonomy for each cluster, for example by applying an LCA algorithm to all the sequences belonging to that cluster? If not, is there other software that can do this?
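To make the question concrete, here is a minimal Python sketch of the kind of per-cluster consensus I mean. It is not an MMseqs2 feature, and it rests on assumptions: the cluster TSV has the representative in column 1 and the member in column 2, and the last column of the taxonomy TSV holds a semicolon-separated lineage for the ranks given to --lca-ranks. All function names are hypothetical. It approximates an LCA by taking the longest shared lineage prefix, and it treats placeholder names for unassigned ranks as ordinary names.

```python
from collections import defaultdict


def read_clusters(path):
    """Map each representative ID to the list of its member IDs.

    Assumes a two-column TSV: representative, member.
    """
    clusters = defaultdict(list)
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                clusters[fields[0]].append(fields[1])
    return dict(clusters)


def read_lineages(path, lineage_col=-1):
    """Map each sequence ID to its lineage as a list of rank names.

    Assumes the sequence ID is in column 1 and that `lineage_col`
    (default: last column) holds names separated by ';'.
    """
    lineages = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                lineages[fields[0]] = fields[lineage_col].split(";")
    return lineages


def common_lineage(lineages):
    """Return the longest prefix shared by all given lineages."""
    shared = []
    for names in zip(*lineages):
        if len(set(names)) != 1:
            break
        shared.append(names[0])
    return shared


def cluster_lca(clusters, lineages):
    """Compute the shared lineage prefix for every cluster.

    Members without a taxonomy entry are ignored; a cluster with no
    classified member gets an empty lineage.
    """
    result = {}
    for rep, members in clusters.items():
        known = [lineages[m] for m in members if m in lineages]
        result[rep] = common_lineage(known) if known else []
    return result
```

With the files from the pipeline above, this would be called as `cluster_lca(read_clusters("newcluster_cluster.tsv"), read_lineages("../clusterTax.tsv"))`. A strict prefix like this is sensitive to a single misclassified member, so a majority-vote variant may be preferable for large clusters.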

Thanks in advance

alopgar avatar Mar 01 '24 12:03 alopgar