
Some Questions on MMseqs for Nucleic Acid Database Clustering

Open mintuos opened this issue 1 year ago • 0 comments

Expected Behavior

I want to cluster a nucleic acid database.

Current Behavior

I have some questions about mmseqs:

  1. What is the difference between result2repseq and createseqfiledb?
  2. Are --min-seq-id and -c the same? If I want to cluster sequences at 50% similarity, what should each be set to?
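
To make question 2 concrete: as far as I understand the option descriptions, --min-seq-id is a threshold on sequence identity, while -c is a threshold on the fraction of a sequence covered by the alignment. Below is a minimal Python sketch of those two quantities using simplified definitions. The function name and the toy alignment are hypothetical, and the exact calculation inside MMseqs2 may differ (for example, depending on --cov-mode and the alignment mode).

```python
def identity_and_coverage(aligned_query, aligned_target, query_len, target_len):
    """Return (identity, query_coverage, target_coverage) for one pairwise alignment.

    aligned_query and aligned_target are equal-length strings that use '-' for gaps.
    query_len and target_len are the full, unaligned sequence lengths.
    These are simplified textbook definitions, not MMseqs2's internal code.
    """
    if len(aligned_query) != len(aligned_target) or not aligned_query:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    columns = list(zip(aligned_query, aligned_target))
    # Identity: identical, non-gap columns divided by all alignment columns.
    matches = sum(1 for q, t in columns if q == t and q != "-")
    identity = matches / len(columns)
    # Coverage: aligned (non-gap) residues divided by the full sequence length.
    query_cov = sum(1 for q, _ in columns if q != "-") / query_len
    target_cov = sum(1 for _, t in columns if t != "-") / target_len
    return identity, query_cov, target_cov


# Toy example: a 10-column local alignment between a 20 nt query and a 40 nt target.
ident, qcov, tcov = identity_and_coverage("ACGTACGTAC", "ACGTTCGTAC", 20, 40)
print(ident, qcov, tcov)
```

In this toy case the aligned region is highly similar but spans only part of each sequence, so an alignment can pass an identity threshold and still fail a coverage threshold. That is why I am unsure how the two options should be combined for 50% similarity.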

Steps to Reproduce (for bugs)

Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.

I use these commands to cluster:

mmseqs createdb test.fasta tmps/DB
mmseqs linclust tmps/DB tmps/DB_clu tmps --min-seq-id 0.90 --threads 96
mmseqs result2repseq tmps/DB tmps/DB_clu tmps/DB_clu_rep.fasta
mmseqs convert2fasta tmps/DB_clu_rep.fasta  outs.fasta

However, the PDF user guide on GitHub gives the following commands instead:

mmseqs cluster DB DB_clu tmp
mmseqs linclust DB DB_clu tmp
mmseqs createsubdb DB_clu DB DB_clu_rep
mmseqs convert2fasta DB_clu_rep DB_clu_rep.fasta

I wonder whether these two approaches produce the same result.

MMseqs Output (for bugs)

Please make sure to also post the complete output of MMseqs. You can use gist.github.com for large output.

There are two outputs, one from each approach:

ls -l
total 1107472
-rw-rw-r-- 1 root root 1103346632 Nov 22 02:16 DB_clu_rep.fasta
-rw-rw-r-- 1 root root          4 Nov 22 02:16 DB_clu_rep.fasta.dbtype
lrwxrwxrwx 1 root root         32 Nov 22 02:16 DB_clu_rep.fasta_h -> /data/codonOP/50filter/tmps/DB_h
lrwxrwxrwx 1 root root         39 Nov 22 02:16 DB_clu_rep.fasta_h.dbtype -> /data/codonOP/50filter/tmps/DB_h.dbtype
lrwxrwxrwx 1 root root         38 Nov 22 02:16 DB_clu_rep.fasta_h.index -> /data/codonOP/50filter/tmps/DB_h.index
-rw-rw-r-- 1 root root   30698239 Nov 22 02:16 DB_clu_rep.fasta.index
lrwxrwxrwx 1 root root         37 Nov 22 02:16 DB_clu_rep.fasta.lookup -> /data/codonOP/50filter/tmps/DB.lookup
lrwxrwxrwx 1 root root         37 Nov 22 02:16 DB_clu_rep.fasta.source -> /data/codonOP/50filter/tmps/DB.source
ls -l
total 1107476
-rw-rw-r-- 1 root root 1103346632 Nov 22 02:16 DB_clu_rep
-rw-rw-r-- 1 root root          4 Nov 22 02:16 DB_clu_rep.dbtype
lrwxrwxrwx 1 root root         32 Nov 22 02:16 DB_clu_rep_h -> /data/codonOP/50filter/tmps/DB_h
lrwxrwxrwx 1 root root         39 Nov 22 02:16 DB_clu_rep_h.dbtype -> /data/codonOP/50filter/tmps/DB_h.dbtype
lrwxrwxrwx 1 root root         38 Nov 22 02:16 DB_clu_rep_h.index -> /data/codonOP/50filter/tmps/DB_h.index
-rw-rw-r-- 1 root root   30698565 Nov 22 02:16 DB_clu_rep.index
lrwxrwxrwx 1 root root         37 Nov 22 02:16 DB_clu_rep.lookup -> /data/codonOP/50filter/tmps/DB.lookup
lrwxrwxrwx 1 root root         37 Nov 22 02:16 DB_clu_rep.source -> /data/codonOP/50filter/tmps/DB.source

Your Environment

I used conda to install mmseqs.

mintuos, Nov 22 '23 02:11