
Cluster with greedy set cover method not reproducible.

Open eric-tc-wong opened this issue 4 years ago • 2 comments

Expected Behavior

Clusters should have the same number of members on every run. Is the variation expected? The options I used are shown below. Is there a way to get reproducible results? I tried both connected components and greedy set cover. Clusters from connected components are more consistent, but they still vary.

Current Behavior

Cluster sizes change after every run.

Steps to Reproduce (for bugs)

Python code:

    createdb_cmd = str(mmseqs) + ' createdb ' + \
        str(fasta_file) + ' ' + str(db_path) + ' --shuffle 0'
    system(createdb_cmd)

...

    cluster_args = ['--min-seq-id', str(min_seq_id),
                    '--cluster-mode', str(cluster_mode),
                    '-c', str(c),
                    '-s', str(s),
                    '--alignment-mode', str(alignment_mode),
                    str(query_db), str(clu_db_path), str(out_path)]
    cluster_cmd = str(mmseqs) + ' cluster ' + ' '.join(cluster_args)
    print(cluster_cmd)
    system(cluster_cmd)

...

    tsv_args = [str(query_db), str(query_db), str(clu_db_path), str(clu_tsv_path)]
    tsv_cmd = str(mmseqs) + ' createtsv ' + ' '.join(tsv_args)

MMseqs Output (for bugs)

Context

Your Environment

MMseqs2 Version: f349118312919c4fcc448f4595ca3b3a387018e2
Operating system: Ubuntu 20.04, WSL2

eric-tc-wong, Aug 26 '21 00:08

The order of the clusters in the output file can differ between runs. However, each cluster should still contain the same members. Are the members changing, or just the cluster order?
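One way to tell these two cases apart is to compare the runs as sets of clusters, so that neither the line order nor the choice of representative matters. The following is a minimal sketch rather than an MMseqs2 feature. It assumes the `createtsv` output has two tab-separated columns (representative ID, then member ID); the function names `load_clusters` and `compare_runs` are hypothetical helpers introduced only for this illustration.

```python
import csv
from collections import defaultdict


def load_clusters(tsv_path):
    """Read a two-column TSV (representative, member) into a set of clusters.

    Assumption: column 1 is the cluster representative, column 2 a member.
    """
    members_by_rep = defaultdict(set)
    with open(tsv_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            rep, member = row[0], row[1]
            members_by_rep[rep].add(rep)
            members_by_rep[rep].add(member)
    # A set of frozensets ignores both the cluster order in the file
    # and which member was chosen as the representative.
    return {frozenset(members) for members in members_by_rep.values()}


def compare_runs(tsv_a, tsv_b):
    """Report whether two runs produced the same clusters, ignoring order."""
    clusters_a = load_clusters(tsv_a)
    clusters_b = load_clusters(tsv_b)
    return {
        "identical": clusters_a == clusters_b,
        "only_in_a": len(clusters_a - clusters_b),
        "only_in_b": len(clusters_b - clusters_a),
    }
```

If `compare_runs` reports `identical` as true for two TSV files from separate runs, only the ordering changed. Non-zero `only_in_a` or `only_in_b` counts mean the cluster memberships themselves differ.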

martin-steinegger, Aug 27 '21 02:08

Thank you for the response. I manually checked the results, and I was wrong about connected components: it does appear to produce the same clusters. The greedy set cover clusters, however, do change in size between runs. I now realize that this is expected.

eric-tc-wong, Aug 28 '21 02:08