Clustering with the greedy set cover method is not reproducible.
Expected Behavior
Clusters should have the same number of members on every run. Is the variation expected? The options I used are shown below. Is there a way to get reproducible results? I tried both connected components and greedy set cover. Clusters from connected components are more consistent, but they still vary.
Current Behavior
Cluster sizes change after every run.
Steps to Reproduce (for bugs)
Python code:
createdb_cmd = str(mmseqs) + ' createdb ' + \
str(fasta_file) + ' ' + str(db_path) + ' --shuffle 0'
system(createdb_cmd)
...
cluster_args = ['--min-seq-id', str(min_seq_id),
'--cluster-mode', str(cluster_mode),
'-c', str(c),
'-s', str(s),
'--alignment-mode', str(alignment_mode),
str(query_db), str(clu_db_path), str(out_path)]
cluster_cmd = str(mmseqs) + ' cluster ' + ' '.join(cluster_args)
print(cluster_cmd)
system(cluster_cmd)
...
tsv_args = [str(query_db), str(query_db), str(clu_db_path), str(clu_tsv_path)]
tsv_cmd = str(mmseqs) + ' createtsv ' + ' '.join(tsv_args)
MMseqs Output (for bugs)
Context
Your Environment
MMseqs2 Version: f349118312919c4fcc448f4595ca3b3a387018e2
Operating system: Ubuntu 20.04, WSL2
The order of the clusters can differ in the output file. However, each cluster itself should have the same members. Are the members changing, or just the cluster order?
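One way to check this without being misled by ordering is to compare the two runs as sets of member sets. The following is only a minimal sketch: it assumes the `createtsv` output has two tab-separated columns (cluster representative, then member), and the function names `load_clusters` and `same_clusters` are made up for illustration.

```python
import csv
from collections import defaultdict


def load_clusters(tsv_path):
    """Read a two-column TSV (representative, member) into a set of clusters.

    Each cluster is a frozenset of sequence IDs, so neither the order of the
    clusters nor the order of members within a cluster affects the result.
    Assumption: column 1 is the representative and column 2 is a member.
    """
    members_by_rep = defaultdict(set)
    with open(tsv_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            rep, member = row[0], row[1]
            members_by_rep[rep].add(rep)      # the representative belongs to its cluster
            members_by_rep[rep].add(member)
    return {frozenset(members) for members in members_by_rep.values()}


def same_clusters(tsv_a, tsv_b):
    """Return True if both files describe the same clusters, ignoring order."""
    return load_clusters(tsv_a) == load_clusters(tsv_b)
```

Calling `same_clusters("run1.tsv", "run2.tsv")` (hypothetical file names) returns `True` when only the ordering differs, and `False` when any cluster gains or loses a member. Because the representative is not used as the cluster key in the comparison, two runs that pick different representatives for the same member set still count as identical.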
Thank you for the response. I manually checked the results, and I was wrong about connected components, which does appear to produce the same clusters. The greedy set cover clusters, however, do change in size between runs. I now realize that this is expected.