MMseqs2
Message "Error: there must be an error: ~" appears when mmseqs cluster/easy-cluster is used
Dear developers,
I am trying to cluster a set of closely related retrotransposon sequences (n = 736,771). The "easy-linclust" module works fine for this dataset, but whenever I use the "cluster" module (or the "easy-cluster" module), I get a long list of error messages such as the following:
Error: there must be an error: 98829 deleted from 373971 that now is empty, but not assigned to a cluster
I am unsure why these error messages occur and whether they could affect my clustering results or not. Could you please help me to solve this issue? Thank you in advance.
MMseqs version: 96d452cb432fc4674991a48952deaf24d1787e77 (self-compiled)
Full log: full.log
Hello, I get the same error when clustering DNA sequences extracted from a Prokka annotation (.ffn).
The command is:
$ mmseqs easy-cluster pacbio_assembly_protein_DNA_dereplicated.ffn clusterPacbio temp --min-seq-id 0.9
The error messages appear after this step:
"
[=================================================================] 100.00% 245.49K 0s 110ms
Add missing connections
[=================================================================] 100.00% 245.49K 0s 7ms
Time for read in: 0h 0m 0s 279ms
there must be an error: 138172 deleted from 117856 that now is empty, but not assigned to a cluster
there must be an error: 139364 deleted from 144033 that now is empty, but not assigned to a cluster
there must be an error: 119199 deleted from 119499 that now is empty, but not assigned to a cluster
there must be an error: 119078 deleted from 137716 that now is empty, but not assigned to a cluster
there must be an error: 120853 deleted from 137082 that now is empty, but not assigned to a cluster
there must be an error: 17414 deleted from 150027 that now is empty, but not assigned to a cluster
there must be an error: 119826 deleted from 117153 that now is empty, but not assigned to a cluster
there must be an error: 117154 deleted from 119723 that now is empty, but not assigned to a cluster
"
and before Clustering step 1.
Do these errors affect the quality of the clustering?
Thanks for your time.
Aline
MMseqs Version: 5b03cdff7a91206bfd5db82b5b2f23bd6c8f0813
Complete log: log_pacbio_error.txt
The compressed 'pacbio_assembly_protein_DNA_dereplicated.ffn' input file can be found here: https://enacshare.epfl.ch/dqTYpbvUuJiCnsktFWyfN
I have a similar problem when using mmseqs cluster (version 13.45111) to cluster RNA sequences downloaded from http://ftp.ebi.ac.uk/pub/databases/RNAcentral/current_release/sequences/by-database/rfam.fasta, while mmseqs linclust seems to work fine. I would really appreciate it if the developers could take a look.
I had a similar problem when trying to cluster RNA sequences. I resolved the issue by switching from --cov-mode 0 to --cov-mode 1 and using --cluster-mode 2 (which facilitates clustering of fragments with larger transcripts). The full set of options specified was: --cov-mode 1 -c 0.80 --cluster-mode 2 --min-seq-id 0.99
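For anyone who wants to try the same workaround, here is a minimal sketch of how those options might be passed to easy-cluster. The file names (sequences.fasta, clusterRes, tmp) are placeholders I made up, not taken from any of the posts above, and the option values are the ones reported for the RNA dataset. It has not been confirmed that they remove the messages for other inputs, and the thresholds (-c 0.80, --min-seq-id 0.99) will likely need adjusting for your own data.

```shell
#!/bin/sh
# Placeholder names (assumptions): replace with your own input and output paths.
INPUT="sequences.fasta"   # input FASTA file
PREFIX="clusterRes"       # prefix for the easy-cluster output files
TMP_DIR="tmp"             # temporary working directory

# Options as reported in the comment above (worked for one RNA dataset).
MMSEQS_OPTS="--cov-mode 1 -c 0.80 --cluster-mode 2 --min-seq-id 0.99"

# Assemble the full command line and print it so it can be checked first.
CMD="mmseqs easy-cluster $INPUT $PREFIX $TMP_DIR $MMSEQS_OPTS"
echo "$CMD"

# Run the clustering only if mmseqs is on the PATH and the input file exists.
if command -v mmseqs >/dev/null 2>&1 && [ -f "$INPUT" ]; then
    # $MMSEQS_OPTS is left unquoted on purpose so it splits into separate flags.
    mmseqs easy-cluster "$INPUT" "$PREFIX" "$TMP_DIR" $MMSEQS_OPTS
fi
```

The script prints the command before running it, so the options can be compared against the ones you used previously (for example --cov-mode 0) when checking whether the messages still appear in the log.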