Milot Mirdita comments

Results: 429 comments of Milot Mirdita

Question: How to krona plot from easy-taxonomy

You have to specify the `--report-mode 1` parameter to generate Krona output. Splitting `easy-taxonomy` up makes generating both outputs at the same time easier: ``` mmseqs createdb 00.rawdata/ccs.fasta 00.rawdata/M1.ccs.fasta qdb...

Creating index for ColabDB failed on cluster.

You need a machine with 1 TB of RAM to create a precomputed index for the ColabFoldDB. Are you actually planning to run a lot of small queries (like the...

Creating index for ColabDB failed on cluster.

Then I would recommend deleting the already created precomputed index (`rm *.idx*`) and just using `colabfold_search` without the precomputed index.
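
A minimal sketch of that cleanup, in case it helps: the helper name `remove_precomputed_index` and all paths are hypothetical, and the `colabfold_search` call in the comments assumes the usual positional arguments (query FASTA, database directory, output directory). The helper lists the matching files before deleting them, so the output can be checked.

```shell
#!/bin/sh
# Hypothetical helper: delete precomputed index files (*.idx*) from a
# database directory, printing each file name before it is removed.
remove_precomputed_index() {
    db_dir="$1"
    # Show what is about to be deleted.
    find "$db_dir" -maxdepth 1 -name '*.idx*' -print
    # Delete only the index files; the other database files stay in place.
    find "$db_dir" -maxdepth 1 -name '*.idx*' -exec rm -f {} +
}

# Example usage (paths are placeholders):
#   remove_precomputed_index /path/to/colabfold_db
#   colabfold_search input.fasta /path/to/colabfold_db msas
```

Restricting the pattern to `*.idx*` mirrors the `rm *.idx*` suggestion and leaves the sequence and header databases untouched.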

Why divergent sequences cluster together using mmseqs2 easy-cluster?

This was also a while ago; however, for clustering you should pretty much always supply a sequence identity threshold with `--min-seq-id`. The cascaded clustering of MMseqs2 can still put together...

All-vs-All alignments with fake prefilter give unexpected sequence identities

That seems about right? It aligned two residues successfully (from 83 to 84). You might want to demand some coverage thresholds (`-c`/`--cov-mode`) or a minimum alignment length threshold (`--min-aln-len`).
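
As a rough sketch of how those thresholds could be passed to the alignment step: the database names and the threshold values below are illustrative assumptions, not recommendations. The script builds the `mmseqs align` call and only executes it when `mmseqs` and the query database are present; otherwise it prints the command.

```shell
#!/bin/sh
# Hypothetical database names; replace with your own MMseqs2 databases.
QUERY_DB="queryDB"
TARGET_DB="targetDB"
PREF_DB="fakePrefDB"   # result database of the (fake) prefilter step
ALN_DB="alnDB"         # alignment result database to be written

# Illustrative thresholds (assumed values).
MIN_COV="0.8"      # -c: minimum fraction of aligned (covered) residues
COV_MODE="0"       # --cov-mode 0: coverage of both query and target
MIN_ALN_LEN="30"   # --min-aln-len: minimum alignment length

CMD="mmseqs align $QUERY_DB $TARGET_DB $PREF_DB $ALN_DB -c $MIN_COV --cov-mode $COV_MODE --min-aln-len $MIN_ALN_LEN"

# Run only if mmseqs and the query database exist; otherwise just print.
if command -v mmseqs >/dev/null 2>&1 && [ -e "$QUERY_DB" ]; then
    $CMD
else
    echo "$CMD"
fi
```

With either a coverage or a minimum length threshold in place, a two-residue alignment like the one from 83 to 84 would be rejected instead of being reported with a misleadingly high sequence identity.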

Avoid sequence id parsing

#557 is the same issue. We'll think of something.

Running linclust on NFS

`--db-load-mode` won't help in this case. The parameter handles loading of precomputed indices of (search) databases. Normally, we don't use precomputed indices for clustering. Ideally the `tmp` folder should be...

Linclust fails to cluster sequences with a single mismatch

I am not sure we deal well with 50-mers; the default nucleotide k-mer size is 14 or 15 (depending on the database size). Also, we have predefined spaced k-mer patterns only...

CMakeLists.txt: Properly handle cpu flags

I thought that `-mavx2` would imply (most) lower SSE levels. We also use an SSSE3 instruction in some important place (iirc), so should we also enable that explicitly? (Edit: we...

mmseqs easy-cluster stuck at prefilter stage for multiple days

This was also a while ago. For your use case I would only call `easy-linclust`. You won't benefit from the deeper clustering at a sequence identity threshold of 98%. That should...