Milot Mirdita comments

Results: 432 comments by Milot Mirdita

BLAST tab format does not report gap openings (potentially wrong bit score)

I think the important part is to set `-a`. Me mentioning `--alignment-mode` was a bit misleading. The gap open count is computed based on the presence of a backtrace, which...

[Question] How do I cluster Uniref90 after it has been translated into a 5 letter alphabet?

That's an interesting application of MMseqs2's clustering. It should be possible to do what you want; however, it will require much more parameter tweaking. Also, did you generate your own...

[Question] How do I cluster Uniref90 after it has been translated into a 5 letter alphabet?

No guarantees, but that's what I would try first. Try k-mer sizes from 6 to 15. More things might go wrong though, as you are breaking some pretty fundamental assumptions....
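A sweep over the suggested k-mer sizes could be scripted roughly as below. This is only a sketch: it assumes the `easy-cluster` workflow with its `-k` option is the entry point you want, and the input file `reduced.fasta`, the `clu_k<k>` output prefixes, and the `tmp` directory are hypothetical names. It does not address the other parameters a reduced alphabet may need.

```python
def kmer_sweep_commands(fasta, tmp_dir="tmp", k_values=range(6, 16)):
    """Build one clustering command line per k-mer size.

    Returns a list of argv lists; nothing is executed here.
    File names and output prefixes are illustrative assumptions.
    """
    commands = []
    for k in k_values:
        commands.append(
            ["mmseqs", "easy-cluster", fasta, f"clu_k{k}", tmp_dir, "-k", str(k)]
        )
    return commands


if __name__ == "__main__":
    # Print the commands so they can be inspected before running them,
    # for example with subprocess.run(cmd, check=True).
    for cmd in kmer_sweep_commands("reduced.fasta"):
        print(" ".join(cmd))
```

Comparing the number of clusters across the runs should show quickly whether any k-mer size gives usable results.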

[Question] How do I cluster Uniref90 after it has been translated into a 5 letter alphabet?

Changing the alphabet size will cause it to use MMseqs2's built-in alphabet reduction. Since you seem to be trying various reduced alphabets I assume that you don't want to it...

[Question] What are the requirements for adding taxonomy information to a MMSEQS2 database?

Currently, everything is tailored to the NCBI taxonomy format (taxdump). For GTDB, we transform their taxonomy to a names/nodes.dmp format. If your taxonomy is NCBI based, then you can just...

[Question] What are the requirements for adding taxonomy information to a MMSEQS2 database?

Taxids are numeric. The taxonomy tree in your names.dmp and nodes.dmp needs to have full lineages up to the root and must not contain cycles. The labels themselves are not...
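A custom nodes.dmp can be checked against these requirements before it is handed to `createtaxdb`. The following is a minimal sketch, not part of MMseqs2: it assumes the NCBI taxdump layout (fields separated by `|`, taxid first, parent taxid second) and that the root is taxid 1 and lists itself as its parent. The function names are hypothetical.

```python
def parse_nodes(lines):
    """Map each taxid to its parent taxid from nodes.dmp-style lines."""
    parent = {}
    for line in lines:
        if not line.strip():
            continue
        fields = [field.strip() for field in line.split("|")]
        # int() raises ValueError on non-numeric taxids, which is the
        # first requirement to verify.
        parent[int(fields[0])] = int(fields[1])
    return parent


def find_lineage_problems(parent, root=1):
    """Report cycles and lineages that never reach the root."""
    problems = []
    reaches_root = {root}
    broken = set()
    for start in parent:
        path, on_path = [], set()
        node = start
        while node not in reaches_root and node not in broken:
            if node in on_path:
                problems.append(f"cycle through taxid {node}")
                broken.update(path)
                break
            if node not in parent:
                problems.append(f"parent taxid {node} is not defined")
                broken.update(path)
                broken.add(node)
                break
            on_path.add(node)
            path.append(node)
            node = parent[node]
        else:
            # The walk ended at an already classified node; the whole
            # path inherits that classification.
            (reaches_root if node in reaches_root else broken).update(path)
    return problems


if __name__ == "__main__":
    with open("nodes.dmp") as handle:
        for problem in find_lineage_problems(parse_nodes(handle)):
            print(problem)
```

An empty result means every taxid has a complete lineage to the root; each cycle or missing parent is reported once.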

[Question] What are the requirements for adding taxonomy information to a MMSEQS2 database?

Yes, you can point `createtaxdb` to your existing database with `--ncbi-tax-dump` and `--tax-mapping-file` as described above. In fact, that's how the `databases` commands work: they download sequences, create a db...

[Question] What are the requirements for adding taxonomy information to a MMSEQS2 database?

"mapping is empty" sounds like something went wrong while creating the TSV files I mentioned. Could you please write down the steps you took to generate the tax mapping?

[Question] What are the requirements for adding taxonomy information to a MMSEQS2 database?

How did you create taxid.map? What does it look like?

[Question] What are the requirements for adding taxonomy information to a MMSEQS2 database?

Okay, that might be correct. What does `MicroEuk100.eukaryota_odb10.lookup` look like?