Benjamin Buchfink
It doesn't affect the function, you just need to be aware that they will be escaped as `\t` in the output.
> In your opinion, would such an approach lead to noticeable speed improvements.

That depends on the size of your query files. I suggest testing it.

> Is there a description...
This is due to repeat masking. You need to use `--masking 0` to turn it off.
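For anyone scripting their runs, here is a minimal sketch of how the option could be added to a command line built in Python. The `blastp` workflow, the file names, and the helper `build_diamond_command` are assumptions made for illustration; only `--masking 0` comes from the reply above.

```python
import shlex


def build_diamond_command(query, db, out, masking=True):
    """Return an argv list for a DIAMOND run (hypothetical helper).

    The blastp workflow and the paths are placeholders; adapt them
    to the command you actually use.
    """
    argv = ["diamond", "blastp", "--query", query, "--db", db, "--out", out]
    if not masking:
        # Disable repeat masking, as suggested above.
        argv += ["--masking", "0"]
    return argv


cmd = build_diamond_command("query.faa", "reference", "hits.tsv", masking=False)
# Print the command for inspection; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems if the paths contain spaces.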
There are some issues causing increased memory use that will be fixed in the next release. For now one thing you could try is using `--bin 256` (or possibly higher).
Another option would be `--cluster-steps faster_lin fast_lin`, that should be sufficient for 80% id cutoff.
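As a rough sketch, the option could be appended to an existing clustering command like this. The base command (`diamond cluster` with `--approx-id 80`), the file names, and the helper `with_cluster_steps` are assumptions for illustration; only the `--cluster-steps faster_lin fast_lin` part is taken from the reply above.

```python
def with_cluster_steps(argv, steps=("faster_lin", "fast_lin")):
    """Return a copy of argv with the --cluster-steps option appended.

    Hypothetical helper; the default step names are the ones
    suggested above for an 80% identity cutoff.
    """
    return list(argv) + ["--cluster-steps", *steps]


# Assumed base command; replace it with the one you are actually running.
base = ["diamond", "cluster", "-d", "proteins.faa", "-o", "clusters.tsv",
        "--approx-id", "80"]
cluster_cmd = with_cluster_steps(base)
print(" ".join(cluster_cmd))
```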
Please try again with the latest release, memory use has been reduced.
DIAMOND is not configured to find very short hits by default. I shared some tips on how to do this here: https://github.com/bbuchfink/diamond/issues/832
At the moment, you need to specify `qseqid`, not `cseqid`, on the command line. It is inconsistent and should be changed in a future version.
Please provide the command line you used to run diamond and your version.
These are not the files you provided. I ran diamond blastx of your `ncor_cdhit.fasta` against your `ncor_cdhit.fasta.transdecoder.pep` and it completed correctly.