diamond
random output using long read mode
Hi,
I noticed that the results are sometimes not reproducible when using long read mode. Could this be related to the internal seed or to tie-breaking?
Here is a minimal example:
diamond blastx \
  --db protein.txt \
  --query query.txt \
  --outfmt 6 qseqid sseqid pident length qlen qstart qend slen sstart send evalue bitscore \
  --range-culling -F 15 \
  --max-target-seqs 25 --max-hsps 0 \
  --threads 16
First run:
JAHHUZ010000004.1 WP_291717584.1 89.9 119 158376 144690 145046 119 1 119 1.31e-68 216
JAHHUZ010000004.1 WP_106008235.1 89.1 119 158376 13687 13331 119 1 119 1.78e-68 216
Second run:
JAHHUZ010000004.1 WP_291717584.1 89.9 119 158376 13687 13331 119 1 119 1.31e-68 216
JAHHUZ010000004.1 WP_106008235.1 89.1 119 158376 13687 13331 119 1 119 1.78e-68 216
Third run:
JAHHUZ010000004.1 WP_291717584.1 89.9 119 158376 144690 145046 119 1 119 1.31e-68 216
JAHHUZ010000004.1 WP_106008235.1 89.1 119 158376 144690 145046 119 1 119 1.78e-68 216
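To make the difference between runs explicit, here is a small sketch that parses two of the rows above using the column order from the --outfmt option and lists the fields that differ. The helper names (parse_hit, differing_fields) are made up for illustration; the rows are copied from the first and second run.

```python
# Column order taken from the --outfmt 6 option in the command above.
FIELDS = ("qseqid sseqid pident length qlen qstart qend "
          "slen sstart send evalue bitscore").split()

def parse_hit(line):
    """Split one tabular row into a field -> string mapping (hypothetical helper)."""
    return dict(zip(FIELDS, line.split()))

def differing_fields(a, b):
    """Return the names of the columns whose values differ between two rows."""
    return [f for f in FIELDS if a[f] != b[f]]

# Same query/subject pair, as reported in the first and the second run.
run1 = parse_hit("JAHHUZ010000004.1 WP_291717584.1 89.9 119 158376 "
                 "144690 145046 119 1 119 1.31e-68 216")
run2 = parse_hit("JAHHUZ010000004.1 WP_291717584.1 89.9 119 158376 "
                 "13687 13331 119 1 119 1.31e-68 216")

diff = differing_fields(run1, run2)
print(diff)  # ['qstart', 'qend']
```

Only the query coordinates change. Identity, e-value and bit score are the same in both runs.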
Update: this seems to be a corner case. The reverse complement of the sequence is identical to the original sequence:
seqkit grep -p 'JAHHUZ010000004.1' query.txt > contig.txt
seqtk seq contig.txt > forward.txt
seqtk seq -r contig.txt > reverse.txt
md5sum forward.txt reverse.txt
89c390f039087b7265e794b35912e0ba forward.txt
89c390f039087b7265e794b35912e0ba reverse.txt
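The same check can be sketched in Python, together with the coordinate arithmetic that ties the two reported positions together. This is only an illustration: the function names are hypothetical, and it assumes plain A/C/G/T sequences (other characters are left unchanged by the complement step).

```python
# Complement table for unambiguous bases only (assumption: no IUPAC codes).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]

def is_own_reverse_complement(seq):
    """True if the sequence reads the same on both strands."""
    return seq.upper() == reverse_complement(seq).upper()

def mirror(pos, qlen):
    """Map a 1-based position to the matching position on the other strand."""
    return qlen - pos + 1

print(is_own_reverse_complement("GAATTC"))   # True
print(is_own_reverse_complement("GATTACA"))  # False

# qlen and coordinates taken from the runs above.
print(mirror(144690, 158376), mirror(145046, 158376))  # 13687 13331
```

With qlen = 158376, the forward-strand range 144690..145046 mirrors exactly to 13687..13331, so the two reported positions describe the same alignment, once on each strand.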
There seems to be non-determinism in how equally scoring hits are reported. I will fix this when I get the chance.
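For illustration only, one way to make such a choice deterministic is to compare candidates on a complete key instead of the score alone. The sketch below is not DIAMOND's code. The dictionary layout and the tie-breaking rule (prefer the forward strand, then the lowest coordinate) are assumptions chosen for the example.

```python
def pick_hit(candidates):
    """Pick one hit deterministically, regardless of the input order.

    Hypothetical rule: highest bit score wins; ties go to the
    forward-strand hit (qstart <= qend), then to the lowest coordinate.
    """
    def key(hit):
        forward = hit["qstart"] <= hit["qend"]
        return (-hit["bitscore"], not forward, min(hit["qstart"], hit["qend"]))
    return min(candidates, key=key)

# Two equally scoring alignments of the same subject, one per strand
# (coordinates taken from the report above).
fwd = {"qstart": 144690, "qend": 145046, "bitscore": 216}
rev = {"qstart": 13687, "qend": 13331, "bitscore": 216}

# The result no longer depends on which candidate arrives first.
print(pick_hit([fwd, rev]) == pick_hit([rev, fwd]))  # True
```

Any total order over the candidates would work. The point is that no two distinct hits compare as equal, so thread scheduling cannot change the outcome.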