diamond icon indicating copy to clipboard operation
diamond copied to clipboard

random output using long read mode

Open xinehc opened this issue 2 years ago • 1 comments

Hi,

I noticed that sometime the results is not reproducible using the long read mode, I wonder is it related to the internal seed or tie breaking?

Here is a minimal example: diamond blastx
--db protein.txt
--query query.txt
--outfmt 6 qseqid sseqid pident length qlen qstart qend slen sstart send evalue bitscore
--range-culling -F 15
--max-target-seqs 25 --max-hsps 0
--threads 16

First run:

JAHHUZ010000004.1	WP_291717584.1	89.9	119	158376	144690	145046	119	1	119	1.31e-68	216
JAHHUZ010000004.1	WP_106008235.1	89.1	119	158376	13687	13331	119	1	119	1.78e-68	216

Second run:

JAHHUZ010000004.1	WP_291717584.1	89.9	119	158376	13687	13331	119	1	119	1.31e-68	216
JAHHUZ010000004.1	WP_106008235.1	89.1	119	158376	13687	13331	119	1	119	1.78e-68	216

Third run:

JAHHUZ010000004.1	WP_291717584.1	89.9	119	158376	144690	145046	119	1	119	1.31e-68	216
JAHHUZ010000004.1	WP_106008235.1	89.1	119	158376	144690	145046	119	1	119	1.78e-68	216

protein.txt query.txt

Update: this seem to be a corner case, the reverse complement of the sequence is the same as the original sequence:

seqkit grep -p 'JAHHUZ010000004.1' query.txt > contig.txt
seqtk seq contig.txt > forward.txt
seqtk seq -r contig.txt > reverse.txt
md5sum forward.txt reverse.txt

89c390f039087b7265e794b35912e0ba  forward.txt
89c390f039087b7265e794b35912e0ba  reverse.txt

xinehc avatar Aug 08 '23 03:08 xinehc

There seems to be a non-determinism in reporting equally scoring hits. I will fix this when I get the chance.

bbuchfink avatar Aug 21 '23 13:08 bbuchfink