nonpareil
Not reproducible results
Hello,
I have a question regarding the reproducibility of the results: I ran nonpareil twice
on the same input using the same command line and got slightly different results between the two runs.
Is that expected? Do you know what the source of this randomness is, and whether the analysis could be made deterministic in the future?
Used version: nonpareil=3.3.3=r341h470a237_0
installed via conda
Thank you in advance!
Best, Valentina
Coming into this a bit late, but there is a random seed setting as one of the parameters and sampling is mentioned in the documentation, so I think this is both completely expected and possible to make reproducible by setting -r to the same seed between runs:
-r <int> | Random generator seed. By default current time.
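To illustrate the general idea behind a seed (not Nonpareil's own code), here is a minimal sketch that uses awk as a stand-in random generator. The helper name seeded_draws is hypothetical. Whether a given Nonpareil version and kernel actually honours -r is a separate question.

```shell
#!/bin/sh
# seeded_draws SEED N
# Hypothetical helper: prints N pseudo-random numbers from awk's generator
# after seeding it explicitly. awk is only a stand-in here; this is NOT
# how Nonpareil draws its samples.
seeded_draws() {
    awk -v seed="$1" -v n="$2" 'BEGIN {
        srand(seed)                      # fixed seed: same sequence every time
        for (i = 0; i < n; i++) print rand()
    }'
}

# Two calls with the same seed should print identical lines.
seeded_draws 23 3
seeded_draws 23 3
```

Calling srand() without an argument seeds from the time of day instead, which mirrors the "by default current time" behaviour in the help text and explains why unseeded runs differ.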
Dear @cjfields,
Could you clarify how you use the -r option and with which version of the tool?
Also, I do not see this option listed when running nonpareil -h, neither in the mentioned version 3.3.3 nor in the latest one (3.3.4).
I tried to run version v3.303 (3.3.3, r341h470a237_0) with the option -r set using the same command two times. The output from the two runs has different md5sums and different content as well, except for the *.npl files, which don't contain any relevant output anyway.
Here are the commands I executed:
nonpareil -s some.reads.fq -T kmer -f fastq -r 23 -b test.1
nonpareil -s some.reads.fq -T kmer -f fastq -r 23 -b test.2
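For anyone who wants to repeat this check, the comparison can be scripted instead of reading md5sum output by eye. This is a minimal sketch, assuming every output file of a run shares the -b prefix; compare_runs is a hypothetical helper, not part of Nonpareil, and it uses cmp rather than md5sum to compare the files byte for byte.

```shell
#!/bin/sh
# compare_runs PREFIX_A PREFIX_B
# Hypothetical helper: for every file PREFIX_A.<ext>, compare it byte for
# byte with PREFIX_B.<ext> and print "same <ext>" or "DIFF <ext>".
# Returns non-zero if any pair differs or a counterpart is missing.
compare_runs() {
    a=$1
    b=$2
    status=0
    for file_a in "$a".*; do
        [ -e "$file_a" ] || continue     # the glob matched nothing
        ext=${file_a#"$a".}              # strip "PREFIX_A." to get the extension
        if cmp -s "$file_a" "$b.$ext"; then
            echo "same $ext"
        else
            echo "DIFF $ext"
            status=1
        fi
    done
    return $status
}

# Runs only when nonpareil and the input file are actually present.
if command -v nonpareil >/dev/null 2>&1 && [ -f some.reads.fq ]; then
    nonpareil -s some.reads.fq -T kmer -f fastq -r 23 -b test.1
    nonpareil -s some.reads.fq -T kmer -f fastq -r 23 -b test.2
    compare_runs test.1 test.2
fi
```

A "DIFF" line for any extension other than npl would reproduce what I describe above.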
Thanks for bringing this to our attention! I have now implemented consistency with -r when using -T alignment. Note that it may still produce slightly different results with different numbers of threads (-t).
For -T kmer, we use an implementation of random_device, so it needs a little more work.
@gunturus Do you think the kmer kernel could be migrated to a deterministic implementation instead?
Happy to see @lmrodriguezr's answer (and I agree that it's good you raised it); I had planned on replying that this sounds like a definite bug.
Dear @lmrodriguezr and @cjfields,
Thank you both for looking into this!