Not reproducible results

Open VGalata opened this issue 2 years ago • 5 comments

Hello,

I have a question regarding the reproducibility of the results: I ran nonpareil on the same input using the same command line and got slightly different results for both runs. Is that something to be expected? Do you know what the source of this randomness is and whether the analysis could be made deterministic in the future?

Used version: nonpareil=3.3.3=r341h470a237_0 installed via conda

Thank you in advance!

Best, Valentina

VGalata avatar Aug 24 '21 10:08 VGalata

Coming into this a bit late, but there is a random seed setting as one of the parameters and sampling is mentioned in the documentation, so I think this is both completely expected and possible to make reproducible by setting -r to the same seed between runs:

-r <int> | Random generator seed. By default current time.

cjfields avatar Jan 27 '22 18:01 cjfields

Dear @cjfields,

Could you clarify how you use the -r option and with which version of the tool? Also, I do not see this option listed when running nonpareil -h - neither in the mentioned version 3.3.3 nor in the latest one (3.3.4).

I ran version v3.303 (3.3.3, r341h470a237_0) twice with the same command and the -r option set. The outputs of the two runs have different md5sums and different content, except for the *.npl files, which don't contain any relevant output anyway.

Here are the commands I executed:

nonpareil -s some.reads.fq -T kmer -f fastq -r 23 -b test.1
nonpareil -s some.reads.fq -T kmer -f fastq -r 23 -b test.2
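
The md5sum comparison described above can be scripted. The following is only a minimal sketch: it assumes the two runs have already finished and written their files under the prefixes test.1 and test.2, and the suffix list (.npo, .npa, .npc) is an assumption about which outputs are worth comparing. The helper names `file_md5` and `compare_runs` are hypothetical and not part of nonpareil.

```python
import hashlib
from pathlib import Path


def file_md5(path):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_runs(prefix_a, prefix_b, suffixes=(".npo", ".npa", ".npc")):
    """Map each suffix to True (identical), False (different) or None (missing).

    The default suffixes are an assumption; adjust them to the files
    your runs actually produce.
    """
    report = {}
    for suffix in suffixes:
        path_a = Path(prefix_a + suffix)
        path_b = Path(prefix_b + suffix)
        if not (path_a.is_file() and path_b.is_file()):
            report[suffix] = None
        else:
            report[suffix] = file_md5(path_a) == file_md5(path_b)
    return report


if __name__ == "__main__":
    labels = {True: "identical", False: "differs", None: "missing"}
    for suffix, same in compare_runs("test.1", "test.2").items():
        print(suffix, labels[same])
```

If every compared file is reported as identical, the two runs were reproducible for that seed; any "differs" entry points at the affected output.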

VGalata avatar Jan 28 '22 10:01 VGalata

Thanks for bringing this to our attention! I have now implemented consistency with -r when using -T alignment. Note that it may still produce slightly different results with different numbers of threads (-t).

For -T kmer, we use an implementation of random_device, so it needs a little more work.
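
To illustrate why a generator like random_device ignores -r, here is a small Python analogy (not Nonpareil's code, which is not shown in this thread). A seeded pseudo-random generator replays the same sequence for the same seed, whereas a generator that draws from operating-system entropy cannot be seeded at all. The function names `seeded_draws` and `os_entropy_draws` are hypothetical, and `random.SystemRandom` is used here only as a stand-in for the behaviour described above.

```python
import random


def seeded_draws(seed, n=5):
    """Draw n integers from a PRNG initialised with an explicit seed."""
    rng = random.Random(seed)
    return [rng.randrange(1_000_000) for _ in range(n)]


def os_entropy_draws(n=5):
    """Draw n integers from OS entropy; this source cannot be seeded."""
    rng = random.SystemRandom()
    return [rng.randrange(1_000_000) for _ in range(n)]


if __name__ == "__main__":
    # Same seed, same sequence: this is what a seed option relies on.
    print(seeded_draws(23) == seeded_draws(23))
    # Two draws from OS entropy are independent of any seed value.
    print(os_entropy_draws(), os_entropy_draws())
```

Under this reading, making the kmer kernel reproducible would mean routing its sampling through a seedable generator rather than an entropy source.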

@gunturus Do you think the kmer kernel could be migrated to a deterministic implementation instead?

lmrodriguezr avatar Jan 28 '22 17:01 lmrodriguezr

Happy to see @lmrodriguezr's answer (and I agree that it's good you raised it); I had planned to reply that this sounds like a definite bug.

cjfields avatar Jan 28 '22 18:01 cjfields

Dear @lmrodriguezr and @cjfields,

Thank you both for looking into this!

VGalata avatar Jan 31 '22 06:01 VGalata