
modkit pileup produces non-deterministic results depending on thread count

Open blipinskiaima opened this issue 3 months ago • 2 comments

Hello,

I ran multiple replicates of the modkit pileup step using the same input BAM file and same reference genome, changing only the number of threads:

  • Data: same sample, same basecalling and alignment, same input file.
  • Tool versions: all runs used the same version of modkit.
  • Command: modkit pileup --threads [4 or 8]

Each configuration was repeated 5 times, and the results were consistent within each group but diverged between them:

Result: number of bedMethyl lines (consistent across 5 replicates)

  • 4 threads: 14,401
  • 8 threads: 14,289
  • Difference: -112 lines

Because the output depends on the thread count, downstream analyses that rely on consistent and complete CpG methylation calls are affected.
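
To pinpoint which records account for the 112-line difference, the two outputs could be compared by position rather than by line count. The following is only a minimal sketch, assuming the standard bedMethyl layout in which the first six whitespace-separated columns are chrom, start, end, modified base code, score, and strand. The function names and the file names in the usage comment are hypothetical and are not part of modkit.

```python
import io


def bedmethyl_keys(handle):
    """Collect (chrom, start, end, mod_code, strand) keys from a bedMethyl stream.

    Assumes the first six columns are chrom, start, end, mod code, score, strand.
    """
    keys = set()
    for line in handle:
        fields = line.split()
        # Skip comment lines and anything too short to be a record.
        if line.startswith("#") or len(fields) < 6:
            continue
        chrom, start, end, mod_code, _score, strand = fields[:6]
        keys.add((chrom, int(start), int(end), mod_code, strand))
    return keys


def diff_bedmethyl(handle_a, handle_b):
    """Return (records only in A, records only in B), each sorted by key."""
    keys_a = bedmethyl_keys(handle_a)
    keys_b = bedmethyl_keys(handle_b)
    return sorted(keys_a - keys_b), sorted(keys_b - keys_a)


# Hypothetical usage with placeholder file names:
# with open("pileup_4threads.bed") as a, open("pileup_8threads.bed") as b:
#     only_4, only_8 = diff_bedmethyl(a, b)
#     print(len(only_4), "records only in the 4-thread run")
#     print(len(only_8), "records only in the 8-thread run")
```

If the missing records cluster in particular contigs or near particular coordinates, that pattern may be useful information to include alongside the logs.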

Could the development team:

  • Confirm whether this issue is known?
  • Indicate if there's a recommended thread count to ensure determinism?
  • Suggest a workaround or flag this for future patching?

Many thanks in advance. I remain at your disposal. Best. Boris

blipinskiaima avatar Sep 30 '25 12:09 blipinskiaima

Hello @blipinskiaima,

Would you be able to share the data that causes this problem with me? Obviously, this shouldn't happen, but I've never been able to reproduce it. Could you send me an email at art.rand[at]nanoporetech.com? If you can't share the data, maybe we can at least debug it by sharing the logs. Thanks.

ArtRand avatar Oct 01 '25 03:10 ArtRand

Hello @ArtRand,

Many thanks for your feedback. I've just sent you an email with all the details of this issue.

Let's stay in touch. Best, Boris

blipinskiaima avatar Oct 01 '25 14:10 blipinskiaima