modkit pileup produces non-deterministic results depending on thread count
Hello,
I ran multiple replicates of the modkit pileup step using the same input BAM file and same reference genome, changing only the number of threads:
- Data: same sample, same basecalling and alignment, same input file.
- Tool versions: all runs used the same version of modkit.
- Command: modkit pileup --threads [4 or 8]
Each configuration was repeated 5 times, and the results were consistent within each group but diverged between them:
Result: number of bedMethyl lines (consistent across the 5 replicates)
- 4 threads: 14,401
- 8 threads: 14,289
- Difference: -112 lines
The thread count therefore affects downstream analyses that rely on consistent and complete CpG methylation calls.
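To narrow down which records account for the difference, the two outputs can be compared by position. Below is a minimal sketch, not part of modkit. It assumes the first six whitespace-separated bedMethyl columns are chrom, start, end, modified base code, score, and strand. The function names `load_keys` and `diff_keys` are hypothetical helpers introduced here for illustration.

```python
import sys


def load_keys(path):
    """Collect (chrom, start, end, mod_code, strand) for each record in a bedMethyl file."""
    keys = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            # Skip blank lines, short lines, and any header line (assumed non-numeric start).
            if len(fields) < 6 or not fields[1].isdigit():
                continue
            chrom, start, end, mod_code, _score, strand = fields[:6]
            keys.add((chrom, int(start), int(end), mod_code, strand))
    return keys


def diff_keys(path_a, path_b):
    """Return the sorted records present only in path_a and only in path_b."""
    keys_a = load_keys(path_a)
    keys_b = load_keys(path_b)
    return sorted(keys_a - keys_b), sorted(keys_b - keys_a)


if __name__ == "__main__" and len(sys.argv) == 3:
    only_a, only_b = diff_keys(sys.argv[1], sys.argv[2])
    print(f"only in {sys.argv[1]}: {len(only_a)}")
    print(f"only in {sys.argv[2]}: {len(only_b)}")
    for key in only_a + only_b:
        print(*key, sep="\t")
```

Saved under a hypothetical name such as `diff_bedmethyl.py`, it would be run with the 4-thread and 8-thread outputs as its two arguments. If the differing records cluster in particular regions, that list may be easier to share than the full data.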
Could the development team:
- Confirm whether this issue is known?
- Indicate if there's a recommended thread count to ensure determinism?
- Suggest a workaround or flag this for future patching?
Many thanks in advance. I remain at your disposal. Best. Boris
Hello @blipinskiaima,
Would you be able to share the data that causes this problem with me? Obviously, this shouldn't happen, but I've never been able to reproduce it. Could you send me an email at art.rand[at]nanoporetech.com? If you can't share the data, maybe we can at least debug it by sharing the logs. Thanks.
Hello @ArtRand,
Many thanks for your feedback. I've just sent you an email with all the details of this issue.
Let's stay in touch. Best, Boris