Difficulty running modkit on the modified bases tutorial data
Hello, we are starting to use ONT sequencing in our laboratory.
I'd like to run modkit on the chr20 data from the GIAB NA24385 sample. I managed to reproduce the tutorial using modbam2bed (https://labs.epi2me.io/notebooks/Modified_Base_Tutorial.html), but I would like to use modkit, as Nanopore recommends.
However, modkit reports: “processed 0 rows and skipped ~331448 reads.”
Has anyone managed to run modkit on this data, or does anyone have suggestions as to what might be wrong?
Please let me know if you need more information.
Thanks in advance,
Marcel
Hello @marceelrf
The data used in this tutorial is a little old, so the MM base modification tags in the BAM lack the mode flag (? or .) and therefore implicitly denote the . mode. The SAM spec has details. By default, modkit pileup skips these records unless you specify --force-allow-implicit on the command line; there is a note about this in the Troubleshooting section of the documentation. If you run modkit pileup with the --log-filepath <file> option, you will see lines such as
[src/read_cache.rs::376][2023-08-31 15:05:04][DEBUG] read 8272cf99-c8f3-463b-9e32-5db2efdf364d, Skipped: record 8272cf99-c8f3-463b-9e32-5db2efdf364d has un-allowed mode (ImplicitProbModified), use '--force-allow-implicit' or 'modkit update-tags --mode ambiguous'
indicating what action to take. We can look into updating the tutorial on our side.
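To make the "implicit mode" issue concrete, here is a minimal Python sketch (not part of modkit) that splits an MM tag value into its groups and reports each group's mode flag. It assumes the group layout from the SAM tags spec: canonical base, strand, modification code(s), an optional '?' or '.' flag, then comma-separated skip counts. A group with no flag is what older files contain; per the spec, '.' means skipped positions are treated as unmodified, while '?' means nothing is known about them. The function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# One MM group: canonical base, strand, modification code(s) (letters or a
# ChEBI number), an optional mode flag ('?' or '.'), then skip counts.
_GROUP_RE = re.compile(r"^([ACGTUN])([+-])([a-z]+|\d+)([.?]?)((?:,\d+)*)$")


def mm_group_modes(mm: str) -> List[Tuple[str, Optional[str]]]:
    """Return (header, mode) for every group in an MM tag value.

    mode is '?' or '.' when the flag is written explicitly, and None when
    it is omitted (older files, where '.' is implied).
    """
    modes = []
    for group in filter(None, mm.split(";")):
        match = _GROUP_RE.match(group)
        if match is None:
            raise ValueError(f"unrecognised MM group: {group!r}")
        base, strand, codes, mode, _skips = match.groups()
        modes.append((base + strand + codes, mode or None))
    return modes


def has_implicit_mode(mm: str) -> bool:
    """True if any group omits the mode flag, the case modkit skips by default."""
    return any(mode is None for _, mode in mm_group_modes(mm))
```

For example, `mm_group_modes("C+m,1,3,0;")` gives `[("C+m", None)]`, while a newer-style tag such as `"C+m?,1,3,0;"` gives `[("C+m", "?")]`.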
I would suggest running modkit update-tags --mode ambiguous on the tutorial data. The ambiguous mode (?) is correct for the CpG models used in that tutorial. Also note that the handling of thresholds has changed slightly from modbam2bed to modkit, so I would expect some small differences in the results too.
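As an illustration of what changing the mode to ambiguous means at the tag level, the sketch below adds the '?' flag to every MM group that lacks one and leaves explicitly flagged groups alone. This is a hypothetical helper, not modkit's implementation, and it is only an assumption that update-tags behaves the same way for already-flagged groups; on real data, run modkit update-tags as suggested above. Because the flag does not change how many positions are listed, the skip counts (and hence the matching ML probabilities) are copied through unchanged.

```python
import re

# Header of an MM group (base, strand, modification codes), optional mode
# flag, and the comma-separated skip counts that follow it.
_GROUP_RE = re.compile(r"^([ACGTUN][+-](?:[a-z]+|\d+))([.?]?)((?:,\d+)*)$")


def make_ambiguous(mm: str) -> str:
    """Add the '?' (ambiguous) mode flag to MM groups that have no flag.

    Groups that already carry an explicit '?' or '.' are left unchanged,
    and the skip counts are copied through untouched.
    """
    rewritten = []
    for group in filter(None, mm.split(";")):
        match = _GROUP_RE.match(group)
        if match is None:
            raise ValueError(f"unrecognised MM group: {group!r}")
        header, mode, skips = match.groups()
        rewritten.append(f"{header}{mode or '?'}{skips};")
    return "".join(rewritten)
```

For example, `make_ambiguous("C+m,1,3,0;")` returns `"C+m?,1,3,0;"`.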
Thank you @ArtRand !