Coverage of modekit pileup results does not match the coverage displayed on IGV
Hello,
I used modkit pileup with a bam basecalled with dorado model [email protected] and called for m5C, m6a, pseudou and inosine. I have two questions :
- The first one is why does modkit report positions as modified when the fraction of modified base is 0 ?
- The second is about the valid coverage or the score, for one position on IGV I have more than 7500 reads, but in the results modkit reports only 39 for the coverage, why this big difference ? I also have the case where modkit report ~2900 while on IGV I have only 800 reads.
Thanks in advance for you reply, Rania
Hello @rania-o,
The first one is why does modkit report positions as modified when the fraction of modified base is 0 ?
Modkit pileup will report all reference positions where at least one read had a valid call. A "valid call" can be either a canonical or modified call with a probability $\ge$ the pass threshold. You may filter out the positions that have 0% modified - but that's not the default behavior.
The second is about the valid coverage or the score, for one position on IGV I have more than 7500 reads, but in the results modkit reports only 39 for the coverage, why this big difference ? I also have the case where modkit report ~2900 while on IGV I have only 800 reads.
For high-depth, you may want to increase the --max-depth parameter in modkit pileup. If you have a position with valid coverage of 39, but a read depth of 7500, could you check the value of $\text{N}_{\text{diff}}$? You may have a case where most of your reads align an alternate base, for example a G, then a small proportion of the reads have a mismatch and a modification reported on that mismatch. When modkit reports more depth than IGV, it can often be the case that IGV is downsampling the modBAM, if I recall correctly you can change this in the preferences.
Hi @ArtRand ,
Thanks for your reply.
Modkit pileup will report all reference positions where at least one read had a valid call. A "valid call" can be either a canonical or modified call with a probability ≥ the pass threshold. You may filter out the positions that have 0% modified - but that's not the default behavior.
What do you mean here by "that's not the default behavior" ? Filtering out positions with 0% modified automatically by modkit ?
For high-depth, you may want to increase the --max-depth parameter in modkit pileup. If you have a position with valid coverage of 39, but a read depth of 7500, could you check the value of N diff ? You may have a case where most of your reads align an alternate base, for example a G, then a small proportion of the reads have a mismatch and a modification reported on that mismatch. When modkit reports more depth than IGV, it can often be the case that IGV is downsampling the modBAM, if I recall correctly you can change this in the preferences.
For the case where IGV shows less reads than modkit, actually I'm aware of this and I checked on IGV all the reads are displayed.
For the other case, you are right. All the reads are considered as an alternate base. But the problem that I face is that I'm doubting about the modification to be a pseudoU since the signature of this modification is misbasecalling uridine as a cytosine T-->C, which is the case on IGV. But, I'm a bit surprised by the model not calling this base as a pseudoU, and instead calling an m6a even if the base in the reference is not an A.
This is the output for this position :
EU293121.1 2131 2132 a 19 + 2131 2132 255,0,0 19 26.32 5 12 2 345 1192 6575 0
EU293121.1 2131 2132 m 6247 + 2131 2132 255,0,0 6247 2.37 148 6099 0 345 1192 347 0
EU293121.1 2131 2132 17596 19 + 2131 2132 255,0,0 19 10.53 2 12 5 345 1192 6575 0
EU293121.1 2131 2132 17802 321 + 2131 2132 255,0,0 321 13.40 43 278 0 345 1192 6273 0