modkit icon indicating copy to clipboard operation
modkit copied to clipboard

Coverage of modekit pileup results does not match the coverage displayed on IGV

Open rania-o opened this issue 10 months ago • 2 comments

Hello,

I used modkit pileup with a bam basecalled with dorado model [email protected] and called for m5C, m6a, pseudou and inosine. I have two questions :

  • The first one is why does modkit report positions as modified when the fraction of modified base is 0 ?
  • The second is about the valid coverage or the score, for one position on IGV I have more than 7500 reads, but in the results modkit reports only 39 for the coverage, why this big difference ? I also have the case where modkit report ~2900 while on IGV I have only 800 reads.

Thanks in advance for you reply, Rania

rania-o avatar Feb 25 '25 14:02 rania-o

Hello @rania-o,

The first one is why does modkit report positions as modified when the fraction of modified base is 0 ?

Modkit pileup will report all reference positions where at least one read had a valid call. A "valid call" can be either a canonical or modified call with a probability $\ge$ the pass threshold. You may filter out the positions that have 0% modified - but that's not the default behavior.

The second is about the valid coverage or the score, for one position on IGV I have more than 7500 reads, but in the results modkit reports only 39 for the coverage, why this big difference ? I also have the case where modkit report ~2900 while on IGV I have only 800 reads.

For high-depth, you may want to increase the --max-depth parameter in modkit pileup. If you have a position with valid coverage of 39, but a read depth of 7500, could you check the value of $\text{N}_{\text{diff}}$? You may have a case where most of your reads align an alternate base, for example a G, then a small proportion of the reads have a mismatch and a modification reported on that mismatch. When modkit reports more depth than IGV, it can often be the case that IGV is downsampling the modBAM, if I recall correctly you can change this in the preferences.

ArtRand avatar Feb 25 '25 15:02 ArtRand

Hi @ArtRand ,

Thanks for your reply.

Modkit pileup will report all reference positions where at least one read had a valid call. A "valid call" can be either a canonical or modified call with a probability ≥ the pass threshold. You may filter out the positions that have 0% modified - but that's not the default behavior.

What do you mean here by "that's not the default behavior" ? Filtering out positions with 0% modified automatically by modkit ?

For high-depth, you may want to increase the --max-depth parameter in modkit pileup. If you have a position with valid coverage of 39, but a read depth of 7500, could you check the value of N diff ? You may have a case where most of your reads align an alternate base, for example a G, then a small proportion of the reads have a mismatch and a modification reported on that mismatch. When modkit reports more depth than IGV, it can often be the case that IGV is downsampling the modBAM, if I recall correctly you can change this in the preferences.

For the case where IGV shows less reads than modkit, actually I'm aware of this and I checked on IGV all the reads are displayed.

For the other case, you are right. All the reads are considered as an alternate base. But the problem that I face is that I'm doubting about the modification to be a pseudoU since the signature of this modification is misbasecalling uridine as a cytosine T-->C, which is the case on IGV. But, I'm a bit surprised by the model not calling this base as a pseudoU, and instead calling an m6a even if the base in the reference is not an A.

Image This is the output for this position :

EU293121.1      2131    2132    a       19      +       2131    2132    255,0,0 19      26.32   5       12      2       345     1192    6575    0
EU293121.1      2131    2132    m       6247    +       2131    2132    255,0,0 6247    2.37    148     6099    0       345     1192    347     0
EU293121.1      2131    2132    17596   19      +       2131    2132    255,0,0 19      10.53   2       12      5       345     1192    6575    0
EU293121.1      2131    2132    17802   321     +       2131    2132    255,0,0 321     13.40   43      278     0       345     1192    6273    0

rania-o avatar Feb 26 '25 10:02 rania-o