Seeing modified bases in an in-vitro transcribed RNA
Hi,
Hope you are well. I have an invitro transcribed RNA without any RNA modification writers in the reaction. To my surprise, I see a lot of RNA modification calls with probability of modification calls above 90 percent. For example, I have attached an IGV snapshot of m5C calls with threshold visibility set at 0.6 in the IGV. Kindly help me understand this. Thanks
Hello @kamaloxfordpathology,
Could you run modkit summary $bam with the automatic threshold detection and the --only-mapped flag and let me know what you get? You may also want to try --mod-threshold m:0.9. Feel free to move this discussion over to the Modkit repo by the way.
We've recently published a blog post that describes the performance of the RNA base modification models on synthetic RNA strands with base modifications at known positions as well as the data and methods. In figure 4a of that article, there is confusion matrix showing that the estimated false positive rate of the m5C model is 0.92%.
It's hard to "eyeball" what the FPR would be in your screen shot, but the output from Modkit should make it easier to diagnose what's going on.
Hi @ArtRand ,
Thanks for looking into this issue. I ran the modkit summary as suggested. This was the output.
bases A,C,T
total_reads_used 137807
count_reads_T 137807
count_reads_C 137807
count_reads_A 137807
pass_threshold_T 0.6894531
pass_threshold_A 0.6171875
pass_threshold_C 0.6074219
base code pass_count pass_frac all_count all_frac
T - 1524140 0.97000086 1636993 0.93775237
T 17802 47137 0.029999167 108663 0.06224766
A - 1044031 0.9339467 1119095 0.901614
A a 57378 0.051327974 92544 0.07455932
A 17596 16461 0.014725326 29574 0.023826692
C - 1489902 0.90810806 1589027 0.8747711
C m 150764 0.09189195 227479 0.12522887
Hello @kamaloxfordpathology, thanks let me circulate this with my team. Could you run the same command with --mod-threshold m:0.9 at your convenience? Also could you tell me if you're using HAC or SUP basecalling/modification-detection?
Hi @ArtRand ,
Thanks for the reply. I have used HAC models. I ran the modkit summary command with threshold for all modifications set at 0.7, 0.8 ,0.9 and restricted it to all reads present in the region shown in the IGV snapshot. Here are the outputs. Kindly ignore the previous modkit summary output since I did not restrict it to the in-vitro transcribed region. Thanks!
0.7 filter threshold
bases C,T,A
total_reads_used 13586
count_reads_A 13586
count_reads_T 13586
count_reads_C 13586
base code pass_count pass_frac all_count all_frac
A - 424458 0.94891024 476234 0.9002329
A a 17787 0.03976428 41277 0.078026585
A 17596 5066 0.011325453 11501 0.021740528
C - 408812 0.9693759 445183 0.9339771
C m 12915 0.030624077 31470 0.06602287
T - 434138 0.97291493 458925 0.9473896
T 17802 12086 0.027085051 25485 0.052610394
0.8 filter threshold
bases T,C,A
total_reads_used 13586
count_reads_T 13586
count_reads_A 13586
count_reads_C 13586
base code pass_count pass_frac all_count all_frac
C - 375306 0.98175424 445183 0.9339771
C m 6975 0.01824574 31470 0.06602287
T - 411891 0.98221064 458925 0.9473896
T 17802 7460 0.017789394 25485 0.052610394
A - 388420 0.9662838 476234 0.9002329
A a 10293 0.025606198 41277 0.078026585
A 17596 3260 0.008109997 11501 0.021740528
0.9 filter threshold
bases A,C,T
total_reads_used 13586
count_reads_T 13586
count_reads_C 13586
count_reads_A 13586
base code pass_count pass_frac all_count all_frac
T - 371637 0.9901897 458925 0.9473896
T 17802 3682 0.009810321 25485 0.052610394
C - 319230 0.9910651 445183 0.9339771
C m 2878 0.008934892 31470 0.06602287
A - 325851 0.9814198 476234 0.9002329
A a 4515 0.013598578 41277 0.078026585
A 17596 1654 0.0049816277 11501 0.021740528
Hi,
Hope you are well. Is there any update to this problem. I see the presence of modified residues in all our in-vitro transcribed replicates. Hence I do not think it is an issue with the experiment.
Hello,
Any news about this thread? I am experiencing the same issue when analysing RCS (RNA004 control strand).
If I am not mistaken, there should be no modifications detected since the strand is produced using an in vitro transcription system, correct?
However, I still observe a noticeable fraction of bases being flagged as modified. Could this be related to model calibration, default thresholds, or something else?
Thanks in advance for your help!