modkit icon indicating copy to clipboard operation
modkit copied to clipboard

How to calculate filter thresholds from probabilities.tsv

Open KunFang93 opened this issue 6 months ago • 6 comments

Hi @ArtRand ,

Thanks for your continuous help! I have a naive question about how to calculate filter threshold from probabilities.tsv (generated from sample-probs). In documents, it says Filtering in modkit is performed by first determining the value of q for the lowest n-th percentile of calls (10th percentile by default)..

When I use modkit pileup, I noticed the estimated cut off is 0.994. I would like to ask if this proba could be calculated from probabilities.tsv and how?

code    primary_base    range_start     range_end       count   frac    percentile_rank
-       A       0.5     0.50390625      28      0.00000010300491        0.000005150245
-       A       0.50390625      0.5078125       430     0.000001581861  0.00008939354
-       A       0.5078125       0.51171875      17      0.000000062538696       0.00017161353
-       A       0.51171875      0.515625        94176   0.00034644964   0.017497223
-       A       0.515625        0.51953125      29      0.00000010668365        0.03482504
-       A       0.51953125      0.5234375       495     0.0000018209796 0.034921423
-       A       0.5234375       0.52734375      34      0.00000012507739        0.035018723
-       A       0.52734375      0.53125 149473  0.0005498733    0.06251864
-       A       0.53125 0.53515625      28      0.00000010300491        0.09001746
-       A       0.53515625      0.5390625       544     0.0000020012383 0.09012267
-       A       0.5390625       0.54296875      31      0.00000011404114        0.09022843
-       A       0.54296875      0.546875        108932  0.00040073323   0.11027079
-       A       0.546875        0.55078125      32      0.00000011771989        0.13031334
-       A       0.55078125      0.5546875       519     0.0000019092695 0.1304147
-       A       0.5546875       0.55859375      37      0.00000013611363        0.13051696
-       A       0.55859375      0.5625  164917  0.00060668786   0.16085817
-       A       0.5625  0.56640625      24      0.00000008828992        0.19119698
-       A       0.56640625      0.5703125       564     0.0000020748132 0.19130513
-       A       0.5703125       0.57421875      28      0.00000010300491        0.19141401
-       A       0.57421875      0.578125        124975  0.00045975138   0.21440673
-       A       0.578125        0.58203125      41      0.00000015082861        0.23740186
-       A       0.58203125      0.5859375       587     0.0000021594242 0.23751736
-       A       0.5859375       0.58984375      45      0.0000001655436 0.23763362
-       A       0.58984375      0.59375 182401  0.0006710071    0.27119225
-       A       0.59375 0.59765625      44      0.00000016186485        0.30475068
-       A       0.59765625      0.6015625       625     0.0000022992167 0.30487373
-       A       0.6015625       0.60546875      41      0.00000015082861        0.30499625
-       A       0.60546875      0.609375        142829  0.0005254317    0.33127537
-       A       0.609375        0.61328125      55      0.00000020233107        0.35755706
-       A       0.61328125      0.6171875       653     0.0000024022215 0.3576873
-       A       0.6171875       0.62109375      50      0.00000018393733        0.3578166
-       A       0.62109375      0.625   199665  0.00073451694   0.39455163
-       A       0.625   0.62890625      54      0.00000019865232        0.43128744
-       A       0.62890625      0.6328125       659     0.000002424294  0.43141857
-       A       0.6328125       0.63671875      40      0.00000014714986        0.43154714
-       A       0.63671875      0.640625        162500  0.00059779634   0.46144432
-       A       0.640625        0.64453125      697     0.0000025640863 0.49146232
-       A       0.64453125      0.6484375       53      0.00000019497357        0.4916003
-       A       0.6484375       0.65234375      219464  0.0008073525    0.53197765
-       A       0.65234375      0.65625 57      0.00000020968857        0.5723558
-       A       0.65625 0.66015625      675     0.000002483154  0.5724904
-       A       0.66015625      0.6640625       58      0.0000002133673 0.5726252
-       A       0.6640625       0.66796875      183598  0.00067541056   0.60640645
-       A       0.66796875      0.671875        77      0.0000002832635 0.64019114
-       A       0.671875        0.67578125      822     0.0000030239298 0.6403565
-       A       0.67578125      0.6796875       240258  0.0008838483    0.68470013
-       A       0.6796875       0.68359375      58      0.0000002133673 0.7289032
-       A       0.68359375      0.6875  857     0.0000031526858 0.7290715
-       A       0.6875  0.69140625      75      0.000000275906  0.7292429
-       A       0.69140625      0.6953125       205152  0.00075470225   0.76699185
-       A       0.6953125       0.69921875      930     0.0000034212344 0.80489796
-       A       0.69921875      0.703125        80      0.00000029429972        0.80508375
-       A       0.703125        0.70703125      262574  0.0009659432    0.85339564
-       A       0.70703125      0.7109375       79      0.000000290621  0.90170735
-       A       0.7109375       0.71484375      999     0.000003675068  0.90190566
-       A       0.71484375      0.71875 228268  0.00083974015   0.94407636
-       A       0.71875 0.72265625      78      0.00000028694222        0.9860777
-       A       0.72265625      0.7265625       1072    0.0000039436163 0.98628926
-       A       0.7265625       0.73046875      86      0.00000031637222        0.98650223
-       A       0.73046875      0.734375        284620  0.0010470449    1.0388703
-       A       0.734375        0.73828125      1079    0.0000039693678 1.091421
-       A       0.73828125      0.7421875       90      0.0000003310872 1.0916361
-       A       0.7421875       0.74609375      253623  0.0009330148    1.1383033
-       A       0.74609375      0.75    1176    0.000004326206  1.1851704
-       A       0.75    0.75390625      87      0.00000032005096        1.1854028
-       A       0.75390625      0.7578125       309196  0.0011374537    1.2422913
-       A       0.7578125       0.76171875      1315    0.000004837552  1.2994059
-       A       0.76171875      0.765625        81      0.0000002979785 1.2996628
-       A       0.765625        0.76953125      280681  0.0010325543    1.3513054
-       A       0.76953125      0.7734375       1410    0.000005187033  1.4031924
-       A       0.7734375       0.77734375      333861  0.00122819      1.4648613
-       A       0.77734375      0.78125 89      0.00000032740846        1.5262872
-       A       0.78125 0.78515625      1518    0.0000055843375 1.5265827
-       A       0.78515625      0.7890625       308465  0.0011347646    1.5836003
-       A       0.7890625       0.79296875      141     0.0000005187033 1.6403643
-       A       0.79296875      0.796875        1739    0.0000063973403 1.6407102
-       A       0.796875        0.80078125      359781  0.0013235431    1.7072072
-       A       0.80078125      0.8046875       1699    0.0000062501904 1.7736968
-       A       0.8046875       0.80859375      336307  0.0012371882    1.8358687
-       A       0.80859375      0.8125  158     0.000000581242  1.8977573
-       A       0.8125  0.81640625      2040    0.0000075046432 1.8981615
-       A       0.81640625      0.8203125       390495  0.0014365321    1.9703634
-       A       0.8203125       0.82421875      2067    0.000007603969  2.04257
-       A       0.82421875      0.828125        368439  0.0013553938    2.1107202
-       A       0.828125        0.83203125      2193    0.000008067492  2.178893
-       A       0.83203125      0.8359375       418594  0.0015399012    2.2562916
-       A       0.8359375       0.83984375      2419    0.000008898888  2.3337317
-       A       0.83984375      0.84375 169     0.0000006217082 2.3342078
-       A       0.84375 0.84765625      399317  0.001468986     2.4076881
-       A       0.84765625      0.8515625       2446    0.000008998214  2.4815872
-       A       0.8515625       0.85546875      452082  0.0016630952    2.565192
-       A       0.85546875      0.859375        2666    0.000009807539  2.6488369
-       A       0.859375        0.86328125      434097  0.0015969329    2.7291741
-       A       0.86328125      0.8671875       487538  0.0017935288    2.8986971
-       A       0.8671875       0.87109375      2927    0.0000107676915 2.9889119
-       A       0.87109375      0.875   471184  0.0017333665    3.0761187
-       A       0.875   0.87890625      3097    0.000011393078  3.1633563
-       A       0.87890625      0.8828125       521776  0.0019194817    3.2599
-       A       0.8828125       0.88671875      3263    0.00001200375   3.3564746
-       A       0.88671875      0.890625        515147  0.0018950953    3.4518297
-       A       0.890625        0.89453125      562330  0.0020686695    3.6500177
-       A       0.89453125      0.8984375       3615    0.000013298669  3.7541163
-       A       0.8984375       0.90234375      560013  0.0020601458    3.8577886
-       A       0.90234375      0.90625 605824  0.002228673     4.0722294
-       A       0.90625 0.91015625      607514  0.0022348901    4.295408
-       A       0.91015625      0.9140625       4079    0.000015005608  4.4079022
-       A       0.9140625       0.91796875      658117  0.0024210457    4.529705
-       A       0.91796875      0.921875        657344  0.002418202     4.7716675
-       A       0.921875        0.92578125      712816  0.0026222696    5.023691
- -       A       0.92578125      0.9296875       718997  0.0026450078    5.287055
-       A       0.9296875       0.93359375      771031  0.0028364277    5.5611267
-       A       0.93359375      0.9375  782581  0.0028789171    5.8468943
-       A       0.9375  0.94140625      834767  0.0030708963    6.144384
-       A       0.94140625      0.9453125       856070  0.0031492647    6.4553924
-       A       0.9453125       0.94921875      906544  0.0033349458    6.779603
-       A       0.94921875      0.953125        1903403 0.0070021376    7.2964563
-       A       0.99609375      1       251045924       0.92353433      53.82328
a       A       0.5     0.50390625      135533  0.074519046     3.7259524
a       A       0.50390625      0.5078125       20      0.000010996443  7.4524546
a       A       0.5078125       0.51171875      484     0.0002661139    7.4663095
a       A       0.51171875      0.515625        14      0.00000769751   7.48
a       A       0.515625        0.51953125      80762   0.044404734     9.700622
a       A       0.51953125      0.5234375       19      0.000010446621  11.921381
a       A       0.5234375       0.52734375      451     0.0002479698    11.934302
a       A       0.52734375      0.53125 20      0.000010996443  11.94725
a       A       0.53125 0.53515625      121830  0.06698483      15.297042
a       A       0.53515625      0.5390625       17      0.000009346976  18.64675
a       A       0.5390625       0.54296875      423     0.00023257476   18.658846
a       A       0.54296875      0.546875        17      0.000009346976  18.670942
a       A       0.546875        0.55078125      69130   0.038009204     20.571869
a       A       0.55078125      0.5546875       21      0.000011546264  22.472908
a       A       0.5546875       0.55859375      415     0.00022817618   22.484894
a       A       0.55859375      0.5625  110478  0.06074325      25.533464
a       A       0.5625  0.56640625      14      0.00000769751   28.571012
a       A       0.56640625      0.5703125       403     0.00022157832   28.582474
a       A       0.5703125       0.57421875      16      0.000008797154  28.593996
a       A       0.57421875      0.578125        57939   0.031856146     30.187243
a       A       0.578125        0.58203125      11      0.0000060480434 31.78035
a       A       0.58203125      0.5859375       397     0.00021827938   31.791569
a       A       0.5859375       0.58984375      8       0.000004398577  31.802702
a       A       0.58984375      0.59375 100221  0.055103723     34.55811
a       A       0.59375 0.59765625      8       0.000004398577  37.313515
a       A       0.59765625      0.6015625       363     0.00019958544   37.323715
a       A       0.6015625       0.60546875      15      0.000008247332  37.334106
a       A       0.60546875      0.609375        48833   0.026849464     38.67699
a       A       0.609375        0.61328125      17      0.000009346976  40.01993
a       A       0.61328125      0.6171875       328     0.00018034165   40.029415
a       A       0.6171875       0.62109375      4       0.0000021992885 40.038544
a       A       0.62109375      0.625   91300   0.05019876      42.54859
a       A       0.625   0.62890625      8       0.000004398577  45.058746
a       A       0.62890625      0.6328125       309     0.00016989504   45.067463
a       A       0.6328125       0.63671875      6       0.0000032989328 45.076122
a       A       0.63671875      0.640625        41197   0.022651022     46.20884
a       A       0.640625        0.64453125      303     0.0001665961    47.34972
a       A       0.64453125      0.6484375       8       0.000004398577  47.35827
a       A       0.6484375       0.65234375      81970   0.04506892      49.611935
a       A       0.65234375      0.65625 4       0.0000021992885 51.865494
a       A       0.65625 0.66015625      291     0.00015999824   51.8736
a       A       0.66015625      0.6640625       9       0.0000049483992 51.88185
a       A       0.6640625       0.66796875      33434   0.018382752     52.801235
a       A       0.66796875      0.671875        6       0.0000032989328 53.720535
a       A       0.671875        0.67578125      244     0.0001341566    53.727406
a       A       0.67578125      0.6796875       74149   0.04076876      55.772556
a       A       0.6796875       0.68359375      6       0.0000032989328 57.811157
a       A       0.68359375      0.6875  261     0.00014350358   57.818497
a       A       0.6875  0.69140625      2       0.0000010996442 57.825726
a       A       0.69140625      0.6953125       27288   0.015003546     58.57596
a       A       0.69921875      0.703125        226     0.00012425981   59.33235
a       A       0.703125        0.70703125      66609   0.0366231       61.16972
a       A       0.70703125      0.7109375       9       0.0000049483992 63.00112
a       A       0.7109375       0.71484375      207     0.000113813185  63.00706
a       A       0.71484375      0.71875 2       0.0000010996442 63.012802
a       A       0.71875 0.72265625      22132   0.012168664     63.621296
a       A       0.72265625      0.7265625       183     0.00010061745   64.23476
a       A       0.7265625       0.73046875      3       0.0000016494664 64.23987
a       A       0.73046875      0.734375        60349   0.033181217     65.89902
a       A       0.734375        0.73828125      192     0.00010556585   67.563354
a       A       0.73828125      0.7421875       2       0.0000010996442 67.56868
a       A       0.7421875       0.74609375      17848   0.009813226     68.0594
a       A       0.74609375      0.75    188     0.00010336656   68.55523
a       A       0.75    0.75390625      2       0.0000010996442 68.560455
a       A       0.75390625      0.7578125       54182   0.029790463     70.05003
a       A       0.7578125       0.76171875      151     0.000083023144  71.54371
a       A       0.76171875      0.765625        2       0.0000010996442 71.54791
a       A       0.765625        0.76953125      14025   0.0077112555    71.93353
a       A       0.76953125      0.7734375       134     0.000073676165  72.32278
a       A       0.7734375       0.77734375      2       0.0000010996442 72.326515
a       A       0.77734375      0.78125 48652   0.026749946     73.66407
a       A       0.78125 0.78515625      128     0.00007037723   75.00508
a       A       0.78515625      0.7890625       11192   0.006153609     75.316284
a       A       0.79296875      0.796875        121     0.000066528475  75.62729
a       A       0.796875        0.80078125      44213   0.024309287     76.846085
a       A       0.80078125      0.8046875       106     0.000058281144  78.06446
a       A       0.8046875       0.80859375      8640    0.004750463     78.3049
a       A       0.8125  0.81640625      109     0.00005993061   78.54542
a       A       0.81640625      0.8203125       39701   0.02182849      79.63984
a       A       0.8203125       0.82421875      84      0.000046185058  80.733574
a       A       0.82421875      0.828125        6686    0.0036761109    80.919685
a       A       0.828125        0.83203125      70      0.00003848755   81.105415
a       A       0.83203125      0.8359375       35676   0.019615455     82.08811
a       A       0.83984375      0.84375 63      0.000034638793  83.07062
a       A       0.84375 0.84765625      5092    0.0027996942    83.21233
a       A       0.84765625      0.8515625       58      0.000031889685  83.35391
a       A       0.8515625       0.85546875      31631   0.017391425     84.22508
a       A       0.85546875      0.859375        63      0.000034638793  85.09638
a       A       0.859375        0.86328125      3943    0.0021679488    85.20651
a       A       0.86328125      0.8671875       44      0.000024192173  85.316124
a       A       0.8671875       0.87109375      28243   0.015528627     86.09376
a       A       0.87109375      0.875   3023    0.0016621123    86.9533
a       A       0.875   0.87890625      47      0.00002584164   87.0377
a       A       0.87890625      0.8828125       25426   0.013979778     87.737976
a       A       0.8828125       0.88671875      30      0.000016494663  88.43779
a       A       0.88671875      0.890625        2193    0.0012057599    88.4989
a       A       0.890625        0.89453125      22414   0.012323713     89.17538
a       A       0.89453125      0.8984375       21      0.000011546264  89.79214
a       A       0.8984375       0.90234375      1674    0.00092040224   89.83874
a       A       0.90234375      0.90625 20265   0.011142146     90.441864
a       A       0.90625 0.91015625      1281    0.0007043221    91.03419
a       A       0.91015625      0.9140625       8       0.000004398577  91.069626
a       A       0.9140625       0.91796875      17799   0.0097862845    91.55916
a       A       0.91796875      0.921875        932     0.0005124342    92.0741
a       A       0.921875        0.92578125      15917   0.008751519     92.53729
a       A       0.92578125      0.9296875       744     0.00040906767   92.99532
a       A       0.9296875       0.93359375      13912   0.0076491255    93.39823
a       A       0.93359375      0.9375  531     0.00029195554   93.79529
a       A       0.9375  0.94140625      12339   0.006784255     94.14909
a       A       0.94140625      0.9453125       394     0.00021662992   94.49914
a       A       0.9453125       0.94921875      11051   0.0060760844    94.813774
a       A       0.94921875      0.953125        10258   0.0056400755    95.39958
a       A       0.953125        0.95703125      245     0.00013470642   95.688324
a       A       0.95703125      0.9609375       8791    0.0048334864    95.93674
a       A       0.9609375       0.96484375      7658    0.0042105378    96.38894
a       A       0.96484375      0.96875 6805    0.0037415395    96.78654
a       A       0.96875 0.97265625      6316    0.0034726765    97.14725
a       A       0.97265625      0.9765625       5580    0.0030680075    97.47428
a       A       0.9765625       0.98046875      9133    0.0050215255    97.87876
a       A       0.98046875      0.984375        7268    0.0039961073    98.329636
a       A       0.984375        0.98828125      5844    0.0032131604    98.6901
a       A       0.98828125      0.9921875       6466    0.00355515      99.02852
a       A       0.9921875       0.99609375      6768    0.0037211962    99.392334
a       A       0.99609375      1       7668    0.0042160363    99.7892

Besides, when I use modkit sample-probs ../../filt_bam/200nM_pA-Hia5.srt.filt.l1kb.noMP.bam -f 0.1 -p 0.1,0.2,0.3,0.4,0.5 -t 40, the cutoff around to 1. I wondered if the is the dtype issue?

> in the next version of modkit this command will be `modkit modbam sample-probs`
> sampling 10% of reads
 base  percentile  threshold
 C     10          0.9296875
 C     20          0.95703125
 C     30.000002   0.96875
 C     40          0.97265625
 C     50          0.98046875
 A     10          1
 A     20          1
 A     30.000002   1
 A     40          1
 A     50          1

Thanks again!

Best, Kun

KunFang93 avatar Jun 12 '25 13:06 KunFang93

Hello @KunFang93,

Could you tell me which version of Modkit and Dorado you've basecalled/modcalled your data with? If you're using Dorado v1.0.0 you must use Modkit v0.5.0.

To your questions.

When I use modkit pileup, I noticed the estimated cut off is 0.994. I would like to ask if this proba could be calculated from probabilities.tsv and how?

You cannot easily calculate the estimated pass threshold from probabilities.tsv since these probabilities are grouped by modification state and the automatic threshold detection is performed on all modification calls combined together. If you want to find the pass threshold value that would filter out the lowest 10% of calls per modification state find the row where percentile_rank is close to 10. Based on this table the threshold value for canonical Adenine calls is somewhere between 0.94921875 and 0.99609375. This happens because the default behavior in Dorado is to clamp canonical probabilities >=95% to 100%. Using 0.94921875 is probably fine.

For 6mA calls the value would be somewhere between 0.515625 and 0.51953125 both of which are quite low. This makes me think there is very little 6mA in this sample and what you have here is the false-positive distribution.

Besides, when I use modkit sample-probs ../../filt_bam/200nM_pA-Hia5.srt.filt.l1kb.noMP.bam -f 0.1 -p 0.1,0.2,0.3,0.4,0.5 -t 40, the cutoff around to 1. I wondered if the is the dtype issue?

What's likely happening here is that all of the reads the sampling algo is getting contain entirely implicitly canonical calls where the probability is set to 100% (or 1.0). The sampling algo is slightly different when you use -f 0.1 than the default.

I think what you have here is a very rare 6mA (or m6A) modification that is likely going to be called correctly when you see it, but it will be buried in the other calls when estimating the threshold values.

One thing I'm working on right now is a new approach where pileup will drop the lowest 10% of calls per position. I think this will end up working better for your use case (unless I'm wildly off in guessing what that is).

I don't want to say "wait for a new feature before continuing your research" so I would generally recommend using an option like --filter-threshold A:0.8 or --mod-threshold a:0.85 and see if you can find positions with high percentages of a-modification.

Happy to help more.

ArtRand avatar Jun 13 '25 00:06 ArtRand

Hi @ArtRand

Thanks so much for your reply! Yes, I am using dorado v1.0.0 and modkit v0.5.0.

For 6mA calls the value would be somewhere between 0.515625 and 0.51953125 both of which are quite low. This makes me think there is very little 6mA in this sample and what you have here is the false-positive distribution.

You’re absolutely right—I mistakenly pasted the probabilities.tsv file from the Genomic DNA sample, so a cutoff around 0.51–0.52 is expected there. However, when I check the 200nM free Hia5 sample, the 10th percentile value is only around 0.55–0.56, which still seems quite low. I would have expected a much higher modification signal given the nature of this sample.

One thing I'm working on right now is a new approach where pileup will drop the lowest 10% of calls per position. I think this will end up working better for your use case (unless I'm wildly off in guessing what that is).

That sounds like a fantastic improvement! I imagine it would work especially well with good coverage to ensure statistical power. For lower coverage data, maybe aggregating over a small window (e.g., 10 bp) could also be effective? Just a thought—I’m really excited to try this approach once it’s available.

I don't want to say "wait for a new feature before continuing your research" so I would generally recommend using an option like --filter-threshold A:0.8 or --mod-threshold a:0.85 and see if you can find positions with high percentages of a-modification.

Thanks for this suggestion! I’m currently trying to balance retaining as many 6mA calls as possible in the 200nM sample while minimizing false positives in the genomic DNA control. If I use a relatively loose threshold, I worry about increasing background noise. On the other hand, stricter thresholds reduce sensitivity in the Hia5 sample. I’m still experimenting to find the optimal cutoff. Please let me know if you have any suggestion.

Thanks again!

Best, Kun

KunFang93 avatar Jun 13 '25 02:06 KunFang93

Hello @KunFang93,

However, when I check the 200nM free Hia5 sample, the 10th percentile value is only around 0.55–0.56, which still seems quite low. I would have expected a much higher modification signal given the nature of this sample.

I don't know what kind of experiment you're doing, but what I would suggest (and what I did for the chromatin accessibility assay that we've developed) is find some regions where you know the 6mA should be. In the chromatin accessibility case, we use housekeeping gene promoters. Then run modkit localize on those regions. What do you think?

ArtRand avatar Jun 13 '25 21:06 ArtRand

Sorry for the delayed reply—I somehow missed your message earlier. Thank you for sharing the link; it’s very informative and helpful! I’ll proceed with the sanity check as you suggested.

I have one additional (possibly naive) question regarding the threshold settings: if the value for --mod-thresholds is lower than --filter-threshold, does that mean the mod-thresholds essentially won’t affect which modification calls are made?

For example, if I set --filter-threshold A:0.8 and --mod-threshold a:0.6, and I encounter the following:

{p_A: 0.16, p_a: 0.84},  as 0.84 >= threshold_A: 0.8, so we make the call: 6mA (m)

Since p_a = 0.84 ≥ threshold_A: 0.8, this would qualify as a 6mA call. And because the filter-threshold is more stringent than the mod-threshold, the mod call would always be made as long as it passes the filter-threshold, correct?

If this is the case, in the probabilities.tsv, if the 10% percentage of a is consistently lower than that of A (say, a < 10% and A > 10%), then we don’t need to be overly concerned about these entries—is that a fair interpretation?

Thanks again for your support and guidance!

Best, Kun

KunFang93 avatar Jul 09 '25 13:07 KunFang93

Hello @KunFang93,

I've attached a build that allows "per-position" filtering. You can also build it from this branch.

To use it, add the --position-thresholds flag to your modkit pileup command. It will use the --filter-percentile option (0.1/10% by default) to calculate a local threshold value with the probabilities at each genomic position and remove calls with probabilities below this threshold. There is an additional nuance to (help) handle positions with low coverage: --num-pseudo-probs (default 5). The way it works is it calculates 5 probabilities at equally spaced quantiles from 0.1 to 0.9, then adds these probabilities to the threshold calculation at each genomic position. I don't know if this is the best way to do it - but it seems to do what I want which is make a "weighted average" of the probabilities at each position with the global threshold. You can disable this by setting --num-pseudo-probs 0, but then positions with 1 read will not be filtered. There are also additional columns with statistics about the probabilities at each position (min, max, mean, std, median) as well as the actual threshold value used.

modkit_dev3e0663b_u16_x86_64.tar.gz

ArtRand avatar Jul 19 '25 01:07 ArtRand

Hi @ArtRand,

Thank you so much for your support and this sounds like a fantastic method! I’ll give it a try and let you know how it works with my dataset.

Best, Kun

KunFang93 avatar Jul 21 '25 14:07 KunFang93