modkit summary
Hello, If I want to achieve higher accuracy in modkit summary, is it appropriate to set the --mod-thresholds values based on the basecalling model in dorado for each modification type? Also, are there any recommended threshold values for each modification type that could be used as a guideline?
nohup modkit summary ${dorado_d}/${pod5_f}.6mA_5mC5hmC.Aligned.sorted.bam --no-sampling --only-mapped --mod-thresholds h:0.85 --mod-thresholds m:0.8 --mod-thresholds a:0.9 -t 20 > ${modkit_d}/${pod5_f}.6mA_5mC5hmC.summary
Additionally, it would be very helpful if modkit summary could support summarizing modifications on specific sequence motifs (e.g., CG, GATC, CAC) — for example, by providing a motif list or pattern filter option. This would greatly facilitate motif-specific methylation analysis.
Thank you!
Hello @hannan666666,
In general, we recommend using the estimated threshold values as a way to get the best accuracy. If you want to tune the behavior a little bit you can use modkit sample-probs --hist docs and choose the value that discards the 10% lowest confidence calls for each modification state.
If you increase the threshold values, you're going to increase the true positive rate at the expense of the false negative rate. Depending on what you're trying to do, this might not be what you want.
If you're trying to get the most accurate counts, I would use pileup instead of summary. With the output of pileup you can count the number of reads that called a modification at a position and had the correct base aligned at that position. (I'm going to add a flag that will omit records with mismatched base modifications, e.g. the number of 6mA calls at C>A mismatches). But right now you can filter these out with --motif A 0 then require that column 4 is "a" or by valid coverage. Leading to the next question..
If you produce a bedMethyl table with pileup you can use:
$ modkit pileup ${reads.bam} - <args> | bgzip -c > ${pileup_bed_gz}
$ tabix ${pileup_bed_gz}
$ modkit motif evaluate -i ${pileup_bed_gz} \
--known-motif CG 0 m \
--known-motif GATC 1 a \
--known-motif CAC 1 a
There are more details in the motif section of the docs.
Let me know if this helps or you have additional questions.
Hello, @ArtRand
This is absolutely great — thank you for the detailed and helpful reply! Your suggestions are very valuable and exactly what I needed.
Much appreciated!
Best regards!