Assessing the precision of the 4mC ratio
Hello, I am working on quantifying the ratio of 4mC in mouse samples, but I have run into a challenge. According to published papers, 4mC is very rare in mammals. Could you provide some guidance on how I can assess the precision of the 4mC ratio reported by modkit? Also, do you have any strategies to improve its precision, such as setting a higher threshold for the analysis? Thank you very much!
bases             C
total_reads_used  10042
count_reads_C     10042
pass_threshold_C  0.640625

base  code   pass_count  pass_frac  all_count  all_frac
C     -      33096024    0.9303225  35700393   0.905164
C     m      1632598     0.045892   2118287    0.053708013
C     21839  846164      0.0237855  1622119    0.041127943
Hello @hannan666666,
We recommend testing base modification models on synthetic strands. We've recently published a blog post describing how we derive the model performance metrics. Unfortunately, the 4mC validation data hasn't been released publicly yet.
I ran a test on the validation data I have, using the latest models ([email protected]_4mC_5mC@v3) and attached the pass confusion matrix from modkit validate.
> Call probability threshold: 0.6836
> Percent of modified base calls removed: 9.98%
> Filtered accuracy: 96.85%
> Filtered modified base calls contingency table
                  Called Base
         ┌───────┬────────┬────────┬────────┐
         │       │ C      │ 21839  │ m      │
         ├───────┼────────┼────────┼────────┤
 Ground  │ C     │ 97.83% │ 1.75%  │ 0.42%  │
 Truth   │ 21839 │ 1.10%  │ 98.78% │ 0.12%  │
         │ m     │ 0.45%  │ 0.02%  │ 99.52% │
         └───────┴────────┴────────┴────────┘
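One way to relate this table to a real sample is to treat it as a per-call error model and ask which true fractions would produce the observed calls. The sketch below is illustrative only. It assumes the error rates measured on the validation data carry over unchanged to the sample, which may not hold for a different organism, sequence context, or pass threshold. The observed fractions are the pass_frac values from the summary posted above; `solve_linear` is a helper written for this example and is not part of modkit.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


labels = ["C", "21839", "m"]

# Rows are ground truth, columns are called base (values from the table above).
confusion = [
    [0.9783, 0.0175, 0.0042],  # truth C
    [0.0110, 0.9878, 0.0012],  # truth 21839 (4mC)
    [0.0045, 0.0002, 0.9952],  # truth m (5mC)
]

# pass_frac values from the summary output, in the same order as `labels`.
observed = [0.9303225, 0.0237855, 0.045892]

# observed[j] = sum_i true[i] * confusion[i][j], so solve with the transpose.
transposed = [[confusion[i][j] for i in range(3)] for j in range(3)]
estimated = solve_linear(transposed, observed)

for label, obs, est in zip(labels, observed, estimated):
    print(f"{label:>5}  observed={obs:.4f}  estimated_true={est:.4f}")
```

Under these assumptions most of the observed 4mC pass fraction is explained by the 1.75% of canonical C calls that land in the 4mC column, and the corrected 4mC estimate comes out on the order of 0.7%. This is a rough sanity check, not a calibrated measurement.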
The threshold value I'm getting isn't much higher than yours. There will always be a trade-off: raising the --filter-threshold removes more low-confidence calls but reduces the sensitivity of the model. What I would do is look at the output from modkit sample-probs and pick a threshold value for 4mC that corresponds to roughly the 15th to 20th percentile.
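The percentile suggestion can be illustrated with a small sketch. It assumes the percentile is taken over the 4mC call probabilities, and `probs_4mc` is hypothetical stand-in data rather than real modkit output; in practice the values would come from the sampled probabilities for the 21839 code.

```python
import math


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lo = math.floor(rank)
    hi = math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def fraction_retained(values, threshold):
    """Fraction of calls whose probability is at or above the threshold."""
    return sum(v >= threshold for v in values) / len(values)


# Hypothetical 4mC call probabilities, evenly spaced from 0.5 to 1.0.
probs_4mc = [0.5 + 0.025 * i for i in range(21)]

for q in (15, 20):
    threshold = percentile(probs_4mc, q)
    kept = fraction_retained(probs_4mc, threshold)
    print(f"{q}th percentile threshold={threshold:.3f}  calls kept={kept:.1%}")
```

A higher percentile gives a stricter threshold and discards more 4mC calls, which is the sensitivity trade-off described above.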
Thank you very much for your kind and informative reply! If possible, could you share the species and the 4mC fraction of your validation sample? My sample is from a mouse, and the 4mC fraction I observed is 0.041127943. Based on your experience, do you think this value is unusually high for mammals? I would greatly appreciate any insights you could provide.
Thank you again for your time and support!