Recommendations for binary DNA m6A calls

Open mrvollger opened this issue 1 year ago • 5 comments

Hello!

Thanks for taking the time to read. I am the developer of fibertools, our long-read framework for studying chromatin accessibility using Fiber-seq (DNA treated with m6A).

Many of our analyses require a simplifying assumption of a binary m6A call (without an ML score). Do you have any precision and recall measurements as a function of the ML score that could be shared for DNA m6A?
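For readers who have labeled control data, the curve being asked about can be sketched as follows. This is a minimal illustration and not ONT's validation method: the function name `precision_recall_by_threshold` and the toy data are hypothetical, and the sketch assumes each adenine has an ML value (0 to 255) plus a known truth label, for example from methylated and unmethylated control samples.

```python
def precision_recall_by_threshold(ml_scores, is_methylated, thresholds):
    """Return (threshold, precision, recall) for calling m6A when ML >= threshold.

    ml_scores: ML values in 0..255, one per adenine (hypothetical input).
    is_methylated: matching truth labels from control data (assumed available).
    """
    rows = []
    pairs = list(zip(ml_scores, is_methylated))
    for t in thresholds:
        tp = sum(1 for s, y in pairs if s >= t and y)      # called m6A, truly m6A
        fp = sum(1 for s, y in pairs if s >= t and not y)  # called m6A, truly canonical
        fn = sum(1 for s, y in pairs if s < t and y)       # missed m6A
        precision = tp / (tp + fp) if tp + fp else float("nan")
        recall = tp / (tp + fn) if tp + fn else float("nan")
        rows.append((t, precision, recall))
    return rows


# Toy data only, not real measurements.
scores = [250, 240, 200, 130, 90, 30, 10, 5]
truth = [True, True, True, False, True, False, False, False]

for t, p, r in precision_recall_by_threshold(scores, truth, [64, 128, 192]):
    print(f"ML >= {t}: precision={p:.2f} recall={r:.2f}")
```

Raising the cutoff trades recall for precision, which is why a single binary threshold is hard to pick without a curve like this.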

We are working on building our Fiber-seq functionality for ONT and this data will be really useful for making decisions within fibertools!

I think this use of nanopore is on your radar as well (https://youtu.be/-pwXXzu17JQ?si=j045h7nZyddZVksV&t=974), and we'd like to make all the work we have done in this space compatible with nanopore.

Thanks in advance! Mitchell

mrvollger avatar Jul 12 '24 16:07 mrvollger

I'll check with the mods team and get back to you. 👍

HalfPhoton avatar Jul 15 '24 08:07 HalfPhoton

Hello @mrvollger,

We're working on an article and data release that will demonstrate how we validate the base modification models. Probably the best numbers I can offer are the accuracies reported in the most recent LC update (97.5% accuracy for the 6mA model). These numbers are calculated by calling the most likely modification state at individual A residues on single-molecule reads after removing the 10% lowest-confidence calls. If you need to determine the probability values, use modkit, specifically the sample-probs or call-mods subcommands. For the details on how the filtering is performed, I can refer you to the modkit documentation.
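As a rough illustration of "calling the most likely modification state", the sketch below converts an ML value into a probability and picks the more likely of the two states. It rests on two assumptions that are not stated in this thread: that 6mA is the only modification reported for adenine, so P(canonical) = 1 - P(6mA), and that the midpoint of each ML bin is a reasonable point estimate. The SAM tags specification maps ML value N to the probability range N/256 to (N+1)/256. The function names are hypothetical and this is not modkit's code.

```python
def ml_to_prob(ml):
    """Map an ML byte (0..255) to the midpoint of its probability bin.

    The SAM tags specification assigns value N the range [N/256, (N+1)/256);
    using the midpoint is a choice made for this sketch.
    """
    if not 0 <= ml <= 255:
        raise ValueError("ML values must be in 0..255")
    return (ml + 0.5) / 256


def most_likely_call(ml):
    """Return (state, confidence) for one adenine.

    Assumes 6mA is the only modification reported, so the canonical
    probability is simply the complement of the 6mA probability.
    """
    p_mod = ml_to_prob(ml)
    if p_mod >= 0.5:
        return "6mA", p_mod
    return "canonical", 1.0 - p_mod


for ml in (255, 128, 12):
    print(ml, most_likely_call(ml))
```

The confidence returned here is the quantity a lowest-10% filter would rank calls by, for canonical and modified calls alike.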

ArtRand avatar Jul 15 '24 22:07 ArtRand

Hi @ArtRand,

Thanks for the info. Are the filters and distributions built on just the ML tag? Or is there a more complex internal probability estimate within modkit?

Assuming it is just the ML tag, I have a similar filter in fibertools that I can apply on the fly. My question is more about what threshold I should use when applying this filter, hence wanting to see some precision/recall estimates based on these thresholds.

Would you be able to share the filter used for the ZNF locus presented at LC that I linked in my original comment? That would at least give me a hint if my threshold was similar to yours in the same application. Or was this also just dropping the lowest 10%?

Cheers, Mitchell

mrvollger avatar Jul 15 '24 22:07 mrvollger

PS I am also using rust and mdBook for fibertools!

mrvollger avatar Jul 15 '24 22:07 mrvollger

Hello @mrvollger,

> Thanks for the info. Are the filters and distributions built on just the ML tag? Or is there a more complex internal probability estimate within modkit?

The filter is very simple: find the 10th-percentile probability value and remove any call (canonical or modified) with a probability below that number. Importantly, however, if a call falls below the threshold (i.e. fails), it is not considered canonical by default (although you can force this kind of behavior).
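The filter described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not modkit's implementation: the function names, the simple index-based percentile, and the toy calls are all hypothetical. Each call is a (state, confidence) pair, where confidence is the probability of the called state.

```python
def percentile_threshold(confidences, fraction=0.10):
    """Return a cutoff such that roughly `fraction` of calls fall strictly below it.

    A simple index-based estimate, standing in for whatever estimator
    a real tool uses.
    """
    if not confidences:
        raise ValueError("need at least one call")
    ordered = sorted(confidences)
    index = min(len(ordered) - 1, int(fraction * len(ordered)))
    return ordered[index]


def filter_calls(calls, fraction=0.10):
    """Label low-confidence calls as "fail" instead of treating them as canonical.

    calls: list of (state, confidence), state in {"6mA", "canonical"}.
    Returns (threshold, relabeled_calls).
    """
    threshold = percentile_threshold([c for _, c in calls], fraction)
    labeled = [(s if c >= threshold else "fail", c) for s, c in calls]
    return threshold, labeled


# Toy calls only, not real data.
calls = [
    ("6mA", 0.99), ("6mA", 0.97), ("canonical", 0.98), ("canonical", 0.95),
    ("6mA", 0.55), ("canonical", 0.90), ("6mA", 0.80), ("canonical", 0.99),
    ("canonical", 0.70), ("6mA", 0.93),
]
threshold, labeled = filter_calls(calls)
print("threshold:", threshold)
print("failed:", [c for c in labeled if c[0] == "fail"])
```

Keeping "fail" as a third state means a downstream binary m6A track has to decide explicitly what to do with those positions, rather than silently counting them as unmethylated. The cutoff is derived from whatever calls are passed in, so it adapts to each dataset.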

> Would you be able to share the filter used for the ZNF locus presented at LC that I linked in my original comment? That would at least give me a hint if my threshold was similar to yours in the same application. Or was this also just dropping the lowest 10%?

If I recall correctly, all of the plots in that slide were generated by just dropping the lowest 10%. I'm sure you can do something similar in fibertools or use the mod_kit crate directly. The API is not camera-ready yet, though; I'm working towards a library release. In general, we advise people to calculate the thresholds per run instead of using a hard-coded threshold value.

ArtRand avatar Jul 16 '24 00:07 ArtRand