modkit icon indicating copy to clipboard operation
modkit copied to clipboard

no m6A sites found in results

Open baibhav-bioinfo opened this issue 1 year ago • 1 comments

Hello, I am working with plant DRS dataset

I ran the command (for m6A all context) $dorado basecaller [email protected] c6_r1/pod5/ --modified-bases-models
[email protected]_m6A@v1/ --device cuda:all > c6_r1_sup_m6A.bam

and then pileup using modkit i found out that the m6A sites predicted were >1.6 million per sample, which is not desirable result.

maybe we need a stringent threshold or a site to be called methylated.

instead of running the dorado basecalling again, is there any parameter in modkit which can filter the predicted results to get a more stringent set of results?

Thanks

baibhav-bioinfo avatar Jan 04 '25 17:01 baibhav-bioinfo

Hello @baibhav-bioinfo,

If I'm understanding you correctly, the 1.6M m6A sites is too many based on your understanding of the biology - correct? You could use a higher threshold for m6A by passing --mod-threshold a:0.98 (docs describing how per-mod thresholds work)or similar to modkit pileup. You may also look at modkit sample-probs --hist to help inform the value to pick.

You may try and re-basecall the reads with [email protected]_inosine_m6A@v1 and then either --ignore 17596 or --combine-mods. The v0.5.1 models have better performance in terms of basecalling accuracy and modification calling accuracy due to adding more modified strands in the basecaller training. There is a great intro here.

ArtRand avatar Jan 06 '25 17:01 ArtRand