threshold modification in modkit pileup
Hello, Appreciation for developing such helpful program.
I am working with DRS data and using "Dorado+modkit" for m6A modification analysis. I ran Dorado for modification basecalling, then used dorado aligner for genome mapping.
Now I am using modkit pileup to convert the bam information into bed file using following command.
$modkit pileup --threads 96 a1_DRACH_.bam a1_sup_DRACH_pileup.bed --log-filepath a1_sup_DRACH_pileup.log
calculated chunk size: 144, interval size 100000, processing 14400000 positions concurrently attempting to sample 10042 reads Using filter threshold 0.9238281 for A. Done, processed 1654243 rows. Processed ~8092362 reads and skipped ~41874 reads.
The filter threshold used here automatically as ~0.92, is this threshold same as "--modified-bases-threshold 0.92" in dorado while basecalling?
Hello @baibhav-bioinfo,
No this isn't the same. If you use --modified-bases-threshold 0.92 any canonical probability >=0.92 will be elided (i.e. not explicitly recorded). If a base bas a canonical probability <0.92 it will still be recorded. Whereas the filter threshold calculated by modkit is saying that any base modification probability <0.92 will be filtered out.
If you look at the SAM specification (page 7) it has the details on the "implicitly low probability of modification" calls - it's a bit in the weeds though. In general I would recommend using the default values in dorado.