No results with the new RNA model
Hello,
I recently tried the new model released by dorado v1.0.0 for direct RNA, everthing works good except that I don't have any results with modkit dmr pair, which surprised me since I already had results with the same samples basecalled with dorado v0.0.9. Is modkit compatible with the new models of dorado ?
Thanks, Rania
Hello @rania-o,
Could you tell me which version of Modkit you're using with Dorado v1.0.0?
Hello @ArtRand
I'm using modkit/0.4.0, should I use another version ?
Hello @rania-o,
Yes, the modified base models in Dorado v1.0 require you to use either --modified-bases-threshold 0.0 whilst basecalling/modification calling or Modkit v0.5.0. The reason is that the false-positive rate in the latest models is quite a bit lower and the filter threshold may be estimated to be 1.0 (you could confirm this in the pileup logs). If the threshold is estimated to be 1.0, all of the modification calls will be filtered out. Modkit 0.5.0 has a shim to keep this from happening.
Hi @ArtRand
Thanks for your answer. You're right, the threshold for G was 1, but for the other bases/modifications the threshold was 0.7 or 0.8 as for the precedent analysis. Actually I just installed modkit V0.5.0 and will rerun the analysis. For modkit v0.5.0 no need to specify the modified bases threshold to 0, right ?
Actually, I just tried the new version of modkit, with the --mod-threshlod mod:0.0, and I got this warning while running modkit pileup:
Threshold of 0 for mod code a is very low. Consider increasing the filter-percentile or specifying a higher threshold.
Then I run modkit dmr pair with the three replicates and I got empty files, with these errors/ warning:
> batch failed: invalid data, valid coverage (1826) is not equal to the sum of canonical and modified counts (1825), [BedMethylLine { chrom: "Tharec", interval: Interval { start: 3603, stop: 3604, val: () }, raw_mod_code: Code('m'), strand: Positive, count_methylated: 505, valid_coverage: 1826, count_canonical: 1320, count_other: 1, count_delete: 119, count_fail: 0, count_diff: 33, count_nocall: 0 }] chrom: Tharec starting at 3603, stopping
> Error! invalid-bedmethyl-data
> 1 common sequence(s) between FASTA and both samples
> running single-site analysis
> using default prior, Beta(α: 0.55, β: 0.55)
> estimating max coverages from data
> sampled 0 a records and 0 b records, calculating max coverages for 95th percentile
> Error! not enough datapoints, got 0
I have the same issue with dmr pair as the last one above. I'm using the model inosine_m6A_2OmeA.
Error:
batch failed: invalid data, valid coverage (63) is not equal to the sum of canonical and modified counts (62), [BedMethylLine { chrom: "ENST00000005286", interval: Interval { start: 1339, stop: 1340, val: () }, raw_mod_code: Code('a'), strand: Positive, count_methylated: 0, valid_coverage: 63, count_canonical: 62, count_other: 1, count_delete: 0, count_fail: 6, count_diff: 0, count_nocall: 0 }, BedMethylLine { chrom: "ENST00000005286", interval: Interval { start: 1339, stop: 1340, val: () }, raw_mod_code: ChEbi(17596), strand: Positive, count_methylated: 0, valid_coverage: 63, count_canonical: 62, count_other: 1, count_delete: 0, count_fail: 6, count_diff: 0, count_nocall: 0 }] chrom ENST00000005286 starting at 1339, ending at 1340, stopping
The three BedMethyl lines that trigger the problem are:
ENST00000005286 1339 1340 a 63 + 1339 1340 255,0,0 63 0.00 0 62 1 0 6 0 0
ENST00000005286 1339 1340 17596 63 + 1339 1340 255,0,0 63 0.00 0 62 1 0 6 0 0
ENST00000005286 1339 1340 69426 63 + 1339 1340 255,0,0 63 1.59 1 62 0 0 6 0 0
@mnsmar @rania-o thanks for the info let me take a look.
Sorry for the delay @mnsmar and @rania-o, I've found the bug/problem.
You have two options. You can use the released Modkit with the following additional --assign-code arguments as in this example:
${modkit} dmr pair \
-a ${a_pileup} \
-b ${b_pileup} \
--base A \
--force \
--assign-code "69426:A" \
--assign-code "19228:C" \
--assign-code "19227:T" \
--assign-code "19229:G" \
--log ${outdir}/release.log \
--header \
-o ${outdir}/release_dmr.bed \
--ref ${ref}
Or I've attached a build (branch here) that has a fix.
For example:
${modkit} dmr pair \
-a ${a_pileup} \
-b ${b_pileup} \
--base A \
--force \
--log ${outdir}/dev.log \
-o ${outdir}/dev_dmr.bed \
--header \
--ref ${ref}
Thanks @ArtRand. I used the --assign-code option above and it has fixed the problem.
Hello @ArtRand,
Thanks for your reply. I also used the option --assign-code and it worked.