modkit icon indicating copy to clipboard operation
modkit copied to clipboard

No results with the new RNA model

Open rania-o opened this issue 6 months ago • 10 comments

Hello,

I recently tried the new model released by dorado v1.0.0 for direct RNA, everthing works good except that I don't have any results with modkit dmr pair, which surprised me since I already had results with the same samples basecalled with dorado v0.0.9. Is modkit compatible with the new models of dorado ?

Thanks, Rania

rania-o avatar Jun 16 '25 08:06 rania-o

Hello @rania-o,

Could you tell me which version of Modkit you're using with Dorado v1.0.0?

ArtRand avatar Jun 16 '25 19:06 ArtRand

Hello @ArtRand

I'm using modkit/0.4.0, should I use another version ?

rania-o avatar Jun 17 '25 07:06 rania-o

Hello @rania-o,

Yes, the modified base models in Dorado v1.0 require you to use either --modified-bases-threshold 0.0 whilst basecalling/modification calling or Modkit v0.5.0. The reason is that the false-positive rate in the latest models is quite a bit lower and the filter threshold may be estimated to be 1.0 (you could confirm this in the pileup logs). If the threshold is estimated to be 1.0, all of the modification calls will be filtered out. Modkit 0.5.0 has a shim to keep this from happening.

ArtRand avatar Jun 17 '25 14:06 ArtRand

Hi @ArtRand

Thanks for your answer. You're right, the threshold for G was 1, but for the other bases/modifications the threshold was 0.7 or 0.8 as for the precedent analysis. Actually I just installed modkit V0.5.0 and will rerun the analysis. For modkit v0.5.0 no need to specify the modified bases threshold to 0, right ?

rania-o avatar Jun 18 '25 09:06 rania-o

Actually, I just tried the new version of modkit, with the --mod-threshlod mod:0.0, and I got this warning while running modkit pileup: Threshold of 0 for mod code a is very low. Consider increasing the filter-percentile or specifying a higher threshold.

Then I run modkit dmr pair with the three replicates and I got empty files, with these errors/ warning:

> batch failed: invalid data, valid coverage (1826) is not equal to the sum of canonical and modified counts (1825), [BedMethylLine { chrom: "Tharec", interval: Interval { start: 3603, stop: 3604, val: () }, raw_mod_code: Code('m'), strand: Positive, count_methylated: 505, valid_coverage: 1826, count_canonical: 1320, count_other: 1, count_delete: 119, count_fail: 0, count_diff: 33, count_nocall: 0 }] chrom: Tharec starting at 3603, stopping
> Error! invalid-bedmethyl-data

> 1 common sequence(s) between FASTA and both samples
> running single-site analysis
> using default prior, Beta(α: 0.55, β: 0.55)
> estimating max coverages from data
> sampled 0 a records and 0 b records, calculating max coverages for 95th percentile
> Error! not enough datapoints, got 0

rania-o avatar Jun 19 '25 14:06 rania-o

I have the same issue with dmr pair as the last one above. I'm using the model inosine_m6A_2OmeA.

Error:

batch failed: invalid data, valid coverage (63) is not equal to the sum of canonical and modified counts (62), [BedMethylLine { chrom: "ENST00000005286", interval: Interval { start: 1339, stop: 1340, val: () }, raw_mod_code: Code('a'), strand: Positive, count_methylated: 0, valid_coverage: 63, count_canonical: 62, count_other: 1, count_delete: 0, count_fail: 6, count_diff: 0, count_nocall: 0 }, BedMethylLine { chrom: "ENST00000005286", interval: Interval { start: 1339, stop: 1340, val: () }, raw_mod_code: ChEbi(17596), strand: Positive, count_methylated: 0, valid_coverage: 63, count_canonical: 62, count_other: 1, count_delete: 0, count_fail: 6, count_diff: 0, count_nocall: 0 }] chrom ENST00000005286 starting at 1339, ending at 1340, stopping

The three BedMethyl lines that trigger the problem are:

ENST00000005286 1339    1340    a       63      +       1339    1340    255,0,0 63      0.00    0       62      1       0       6       0       0
ENST00000005286 1339    1340    17596   63      +       1339    1340    255,0,0 63      0.00    0       62      1       0       6       0       0
ENST00000005286 1339    1340    69426   63      +       1339    1340    255,0,0 63      1.59    1       62      0       0       6       0       0

mnsmar avatar Jun 20 '25 22:06 mnsmar

@mnsmar @rania-o thanks for the info let me take a look.

ArtRand avatar Jun 24 '25 22:06 ArtRand

Sorry for the delay @mnsmar and @rania-o, I've found the bug/problem.

You have two options. You can use the released Modkit with the following additional --assign-code arguments as in this example:

${modkit} dmr pair \
  -a ${a_pileup} \
  -b ${b_pileup} \
  --base A \
  --force \
  --assign-code "69426:A" \
  --assign-code "19228:C" \
  --assign-code "19227:T" \
  --assign-code "19229:G" \
  --log ${outdir}/release.log \
  --header \
  -o ${outdir}/release_dmr.bed \
  --ref ${ref}

Or I've attached a build (branch here) that has a fix.

For example:

${modkit} dmr pair \
  -a ${a_pileup} \
  -b ${b_pileup} \
  --base A \
  --force \
  --log ${outdir}/dev.log \
  -o ${outdir}/dev_dmr.bed \
  --header \
  --ref ${ref}

modkit_dev3c17b9c_u16_x86_64.tar.gz

ArtRand avatar Jun 25 '25 01:06 ArtRand

Thanks @ArtRand. I used the --assign-code option above and it has fixed the problem.

mnsmar avatar Jun 26 '25 21:06 mnsmar

Hello @ArtRand,

Thanks for your reply. I also used the option --assign-code and it worked.

rania-o avatar Jun 30 '25 11:06 rania-o