Differential methylation call using replicate samples
I want to call differential methylation regions (DMRs) between wild and captive groups. This is my first time using ONT sequencing data. I want to call the DMR without passing a pre-defined genomic region like CpG Island. Do you know any other tool that provides downstream analysis after Dorado modBam?
I tried this using modkit, it requires a pre-defind genomic region:
modkit dmr multi --min-valid-coverage 2 --ref ${REF} -s "$PILEUP_DIR/S192_merged_methylation.bed.gz" wild -s "$PILEUP_DIR/S442_merged_methylation.bed.gz" wild -s "$PILEUP_DIR/N150_mod_methylation.bed.gz" wild -s "$PILEUP_DIR/GM_merged_methylation.bed.gz" captive -s "$PILEUP_DIR/Mimi_merged_methylation.bed.gz" captive -s "$PILEUP_DIR/NZ_merged_methylation.bed.gz" captive --base C -o ${DMR_DIR} --prefix dmr_results -t ${THREADS} --log-filepath ${DMR_DIR}/dmr_call.log --force
The DMR has to be called using a sliding window, 3 min CpG, a window size of 300bp, etc. This is how we do it using the Illumina methylation route.
@demis001
Hello @demis001,
Sorry for the very delayed response, I've been traveling the last week or so.
If you want to use the windowing strategy you've described, you'll need to generate the windows yourself and pass them to the --regions-bed option. Alternatively, you can use the --segment <segments_fp> option to perform de novo segmentation, here is the link to documentation for how this works.
@ArtRand
I am confused by this --segment <segments_fp>. What passes as segments_fp. That had to be dynamically handled based on sliding windows.
@demis001
Hello @demis001,
The argument to --segment is a path to the file (to be created) where the segmentation results are written. The rest of the output and options remain the same.
That had to be dynamically handled based on sliding windows.
Sorry I'm not quite sure what you mean. If I wasn't clear before, using (user-provided) sliding windows is an alternative to using --segment.
@ArtRand
Sorry, I have an additional question:
modkit dmr pair -a ${BASECALL_DIR}/wild_final_merged_sorted.bed.gz \
-b ${BASECALL_DIR}/captive_final_merged_sorted.bed.gz \
-o ${BASECALL_DIR}/DMR_wild_captive.bed --ref ${REF} --base C \
--threads 16 --log-filepath ${BASECALL_DIR}/dmr_call.log
This is what I got; it doesn't make sense. I am used to a Differently Methylated Region(DMR )and a Differently Methylated C (DMC).
The output from the above command is DMC (each below line is an individual CpG position), not DMR:
NC_003426.1 79 80 . 1.824526444397634 + m:0 307 m:4 428 m:0.00 m:0.93 0 0.009345794 1 0
NC_003426.1 90 91 . 1.2729640271502376 + m:0 278 m:3 390 m:0.00 m:0.77 0 0.0076923077 1 0
NC_003426.1 93 94 . 1.2766724334062474 + m:0 314 m:3 439 m:0.00 m:0.68 0 0.006833713 1 0
NC_003426.1 133 134 . 0.21558986482659748 + m:1 301 m:4 418 m:0.33 m:0.96 0.003322259 0.009569378 1 0
What should I do to get a DMR? I'm sorry, but I'm not used to ONP data; this is my first time.
A DMR should be at least 3 CpG within some window.
Best @demis001
I passed --segment my_segment.bed to the above command. If I am not mistaken, this file documented the DMR, and the -o out.bed is the DMC. Please confirm.
Got something like this in the segment.bed file:
C_003426.1 79 16699 same 211.87825188506395 427 m:379 m:1597 m:0.23 m:0.67 0.0022896705 0.0066526146 -0.004362944
NW_025576300.1 24 39693 same 4.943667716419441 923 m:5323 m:18141 m:88.94 m:90.39 0.8893902 0.9038864 -0.014496207
NW_025576298.1 197 14950 same 181.51079077087343 637 m:61875 m:138519 m:92.15 m:94.36 0.9215406 0.94356424 -0.022023618
Please explain what is coded in each column or direct me to the documentation. I expected "Hyper or Hypo" in column 4
Hello @demis001
I passed --segment my_segment.bed to the above command. If I am not mistaken, this file documented the DMR, and the -o out.bed is the DMC. Please confirm.
That's correct.
Please explain what is coded in each column or direct me to the documentation. I expected "Hyper or Hypo" in column 4
The schema can be found in the documentation. same means that the two conditions have are predicted to not be differentially methylated and different is as you would expect - the opposite. You could interpret these as Hypo/Hyper based on the effect size (column 13) and which sample is the "control".