modkit icon indicating copy to clipboard operation
modkit copied to clipboard

Differential methylation call using replicate samples

Open demis001 opened this issue 9 months ago • 7 comments

I want to call differential methylation regions (DMRs) between wild and captive groups. This is my first time using ONT sequencing data. I want to call the DMR without passing a pre-defined genomic region like CpG Island. Do you know any other tool that provides downstream analysis after Dorado modBam?

I tried this using modkit, it requires a pre-defind genomic region:

modkit dmr multi --min-valid-coverage 2 --ref ${REF} -s "$PILEUP_DIR/S192_merged_methylation.bed.gz" wild -s "$PILEUP_DIR/S442_merged_methylation.bed.gz" wild -s "$PILEUP_DIR/N150_mod_methylation.bed.gz" wild -s "$PILEUP_DIR/GM_merged_methylation.bed.gz" captive -s "$PILEUP_DIR/Mimi_merged_methylation.bed.gz" captive -s "$PILEUP_DIR/NZ_merged_methylation.bed.gz" captive --base C -o ${DMR_DIR} --prefix dmr_results -t ${THREADS} --log-filepath ${DMR_DIR}/dmr_call.log --force

The DMR has to be called using a sliding window, 3 min CpG, a window size of 300bp, etc. This is how we do it using the Illumina methylation route.

@demis001

demis001 avatar Mar 05 '25 17:03 demis001

Hello @demis001,

Sorry for the very delayed response, I've been traveling the last week or so.

If you want to use the windowing strategy you've described, you'll need to generate the windows yourself and pass them to the --regions-bed option. Alternatively, you can use the --segment <segments_fp> option to perform de novo segmentation, here is the link to documentation for how this works.

ArtRand avatar Mar 13 '25 16:03 ArtRand

@ArtRand

I am confused by this --segment <segments_fp>. What passes as segments_fp. That had to be dynamically handled based on sliding windows.

@demis001

demis001 avatar Mar 13 '25 16:03 demis001

Hello @demis001,

The argument to --segment is a path to the file (to be created) where the segmentation results are written. The rest of the output and options remain the same.

That had to be dynamically handled based on sliding windows.

Sorry I'm not quite sure what you mean. If I wasn't clear before, using (user-provided) sliding windows is an alternative to using --segment.

ArtRand avatar Mar 13 '25 22:03 ArtRand

@ArtRand

Sorry, I have an additional question:

modkit dmr pair -a ${BASECALL_DIR}/wild_final_merged_sorted.bed.gz \
    -b  ${BASECALL_DIR}/captive_final_merged_sorted.bed.gz \
    -o ${BASECALL_DIR}/DMR_wild_captive.bed --ref ${REF} --base C \
    --threads 16 --log-filepath ${BASECALL_DIR}/dmr_call.log

This is what I got; it doesn't make sense. I am used to a Differently Methylated Region(DMR )and a Differently Methylated C (DMC).

The output from the above command is DMC (each below line is an individual CpG position), not DMR:

NC_003426.1	79	80	.	1.824526444397634	+	m:0	307	m:4	428	m:0.00	m:0.93	0	0.009345794	1	0
NC_003426.1	90	91	.	1.2729640271502376	+	m:0	278	m:3	390	m:0.00	m:0.77	0	0.0076923077	1	0
NC_003426.1	93	94	.	1.2766724334062474	+	m:0	314	m:3	439	m:0.00	m:0.68	0	0.006833713	1	0
NC_003426.1	133	134	.	0.21558986482659748	+	m:1	301	m:4	418	m:0.33	m:0.96	0.003322259	0.009569378	1	0

What should I do to get a DMR? I'm sorry, but I'm not used to ONP data; this is my first time.

A DMR should be at least 3 CpG within some window.

Best @demis001

demis001 avatar Mar 19 '25 17:03 demis001

I passed --segment my_segment.bed to the above command. If I am not mistaken, this file documented the DMR, and the -o out.bed is the DMC. Please confirm.

demis001 avatar Mar 19 '25 17:03 demis001

Got something like this in the segment.bed file:

C_003426.1	79	16699	same	211.87825188506395	427	m:379	m:1597	m:0.23	m:0.67	0.0022896705	0.0066526146	-0.004362944
NW_025576300.1	24	39693	same	4.943667716419441	923	m:5323	m:18141	m:88.94	m:90.39	0.8893902	0.9038864	-0.014496207
NW_025576298.1	197	14950	same	181.51079077087343	637	m:61875	m:138519	m:92.15	m:94.36	0.9215406	0.94356424	-0.022023618

Please explain what is coded in each column or direct me to the documentation. I expected "Hyper or Hypo" in column 4

demis001 avatar Mar 19 '25 18:03 demis001

Hello @demis001

I passed --segment my_segment.bed to the above command. If I am not mistaken, this file documented the DMR, and the -o out.bed is the DMC. Please confirm.

That's correct.

Please explain what is coded in each column or direct me to the documentation. I expected "Hyper or Hypo" in column 4

The schema can be found in the documentation. same means that the two conditions have are predicted to not be differentially methylated and different is as you would expect - the opposite. You could interpret these as Hypo/Hyper based on the effect size (column 13) and which sample is the "control".

ArtRand avatar Mar 26 '25 00:03 ArtRand