methylKit icon indicating copy to clipboard operation
methylKit copied to clipboard

How to merge the DMR within 100bp or other size?

Open yplee614 opened this issue 3 years ago • 5 comments

How to merge the DMR within 100bp or other size?

yplee614 avatar Nov 24 '20 08:11 yplee614

Hi @yplee614,

there is no function to merge dmcs to dmrs within a certain range, but you could look at the function described here: https://github.com/al2na/methylKit/issues/193 , which clusters dmcs into dmrs by the same unsupervised segmentation algorithm we use for methSeg().

Best, Alex

alexg9010 avatar Nov 24 '20 09:11 alexg9010

Hi Alex,

Could you add this function into the methykit package? Many thanks.

Best, yplee

yplee614 avatar Nov 24 '20 09:11 yplee614

I mean to merge two DMRs into one DMR, when they are with 100bp in genome. No merge dmcs to dmrs. Thank you.

yplee614 avatar Nov 24 '20 10:11 yplee614

Ah Okay,

there is also no function to merge dmrs within a certain range, but I guess one could do this with findOverlaps() and some checks to only merge dmrs with similar meth.diff .

@katwre do you maybe already have a function to do this?

alexg9010 avatar Nov 24 '20 10:11 alexg9010

hi @yplee614 and @alexg9010 , Sorry, I don't have a function for that. What I would do is:

  • convert your methylDiff object to GRanges your.granges.object=as(yourmethdiffobject, "GRanges")
  • use reduce function with min.gapwidth=100 to merge your DMRs that are in 100 bp proximity (my.red=reduce(your.granges.object, min.gapwidth=100))
  • if you'd like to aggregate average methylation differences (meth.diff column) or other meta columns that you are interested in from your DMRs that got merged in my.red, first use findOverlaps(your.granges.object, my.red) and then stats::aggregate function (or any other function to aggregate a column by a group from e.g. dplyr or data.table R packages)

Hope it helps, Kasia

katwre avatar Nov 24 '20 10:11 katwre

hi @yplee614 and @alexg9010 , Sorry, I don't have a function for that. What I would do is:

  • convert your methylDiff object to GRanges your.granges.object=as(yourmethdiffobject, "GRanges")
  • use reduce function with min.gapwidth=100 to merge your DMRs that are in 100 bp proximity (my.red=reduce(your.granges.object, min.gapwidth=100))
  • if you'd like to aggregate average methylation differences (meth.diff column) or other meta columns that you are interested in from your DMRs that got merged in my.red, first use findOverlaps(your.granges.object, my.red) and then stats::aggregate function (or any other function to aggregate a column by a group from e.g. dplyr or data.table R packages)

Hope it helps, Kasia

Thank you for your great contribution. I have tried the method you metioned above. However, I got the my.red without the information of differential methylation which is contained in your.granges.object. Could you please tell me how to solve it. Thank you very much!

Biomamba avatar Mar 22 '23 06:03 Biomamba

Hi @Biomamba,

Kasia's third point outlines how to get this done:

overlaps <- findOverlaps(your.granges.object, my.red)
signal <- your.granges.object[overlaps@queryHits]
my.red$mean.meth.diff <- aggregate(your.granges.object$meth.diff, list(overlaps@subjectHits), mean)

Best, Alex

alexg9010 avatar Apr 12 '23 19:04 alexg9010

If this help, I created a R function that merged overlapping DMRs and average methyl.diff values to return a tibble in bed format. You can also tweak it to get a GRanges object at the end (just add the methyl.diff value in a metacolumn).

merge_DMRs <- function(methylDiff_object){
  require("valr")
  require("tidyr")
  require("GenomicRanges")
  
  # Convert methylDiff into GRange
  gr <- as(methylDiff_object, "GRanges")
  
  # Convert GR to dataframe in bed format
  # Rename seqnames to chrom and convert to char otherwise error from valr  
  # Note Grange is 1-based and bed is 0-based so one needs to subtract 1 to start position
  df <- data.frame(chrom=as.character(seqnames(gr)),
                   start=as.integer(start(gr)-1),
                   end=as.integer(end(gr)),
                   name=c(rep(".", length(gr))),
                   score=as.double(gr$meth.diff))
  
  # Merge first DMRs and keep average methyl.diff value.
  df_merged <- df %>% valr::bed_merge(score=mean(score)) 
  
  # Now add name to each DMR
  len <- dim(df_merged)[[1]]
  name_DMRs <- paste("DMR", seq(1, len, 1), sep="")
  
  df_merged$name <- name_DMRs
  
  # Reorder columns
  df_merged_ordered <-   df_merged %>% 
    dplyr::select(chrom, start, end, name, score)
  
  return(df_merged_ordered)
}

johanzi avatar Aug 10 '23 09:08 johanzi