dmrseq
What are the suitable parameters for RRBS data?
Hi,
In your paper you mentioned that "dmrseq is generally applicable to WGBS data". My impression is that methods that work for WGBS data also work for RRBS data. Have you tried dmrseq on RRBS data? If so, what are the recommended parameters for the `dmrseq()` function?
Hi @bishwaG,
Another great question! The dmrseq procedure is certainly also applicable to data from RRBS experiments. However, the default parameters for region construction and smoothing (e.g. `bpSpan`, `minInSpan`, `maxGapSmooth`, and `maxGap`) were set with WGBS data in mind. I haven't yet explored RRBS analysis extensively enough to have an official set of recommendations, so you might have to try a couple of different parameter sets on a subset of your data (e.g. a single chromosome) to see how they affect the types of regions identified.
For example, you might want to try decreasing `minInSpan` and increasing `maxGapSmooth` to allow the smoothing procedure to span gaps in coverage where RRBS didn't measure any CpGs. For the same reason, you might also increase `maxGap` so that two neighboring CpGs can be in the same DMR even if there are more than 1000 bp between them (since there could well be CpGs in that span that just weren't targeted by RRBS).
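To make the role of `maxGap` concrete, here is a minimal Python sketch of gap-based clustering: sorted CpG positions on one chromosome are split wherever two consecutive CpGs are more than `max_gap` bp apart. This is a simplified illustration assuming that behavior, not dmrseq's actual (R) implementation; the function name `cluster_by_gap` and the CpG positions are hypothetical.

```python
def cluster_by_gap(positions, max_gap):
    """Split CpG positions (bp) into clusters wherever the distance
    between consecutive CpGs exceeds max_gap."""
    clusters = []
    current = []
    for pos in sorted(positions):
        # Start a new cluster when the gap to the previous CpG is too large
        if current and pos - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)
    return clusters


# Hypothetical RRBS-like data: two dense fragments about 1,500 bp apart
cpgs = [100, 120, 150, 180, 1700, 1730, 1760]

print(len(cluster_by_gap(cpgs, max_gap=1000)))  # 2 clusters
print(len(cluster_by_gap(cpgs, max_gap=5000)))  # 1 cluster
```

With a 1000 bp gap limit, the two fragments cannot end up in the same region; raising the limit lets them be considered together, which is the motivation for increasing `maxGap` on sparse RRBS data.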
Adjusting these parameters will likely affect the sizes of DMRs identified, but either way the procedure will still provide valid inference. Hope that helps! And feel free to report back if you happen to find a particular setting works well.
Best, Keegan
Hi @kdkorthauer
Thank you very much for your reply. I tried `minInSpan = 10`, `bpSpan` values between 5,000 and 10,000, `maxGapSmooth` between 10,000 and 100,000, and `maxGap = 5000`. I was unable to get any significant (`qval <= 0.05`) DMRs. I tried a couple of chromosomes, without success. I do not expect my groups to be so homogeneous that there are no DMRs at all. How does dmrseq perform if I disable smoothing? If I remember correctly, the DSS package for differential methylation analysis of bisulfite sequencing data recommends not smoothing RRBS data (because it is very sparse compared to WGBS).
Hi @bishwaG,
Thanks for reporting back! It seems like you're not seeing much difference in performance when you increase smoothing. A few things come to mind:
- What are you using for the `cutoff` parameter? It is 0.10 by default, but you may want to try lowering it to something like 0.05 if you're not seeing a strong signal.
- You could certainly try with no smoothing (`smooth = FALSE`). This might generate rather short regions, especially if your coverage is on the low end and the signal is a bit noisy, because longer regions will get broken up by short stretches of CpGs that don't exhibit signal (stretches that the smoothing procedure would otherwise smooth over).
- What type of covariate are you testing, and how many replicates do you have? If you are using a dichotomous covariate (2 groups), are there any additional covariates that you may want to match on? For example, a different covariate that would split the samples into two groups, where each group has some samples from each of the groups of the covariate of interest (see the documentation for the `matchCovariate` parameter).
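The fragmentation effect described in the second point can be illustrated with a toy Python sketch. It assumes, for illustration only, that candidate regions are runs of consecutive CpGs whose absolute methylation difference exceeds `cutoff`, and it uses a simple centered running mean as a stand-in for dmrseq's smoother (the real smoothing, controlled by `bpSpan`, `minInSpan`, and `maxGapSmooth`, is more involved). The helper names and difference values are hypothetical.

```python
def runs_above(values, cutoff):
    """Return (start, end) index pairs (end inclusive) of consecutive
    values whose absolute value exceeds cutoff."""
    runs = []
    start = None
    for i, v in enumerate(values):
        if abs(v) > cutoff:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(values) - 1))
    return runs


def running_mean(values, half_window=1):
    """Centered running mean; windows are truncated at the edges."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out


# Hypothetical per-CpG methylation differences between two groups;
# the middle CpG shows almost no difference.
diffs = [0.30, 0.25, 0.28, 0.02, 0.27, 0.31, 0.29]

print(runs_above(diffs, cutoff=0.10))                # [(0, 2), (4, 6)]
print(runs_above(running_mean(diffs), cutoff=0.10))  # [(0, 6)]
```

Without smoothing, the single low CpG splits the signal into two short candidate regions; after smoothing, the same CpGs form one longer region.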
Best, Keegan
Hi @kdkorthauer
Thank you for the additional insights. I have been using `cutoff = 0.01`. I have the following experimental design, and I would like to find methylation differences between groups A and B while adjusting for the effect of `handlingTime`. I have been using `adjustCovariate = "handlingTime"` to adjust for it.
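| Sample | Group | handlingTime |
|--------|-------|--------------|
| S1 | A | 100 |
| S2 | A | 152 |
| S3 | A | 452 |
| S4 | A | 1258 |
| S5 | B | 214 |
| S6 | B | 352 |
| S7 | B | 574 |
| S8 | B | 214 |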
Regards, BishwaG
Hi, I am wondering whether you have any recommendations for RRBS now, since the last discussion? Thanks!
Hi @cauls19900319,
Thanks for your question. In general, I have not found a specific set of smoothing parameters that performs best in all cases. I still recommend testing a few different sets of parameters on a small subset of your data to see how they compare (e.g. no smoothing vs. default smoothing). When comparing, you can plot the signal in the top-ranked regions and see whether certain settings tend to find more 'convincing' DMRs by eye, or whether a longer DMR appears to be broken up into smaller DMRs, which would suggest increasing smoothing.
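One way to make the "broken up" check more systematic is to count, for each region found with one setting, how many regions from another setting overlap it. The Python sketch below assumes regions have been exported from each run as hypothetical `(chromosome, start, end)` tuples; the helper names are illustrative and not part of dmrseq.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) regions overlap (inclusive coords)."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def fragmentation(reference, other):
    """For each region in `reference`, count the overlapping regions in
    `other`. Counts above 1 suggest `other` splits that region apart."""
    return {ref: sum(overlaps(ref, o) for o in other) for ref in reference}


# Hypothetical top regions from two parameter settings on one chromosome
smoothed = [("chr1", 1000, 3000), ("chr1", 8000, 8600)]
unsmoothed = [("chr1", 1000, 1400), ("chr1", 1900, 2300),
              ("chr1", 2700, 3000), ("chr1", 8050, 8550)]

for region, n in fragmentation(smoothed, unsmoothed).items():
    print(region, n)
# ('chr1', 1000, 3000) 3
# ('chr1', 8000, 8600) 1
```

If many smoothed regions map to several unsmoothed pieces, and the pieces look like one coherent DMR when plotted, that points toward keeping (or increasing) smoothing.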
In either case, the results still provide valid inference. The tuning will simply help to provide more accurate region boundaries.
Best, Keegan