methylKit

How to use methylation data to make a boxplot?

Open Xiangyi-Deng opened this issue 1 year ago • 4 comments

Hello Alex,

I have a small question about RRBS methylation data.

It follows up on a question I posted before.

This time, my question is how to use RRBS methylation data to make a boxplot based on per-CpG-site values.

> You are analysing RRBS data comparing two groups with two replicates. You aggregated counts over *tiling windows*, which *sums up the methylated/unmethylated base counts over the tiling windows* across the genome (see code https://github.com/al2na/methylKit/blob/388770da40bde7d563224cc5d2259c2a7a6a2e08/R/regionalize.R#L121-L181), so the methylation level per region would be the total CpG sites percentage (sum(freqC)/sum(Coverage)).

But to make a boxplot, I would have to calculate the mean methylation level of a single DMR, which may contain many CpGs.

For example, below is a DMR that contains 6 CpG sites. Note that it lists the methylation percentage per site, not the methylated read counts.

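| chrBase | chr | base | strand | coverage | freqC | freqT |
|---|---|---|---|---|---|---|
| chr1.836516 | chr1 | 836516 | F | 4 | 100.00 | 0 |
| chr1.836543 | chr1 | 836543 | F | 4 | 0.00 | 100 |
| chr1.836941 | chr1 | 836941 | F | 19 | 68.42 | 31.58 |
| chr1.836942 | chr1 | 836942 | R | 1 | 100.00 | 0 |
| chr1.836953 | chr1 | 836953 | F | 19 | 78.95 | 21.05 |
| chr1.836954 | chr1 | 836954 | R | 1 | 100.00 | 0 |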

So if I make a boxplot for this DMR:

Mean methylation level (Group 1) = (100.00 + 0.00 + 68.42 + 100.00 + 78.95 + 100.00)/6 = 74.56, or 0.7456

However, this value must differ from sum(freqC)/sum(Coverage), right?

For sum(freqC)/sum(Coverage), it is:

(4 * 1 + 4 * 0 + ... + 1 * 1)/(4 + 4 + 19 + 1 + 19 + 1) = 0.70833333
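
To make the comparison concrete, here is a minimal sketch in plain Python (not methylKit code) that reproduces both numbers from the six sites listed above. The variable names are illustrative only. It assumes the fifth and sixth columns of the listing are read coverage and percent methylated C, and that methylated read counts can be recovered by rounding coverage × percentage.

```python
# Six CpG sites from the DMR listed above: (coverage, percent methylated C).
sites = [
    (4, 100.00),
    (4, 0.00),
    (19, 68.42),
    (1, 100.00),
    (19, 78.95),
    (1, 100.00),
]

# Unweighted mean: every CpG counts equally, regardless of read depth.
mean_of_sites = sum(pct for _, pct in sites) / len(sites)

# Pooled (coverage-weighted) level: recover methylated read counts from the
# percentages (assumption: rounding gives the original integer counts),
# then divide by the total coverage of the region.
methylated_reads = [round(cov * pct / 100) for cov, pct in sites]
total_coverage = sum(cov for cov, _ in sites)
pooled = 100 * sum(methylated_reads) / total_coverage

print(f"mean of per-CpG percentages: {mean_of_sites:.2f}%")
print(f"pooled over the region:      {pooled:.2f}%")
```

The two values differ because the pooled level weights each CpG by its coverage, so the two sites with 19 reads dominate, while the unweighted mean gives the single-read sites the same influence as the well-covered ones.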

Do you think I can make the boxplot this way if I want to show the methylation levels of all CpGs?

Can I ignore this difference to some degree?

I ask because I think it is the only way to show the methylation level with a boxplot.

I am looking forward to your response.

Best regards, Xiangyi

Xiangyi-Deng, Aug 08 '24 15:08