methylKit

How to include batch effect

Open iramai opened this issue 3 years ago • 5 comments

Hi again Altuna,

I finally solved the problem with my large files by using MethylDackel, as you recommended (thanks again!). I have now been able to start using methylKit and have obtained the general information: basic statistics, correlation statistics, and PCA.

Looking at the PCA plot, I can see that the control and treated samples are not as well separated as I expected. This may be due to a batch effect from the sequencing step, since some samples were sequenced on one day and the others on another day.

[image: PCA - copia] https://user-images.githubusercontent.com/46052804/102913588-42f7bf00-447f-11eb-8c80-45b66bcda78d.jpeg

I have been reading other issues and questions about this problem in the methylkit_discussion group, and I have concluded that the batch-effect option (assocComp) may not be the best way to proceed. The other option is to add this information as a covariate, but in the user guide covariates are only applied when finding differentially methylated bases or regions.

Is there any way, or any script, to apply the covariate before that step and recompute the PCA or sample clustering with the covariate taken into account, to see whether the control and treated samples separate more clearly?

Thanks in advance,
Iraia

iramai avatar Dec 22 '20 17:12 iramai

I think you have to use assocComp and see whether the sequencing batch annotation is associated with one of the PCs; if it is, try removing that component. We do not provide any functions other than those for removing batch effects.

Best Altuna
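
For readers following along, the workflow described in this reply can be sketched roughly as below. This is only an illustration, not code posted in the thread: it uses the example `methylBase.obj` that ships with methylKit as a stand-in for the poster's `meth` object, the batch labels are made up for that example data, and `comp = 1` is an arbitrary choice. The component to remove should be whichever one `assocComp` flags for your own samples.

```r
library(methylKit)

# Stand-in data: the example methylBase object bundled with methylKit.
# In the thread this would be the poster's own `meth` object.
data(methylKit)
meth <- methylBase.obj

# One row per sample, in the same order as the samples in `meth`.
# These batch labels are invented for the bundled example data.
sampleAnnotation <- data.frame(batch_id = c("a", "a", "b", "b"))

# Test each principal component for association with the annotation.
assoc <- assocComp(mBase = meth, sampleAnnotation)
print(assoc$association)

# If a component tracks the batch, remove it (comp = 1 is illustrative only).
meth_adj <- removeComp(meth, comp = 1)

# Redo the exploratory plots on the adjusted object.
PCASamples(meth_adj)
clusterSamples(meth_adj, dist = "correlation", method = "ward", plot = TRUE)
```

Comparing these plots with the ones made from the unadjusted object shows whether removing the component changes how the samples group.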

View this issue on GitHub: https://github.com/al2na/methylKit/issues/219

-- Sent from mobile, excuse the brevity

al2na avatar Dec 23 '20 08:12 al2na

Hi Altuna,

Thanks for your fast answer. I have been trying assocComp, but I don't know whether I am using it properly. I am analysing 4 data sets, where the first two are control samples and the second two are treatment samples, so I defined them as (treatment=c(0,0,1,1)) at the beginning with methRead. Now I want to include the batch effect, given that one pair of samples was sequenced on one day and the other pair on another day, so:

    sampleAnnotation=data.frame(batch_id=c("a","b","a","b"))
    sampleAnnotation
      batch_id
    1        a
    2        b
    3        a
    4        b

And finally assocComp to see the PCs:

    as=assocComp(mBase=meth,sampleAnnotation)
    as
    $pcs
              PC1        PC2         PC3         PC4
    C1 -0.5014335  0.2124935 -0.27363490 -0.79280193
    C2 -0.5007956  0.2876573 -0.56531417  0.58896259
    M1 -0.4977126 -0.8627174  0.06544428  0.06097381
    M2 -0.5000503  0.3575170  0.77541013  0.14446579

    $vars
    [1] 84.994069  5.248520  4.949609  4.807802

    $association
             PC1       PC2 PC3       PC4
    batch_id   1 0.3333333   1 0.3333333

Now I don't understand what you meant about the correlation between the batch annotation and one of the PCs. I get a bit lost interpreting these results, so I don't know which PC to remove. Can you help me with this?

Thanks in advance,
Iraia

iramai avatar Dec 23 '20 09:12 iramai

It seems the PCs are not strongly associated with the batch. You might have other covariates that you don't know about that could explain the PCA plot.

Best Altuna
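
As a reading aid for the `$association` row above, here is a small base-R sketch. It rests on an assumption that is not stated in the thread: that for a two-level annotation each entry is the p-value of a two-sided exact Wilcoxon rank-sum test comparing the PC loadings of the two groups. The loadings are copied from the posted output; the grouping follows the posted `batch_id`.

```r
# PC loadings copied from the assocComp output posted in this thread.
pcs <- matrix(
  c(-0.5014335,  0.2124935, -0.27363490, -0.79280193,
    -0.5007956,  0.2876573, -0.56531417,  0.58896259,
    -0.4977126, -0.8627174,  0.06544428,  0.06097381,
    -0.5000503,  0.3575170,  0.77541013,  0.14446579),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("C1", "C2", "M1", "M2"), paste0("PC", 1:4))
)
batch <- factor(c("a", "b", "a", "b"))

# Assumption: rank-sum test of the loadings, batch "a" versus batch "b".
p_batch <- apply(pcs, 2, function(x) {
  wilcox.test(x[batch == "a"], x[batch == "b"])$p.value
})
print(round(p_batch, 7))

# With 2 samples per group there are only choose(4, 2) = 6 ways to split
# the ranks, so the smallest possible two-sided exact p-value is 2/6.
min_p <- 2 / choose(4, 2)
print(min_p)
```

Under that assumption the sketch reproduces the posted values (1, 0.3333333, 1, 0.3333333). It also shows that with two samples per batch no component can score below 1/3, so this test cannot give a small p-value for a design of this size whatever the data look like. That may be worth keeping in mind when reading the result.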

al2na avatar Dec 23 '20 10:12 al2na

So, do you know how I can solve that problem?

iramai avatar Dec 23 '20 11:12 iramai

Check the discussion forum; I have answered this question many times. Short answer: there are possible routes to take, but we don't have any code for them.

Best Altuna
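
One possible route, sketched here as an illustration and not as a methylKit feature or the maintainer's recommendation: fit a per-CpG linear model with treatment and batch on a percent-methylation matrix, subtract only the fitted batch term, and rerun PCA on the adjusted matrix. Everything below is an assumption made for the example. The matrix `pm` is simulated; in practice it might come from `percMethylation(meth)`. The adjusted values are meant for exploratory plots only, since they can fall outside 0 to 100.

```r
set.seed(1)

# Design from the thread: 2 controls, 2 treated, batches interleaved.
treatment <- factor(c(0, 0, 1, 1))
batch <- factor(c("a", "b", "a", "b"))

# Hypothetical stand-in for a CpG x sample percent-methylation matrix.
n_cpg <- 200
baseline <- runif(n_cpg, 20, 80)
pm <- sapply(1:4, function(j) {
  baseline + 5 * (batch[j] == "b") + 2 * (treatment[j] == 1) +
    rnorm(n_cpg, sd = 1)
})
colnames(pm) <- c("C1", "C2", "M1", "M2")

# Fit y ~ treatment + batch for every CpG at once (samples in rows).
design <- model.matrix(~ treatment + batch)
fit <- lm.fit(design, t(pm))

# Subtract only the batch contribution, keeping the treatment signal.
batch_cols <- grep("^batch", colnames(design))
batch_part <- design[, batch_cols, drop = FALSE] %*%
  fit$coefficients[batch_cols, , drop = FALSE]
pm_adj <- t(t(pm) - batch_part)

# PCA on samples before and after the adjustment.
pca_raw <- prcomp(t(pm))
pca_adj <- prcomp(t(pm_adj))
print(round(pca_raw$x[, 1:2], 2))
print(round(pca_adj$x[, 1:2], 2))
```

For the differential methylation step itself, the user guide's covariate option in `calculateDiffMeth` remains the place to account for batch; this sketch only concerns the exploratory PCA.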

al2na avatar Dec 23 '20 11:12 al2na