methylKit
How to include batch effect
Hi again Altuna,
Finally I solved the problem with my big files using MethylDackel, as you recommended (thanks again!). So I have been able to start using methylKit and have obtained the general information, such as basic statistics, correlation statistics and a PCA.
When analyzing the PCA plot, I can see that the control and treated samples are not as well separated as I expected. Maybe this is related to a batch effect originating in the sequencing step, where some samples were sequenced on one day and the others on another day.
I have been reading other issues and questions about this problem in the methylkit_discussion group and have concluded that the batch-effect option (assocComp) may not be the best way to proceed.
The other option is to add this information as a covariate, but in the user guide it is only applied when finding differentially methylated bases or regions. Is there any way, or a script, to apply the covariate before that step and recalculate the PCA or sample clustering taking the covariate into account, to see whether the control and treated samples separate more clearly?
Thanks in advance,
Iraia
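For reference, the covariate route that the user guide describes for differential methylation looks roughly like the sketch below, adapted from the vignette's simulated-data example. `dataSim()` stands in for the real `meth` object, and coding the sequencing day as a numeric 0/1 covariate (equivalent to a two-level factor) is an assumption made here, as is the batch layout, taken from the `sampleAnnotation` used later in this thread. Note that this only adjusts the per-CpG tests; it does not change what `PCASamples()` or `clusterSamples()` show.

```r
library(methylKit)

# Simulated stand-in for the real united methylBase ("meth" in this thread):
# 4 samples, first two controls and last two treated, as with treatment=c(0,0,1,1).
set.seed(42)
meth <- dataSim(replicates = 4, sites = 1000,
                treatment = c(0, 0, 1, 1),
                sample.ids = c("C1", "C2", "M1", "M2"))

# Assumed batch layout: C1 and M1 sequenced on day "a", C2 and M2 on day "b",
# coded numerically (0 = day a, 1 = day b).
covariates <- data.frame(batch = c(0, 1, 0, 1))

# The covariate is included in each per-CpG logistic regression next to treatment.
dm <- calculateDiffMeth(meth, covariates = covariates,
                        overdispersion = "MN", test = "Chisq", mc.cores = 1)

# CpGs with at least 25% methylation difference and q-value < 0.01.
dm_sig <- getMethylDiff(dm, difference = 25, qvalue = 0.01)
dm_sig
```

With only four samples and three coefficients per CpG (intercept, treatment, batch), very little residual information is left for the overdispersion estimate, so results from such a small design should be read cautiously.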
I think you have to use assocComp and check whether the sequencing-batch variable is associated with one of the PCs; if it is, try to remove that component with removeComp. We provide no functions other than these to remove batch effects.
Best Altuna
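A minimal sketch of the assocComp-then-removeComp workflow, run on simulated data from `dataSim()` instead of the real `meth` object. The choice of component 2 is purely illustrative: in practice you would remove whichever PC shows a strong association (a small p-value in `$association`) with the batch column.

```r
library(methylKit)

# Simulated stand-in for the real united object; same 2 vs 2 design as the thread.
set.seed(7)
meth <- dataSim(replicates = 4, sites = 1000,
                treatment = c(0, 0, 1, 1),
                sample.ids = c("C1", "C2", "M1", "M2"))

# One row of annotation per sample, in the same order as the samples in meth.
sampleAnnotation <- data.frame(batch_id = c("a", "b", "a", "b"))

# $pcs: per-sample values for each PC, $vars: % variance explained per PC,
# $association: p-value for each PC against each annotation column.
assoc <- assocComp(mBase = meth, sampleAnnotation = sampleAnnotation)
assoc$association

# Remove the PC judged to track the batch (component 2 is only an example)
# and look at the samples again.
meth_nobatch <- removeComp(meth, comp = 2)
PCASamples(meth_nobatch)
```

If no PC shows an association with the batch, there is no obvious component to remove.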
On Tue, Dec 22, 2020 at 6:10 PM iramai [email protected] wrote:
[image: PCA - copia] https://user-images.githubusercontent.com/46052804/102913588-42f7bf00-447f-11eb-8c80-45b66bcda78d.jpeg
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/al2na/methylKit/issues/219, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAE32EOY4X36NDITXM4ZFMLSWDHJZANCNFSM4VF4YNBA .
-- Sent from mobile, excuse the brevity
Hi Altuna, Thanks for your fast answer. I have been trying assocComp, but I don't know if I am using it properly. I am analysing 4 data sets: the first two are control samples and the last two are treated, so I defined them as treatment=c(0,0,1,1) at the beginning with methRead. Now I want to include the batch effect, considering that one pair of samples was sequenced on one day and the other pair on another day, so:
```r
> sampleAnnotation=data.frame(batch_id=c("a","b","a","b"))
> sampleAnnotation
  batch_id
1        a
2        b
3        a
4        b
```
Finally, I ran assocComp to see the PCs:
```r
> as=assocComp(mBase=meth,sampleAnnotation)
> as
$pcs
          PC1        PC2         PC3         PC4
C1 -0.5014335  0.2124935 -0.27363490 -0.79280193
C2 -0.5007956  0.2876573 -0.56531417  0.58896259
M1 -0.4977126 -0.8627174  0.06544428  0.06097381
M2 -0.5000503  0.3575170  0.77541013  0.14446579

$vars
[1] 84.994069  5.248520  4.949609  4.807802

$association
         PC1       PC2 PC3       PC4
batch_id   1 0.3333333   1 0.3333333
```
Now, I don't understand what you said about correlating the batch-effect variable with one of the PCs. I get a little lost interpreting these results, so I don't know which PC to remove. Can you help me with this? Thanks in advance, Iraia
It seems the PCs are not strongly associated with the batch. You might have other covariates you don't know about that explain the PCA plot.
Best Altuna
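A worked check in base R of where the `$association` numbers come from. Assuming assocComp() compares each PC between the two batches with a two-sided Wilcoxon rank-sum test (an assumption here, but it reproduces the reported values exactly), the PC values pasted above give the same p-values. With two samples per batch there are only choose(4, 2) = 6 equally likely ways to split the ranks, so the smallest reachable two-sided p-value is 2/6 ≈ 0.333. PC2 and PC4 already sit at that floor, so with this 2 vs 2 layout the test cannot reach conventional significance for any PC, whatever the data.

```r
# PC values copied from the assocComp() output above (samples x PCs).
pcs <- matrix(c(-0.5014335,  0.2124935, -0.27363490, -0.79280193,
                -0.5007956,  0.2876573, -0.56531417,  0.58896259,
                -0.4977126, -0.8627174,  0.06544428,  0.06097381,
                -0.5000503,  0.3575170,  0.77541013,  0.14446579),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("C1", "C2", "M1", "M2"), paste0("PC", 1:4)))
batch <- c("a", "b", "a", "b")

# Exact two-sided Wilcoxon rank-sum test of each PC between the two batches
# (exact because n is small and there are no ties).
p <- apply(pcs, 2, function(x) wilcox.test(x[batch == "a"], x[batch == "b"])$p.value)
print(round(p, 7))  # PC1 = 1, PC2 = 0.3333333, PC3 = 1, PC4 = 0.3333333

# Smallest p-value reachable with 2 vs 2 samples: the two most extreme
# rank splits out of choose(4, 2) = 6.
p_floor <- 2 / choose(4, 2)
print(p_floor)
```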
So, do you know how I can solve this problem?
Check the discussion forum; I have answered this question many times. Short answer: there are possible routes to take, but we don't have any code for them.
Best Altuna
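One such route, sketched here as an assumption rather than anything methylKit provides: regress the batch out of the percent-methylation matrix (what `percMethylation(meth)` returns) while keeping treatment in the model, then redo the PCA on the adjusted matrix. The matrix below is random stand-in data, and the batch layout follows the `sampleAnnotation` used earlier in the thread. If limma is available, `limma::removeBatchEffect(pm, batch = batch, design = model.matrix(~ factor(treatment)))` should perform the same adjustment. With 4 samples and 3 coefficients per CpG only one residual degree of freedom remains, so treat this as a visual check only.

```r
# Toy stand-in for percMethylation(meth): rows = CpGs, columns = samples (0-100).
set.seed(1)
n_cpg <- 200
treatment <- c(0, 0, 1, 1)
batch <- factor(c("a", "b", "a", "b"))
pm <- matrix(runif(n_cpg * 4, min = 0, max = 100), nrow = n_cpg,
             dimnames = list(NULL, c("C1", "C2", "M1", "M2")))

# Treatment stays in the model so its effect is not removed along with the batch.
design <- model.matrix(~ factor(treatment) + batch)

# One least-squares fit per CpG: lm.fit accepts a matrix of responses
# (samples x CpGs) and returns a coefficient matrix (terms x CpGs).
fit <- lm.fit(design, t(pm))
batch_coef <- fit$coefficients["batchb", ]

# Subtract only the estimated batch shift, which applies to batch "b" samples.
adj <- pm - outer(batch_coef, unname(design[, "batchb"]))

# PCA on the adjusted matrix, with samples as observations.
pca <- prcomp(t(adj), center = TRUE, scale. = FALSE)
summary(pca)
```

Because the design is balanced (each day has one control and one treated sample), the adjustment equalizes the batch means per CpG while leaving the control-vs-treated difference unchanged. Adjusted values can fall outside 0-100, so clip them before trying to feed them back into methylKit objects (methylKit's `reconstruct()` is, as far as I know, intended for rebuilding a methylBase from a modified percentage matrix). Statistical testing should still use the covariate in calculateDiffMeth() rather than the adjusted matrix.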