methylKit
Error when running calculateDiffMeth function
Hi,
I'm experiencing the following issue when running the calculateDiffMeth function:
calculateDiffMeth(data_obj, mc.cores = 32 , overdispersion = "MN" , test = "Chisq")
and I get the error:
two groups detected:
will calculate methylation difference as the difference of
treatment (group: 1) - control (group: 0)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
I am using methylKit version 1.10, but I get the same error when I run the same code with methylKit version 1.14. It also fails when I run the code without specifying the test and/or overdispersion parameters (calculateDiffMeth(data_obj, mc.cores = 32)).
The data_obj includes ~39 million positions, so I tried subsetting the data object I am running this function with. Some subsets fail with the same error when tested with methylKit 1.10, even though they are processed successfully with methylKit 1.14.
However, the data object cannot be processed in its entirety with any methylKit version.
I saw some closed issues about the same error, which suggested it had been fixed in later versions. Any help is appreciated. I can share my code and data if that helps.
Thank you! Aleksandar
Hi @danicic7,
It would be great if you could provide us with a reproducible example so that we can test the error ourselves. Ideally, please send us a subset of your data that triggers the error.
Best, Alex
Hello @alexg9010 ,
My name is Pooja Shah and I am a colleague of @danicic7. I can send you the whole dataset, which includes ~39 million positions. @danicic7 tried to subset the data, but we don't see the error on smaller subsets. Since this is company data generated for internal use, is there another way I can send the data to you directly instead of sharing it on GitHub?
Your help is highly appreciated.
Thank you, Pooja
Hi @pooja19862,
Depending on the size of your dataset, you may want to upload it somewhere and send me the link via PM to alex.gos90[ at ]gmail[ . ] com .
Best, Alex
(Sent from mobile, in reply to pooja19862's comment of Tue., 16 June 2020, 00:13.)
Thanks for your reply. I just emailed you the dataset. Let me know if you have any trouble accessing it.
Hello @alexg9010 ,
Any update on this open ticket?
Thanks, Pooja
Hi @pooja19862 ,
I downloaded the data and was able to reproduce your error. I am still working on figuring out which rows cause the issue, but with a dataset this size that will take some time.
One simple but general workaround is to not set the min.per.group=1L argument in the unite function, so that only bases covered in all of your samples are considered. While this leads to the loss of some bases, the remaining ones are more reliable, also for the differential analysis.
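To make the difference between the two unite settings concrete, here is a minimal base-R sketch on a toy coverage matrix. The helper keep_sites() is hypothetical and only approximates the idea of a per-group minimum; it is not methylKit's implementation of min.per.group.

```r
# Hypothetical sketch: approximate which sites a per-group minimum keeps,
# in the spirit of unite(..., min.per.group = ...). NOT methylKit's code.

# Rows = CpG sites, columns = samples; NA = site not covered in that sample.
coverage <- matrix(
  c(10, 12,  8,  9,   # site 1: covered in all four samples
    10, NA,  8,  9,   # site 2: one control sample missing
    NA, NA,  8,  9,   # site 3: both control samples missing
    10, 12, NA,  7),  # site 4: one treatment sample missing
  nrow = 4, byrow = TRUE
)
treatment <- c(0, 0, 1, 1)  # 0 = control, 1 = treatment

# Keep a site only if every group has at least `min_per_group` covered samples.
keep_sites <- function(cov, groups, min_per_group) {
  keep <- rep(TRUE, nrow(cov))
  for (g in unique(groups)) {
    n_covered <- rowSums(!is.na(cov[, groups == g, drop = FALSE]))
    keep <- keep & (n_covered >= min_per_group)
  }
  keep
}

which(keep_sites(coverage, treatment, min_per_group = 1))  # sites 1, 2, 4
which(complete.cases(coverage))                            # site 1 only
```

With a per-group minimum of 1, sites 2 and 4 survive even though some samples contribute nothing; requiring coverage in all samples keeps only site 1, which is the trade-off between retained bases and reliability described above.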
Best, Alex
@pooja19862 please share the full code, including the differential methylation call. This error occurs when all groups, or one of them, contain only NAs. That shouldn't normally happen, even if you use min.per.group=1L. You may have altered the treatment vector after a unite() operation.
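For reference, the error message itself can be reproduced in plain R. This is a minimal illustration, independent of methylKit's internals: lm() drops rows containing NA (the default na.action is na.omit), and when no rows remain, lm.fit() stops with "0 (non-NA) cases". The data frame below is a made-up site with no usable values.

```r
# Base-R illustration of where "0 (non-NA) cases" comes from.
# Hypothetical site: four samples, two groups, no usable methylation values.
site <- data.frame(
  group = c(0, 0, 1, 1),
  meth  = rep(NA_real_, 4)
)

# na.omit removes every row, so lm.fit() receives an empty model matrix.
msg <- tryCatch(
  lm(meth ~ group, data = site),
  error = function(e) conditionMessage(e)
)
print(msg)
```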
@alexg9010 I would wait to see the code before I spend more time on it :)
Hello @al2na @alexg9010
I have shared the whole code with you. No changes were made to the treatment vector after the unite() operation; as the code shows, I ran calculateDiffMeth() directly after unite(). I tried with and without overdispersion and got the same error:
my_diff = calculateDiffMeth(data_obj, mc.cores = 16, overdispersion = "MN", test = "Chisq")
my_diff = calculateDiffMeth(data_obj, mc.cores = 16)
@al2na he shared the code and files in private, I downloaded them already.
Hello, I'm encountering a similar problem. Is there any update on this error?
Hi @roshmisarma ,
Unfortunately, I do not have an update on this issue yet. To mitigate the problem until a fix is available, please see my previous message:
One simple but general solution to your problem would be to not set the min.per.group=1L argument in the unite function, which would only consider bases that have been called by all of your samples. While this will lead to loss of some bases, the remaining ones are more reliable, also in terms of differential analysis.
Best, Alex
I have the opposite problem. I'm looking at region counts of a subset of genes I'm interested in.
So I bring in cov files, filter by coverage, normalise by coverage and then unite (min.per.group=1L). Then I calculate region counts. This works: I can reorganise and do a contrast within a subset, e.g. the main effect is population 1 vs population 2, and then I can do pop1-treat vs pop1-control.
The idea is that even though not all sites are represented, there's enough within a region to make an estimate. However, there is a risk of low confidence in some samples, because a sample might have only one CpG represented in a region. When I try to increase min.per.group, that's when I start to get errors.
two groups detected:
will calculate methylation difference as the difference of
treatment (group: 1) - control (group: 0)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
I assume this is to do with too many NAs too?
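One way to check whether NAs are the culprit is to look for sites (or regions) where an entire treatment group has no coverage, which matches the failure condition described earlier in the thread. The helper rows_with_empty_group() below is hypothetical; it assumes a data frame laid out like getData() output for a united object (coverage1, coverage2, ... in the same order as the treatment vector), and the data here are simulated so the sketch runs without methylKit.

```r
# Hypothetical diagnostic: flag rows where at least one treatment group
# has no covered samples. Assumes coverage columns are ordered like the
# treatment vector (coverage1 <-> treatment[1], etc.).
rows_with_empty_group <- function(df, treatment) {
  cov_cols <- grep("^coverage[0-9]+$", names(df), value = TRUE)
  cov <- as.matrix(df[, cov_cols, drop = FALSE])
  empty <- rep(FALSE, nrow(df))
  for (g in unique(treatment)) {
    n_covered <- rowSums(!is.na(cov[, treatment == g, drop = FALSE]))
    empty <- empty | (n_covered == 0)
  }
  unname(which(empty))
}

# Simulated united data: row 2 has no control coverage,
# row 3 has no treatment coverage.
sim <- data.frame(
  chr = "chr1", start = 1:3, end = 1:3, strand = "+",
  coverage1 = c(10, NA, 5), coverage2 = c(12, NA, NA),
  coverage3 = c(8, 9, NA),  coverage4 = c(7, 11, NA)
)
rows_with_empty_group(sim, treatment = c(0, 0, 1, 1))  # rows 2 and 3
```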
I did not set min.per.group when using unite, but I also encountered the same error when running calculateDiffMeth with version 1.17.4 while also specifying a covariate:
two groups detected:
will calculate methylation difference as the difference of
treatment (group: 1) - control (group: 0)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
My script, which includes instructions to download the data, can be found here: https://github.com/RobertsLab/project-gigas-oa-meth/blob/master/code/06-methylKit.Rmd#L200. Because I did not set min.per.group when running unite, I'm not sure whether the issue is related to NAs in my dataset. Is there anything I should be doing differently?
Can you send a reproducible example with the smallest dataset that reproduces the error? The link you sent contains your whole script and data; we just need the subset of both data and script that reproduces the error.
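A rough way to shrink a failing dataset is repeated halving: keep whichever half still fails until neither half fails on its own. The helper shrink_failing_rows() below is hypothetical. In practice, fails() would wrap the real analysis on a row subset (for example methylKit's select() followed by calculateDiffMeth() inside tryCatch()), which is not shown here. Because the thread reports that small subsets often do not reproduce the error, the search may stop early and return a large set; the result is small but not guaranteed to be minimal.

```r
# Hypothetical helper: shrink a failing set of row indices by repeated halving.
# `fails(rows)` must return TRUE when the analysis errors on that subset.
shrink_failing_rows <- function(rows, fails) {
  stopifnot(fails(rows))  # start from a set that is known to fail
  repeat {
    if (length(rows) < 2) return(rows)
    half <- length(rows) %/% 2
    first <- rows[seq_len(half)]
    second <- rows[-seq_len(half)]
    if (fails(first)) {
      rows <- first
    } else if (fails(second)) {
      rows <- second
    } else {
      return(rows)  # failure needs rows from both halves; stop here
    }
  }
}

# Toy stand-in for the real analysis: "fails" whenever row 737 is present.
toy_fails <- function(rows) 737 %in% rows
shrink_failing_rows(1:1000, toy_fails)  # 737
```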
(Sent by email in reply to Yaamini Venkataraman's comment of Fri, Mar 12, 2021, 1:14 AM; the linked script is at https://github.com/RobertsLab/project-gigas-oa-meth/blob/master/code/06-methylKit.Rmd#L200.)
I haven't figured out the smallest dataset that reproduces the error yet, and will send the data and code along when I do so.
However, I think the error is actually related to mc.cores. I tried
differentialMethylationStatsTreatment <- methylKit::calculateDiffMeth(methylationInformationFilteredCov5, overdispersion = "MN", test = "Chisq", mc.cores = 2)
and got the same lm.fit error. When I ran
differentialMethylationStatsTreatment <- methylKit::calculateDiffMeth(methylationInformationFilteredCov5, overdispersion = "MN", test = "Chisq")
I did not get the lm.fit error.
data:image/s3,"s3://crabby-images/9f8b5/9f8b5e60f6278a5cb5560099eb9125911b1cf335" alt="Screen Shot 2021-03-12 at 1 18 37 AM"
I am now running the code below on all of my samples and have not encountered an lm.fit error.
differentialMethylationStatsTreatment <- methylKit::calculateDiffMeth(methylationInformationFilteredCov5, covariates = covariateMetadata, overdispersion = "MN", test = "Chisq")
This helped: without setting mc.cores, no lm.fit error occurred.
Hello, I had the same issue, which was resolved by removing the NAs from the dataset before running calculateDiffMeth. I tested this by removing the NAs after running unite with a 0.75 overlap requirement to retain loci. Posting just FYI.
library(methylKit)
library(dplyr)
# row indices of sites where any numCs/numTs column is NA
subset.drop = getData(meth.united.db) %>% as.data.frame %>%
  dplyr::select(contains("num")) %>%
  apply(., 1, function(x){is.na(x) %>% any}) %>% which
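A base-R version of the same NA filter can use a logical mask instead of the row indices from which(). This sidesteps a common pitfall: if no row contains NA, which() returns integer(0), and x[-integer(0), ] selects zero rows, silently emptying the data. The data frame below is simulated to mimic getData() columns (coverageN, numCsN, numTsN); carrying the mask back to a methylKit object, especially a database-backed one, is not covered by this sketch.

```r
# Simulated data laid out like getData() output for two samples.
df <- data.frame(
  chr = "chr1", start = 1:4, end = 1:4, strand = "+",
  coverage1 = c(10, 8, NA, 6), numCs1 = c(5, 4, NA, 3), numTs1 = c(5, 4, NA, 3),
  coverage2 = c(9, NA, 7, 6),  numCs2 = c(2, NA, 3, 1), numTs2 = c(7, NA, 4, 5)
)

num_cols <- grep("num", names(df))       # numCs*/numTs* columns
keep <- complete.cases(df[, num_cols])   # TRUE where no count is NA
df_clean <- df[keep, ]                   # rows 1 and 4 remain

# Pitfall: on data without NA rows, negative indexing drops everything.
drop_idx <- which(!complete.cases(df_clean[, num_cols]))  # integer(0)
nrow(df_clean[-drop_idx, ])                               # 0, not 2
```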