Error when running calculateDiffMeth function

Open danicic7 opened this issue 4 years ago • 18 comments

Hi, I'm experiencing the following issue when running the calculateDiffMeth function:

calculateDiffMeth(data_obj, mc.cores = 32, overdispersion = "MN", test = "Chisq")

and I get the error:

two groups detected:
 will calculate methylation difference as the difference of
treatment (group: 1) - control (group: 0)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  0 (non-NA) cases

I am using methylKit version 1.10, but I get the same error when I run the same code with methylKit version 1.14. It also fails when I run the code without specifying the test and/or overdispersion parameters (calculateDiffMeth(data_obj, mc.cores = 32)). The data_obj includes ~39 million positions, so I tried subsetting the data object before passing it to this function. Some subsets fail with the same error under methylKit 1.10, even though they are processed successfully with methylKit 1.14. However, the full data object cannot be processed with either methylKit version.

I saw some closed issues related to the same error, but those suggest the problem should already be fixed in later versions. Any help is appreciated. I can share my code and data if it helps.

Thank you! Aleksandar

danicic7 avatar Jun 10 '20 16:06 danicic7

Hi @danicic7,

It would be great if you could provide us with a reproducible example, so that we can test the error ourselves. Ideally, you would provide a subset of your data that results in the error.

Best, Alex

alexg9010 avatar Jun 10 '20 18:06 alexg9010

Hello @alexg9010 ,

My name is Pooja Shah and I am a colleague of @danicic7. I can send you the whole dataset, which includes ~39 million positions. @danicic7 tried to subset the data, but we don't see the error on a smaller subset. Since it is company data generated for internal use, is there any other way I can send the data to you directly instead of sharing it on GitHub?

Your help is highly appreciated.

Thank you, Pooja

pooja19862 avatar Jun 15 '20 22:06 pooja19862

Hi @pooja19862,

Depending on the size of your dataset, you may want to upload it somewhere and send me the link via pm to alex.gos90[ at ]gmail[ . ] com .

Best, Alex

alexg9010 avatar Jun 16 '20 09:06 alexg9010

Thanks for your reply. I just emailed you the dataset. Let me know if you have any trouble accessing it.

pooja19862 avatar Jun 16 '20 14:06 pooja19862

Hello @alexg9010 ,

Any update on this open ticket?

Thanks, Pooja

pooja19862 avatar Jun 23 '20 12:06 pooja19862

Hi @pooja19862 ,

I downloaded the data and was able to reproduce your error. I am still working on a way to figure out which rows are causing the issue, but given the size of your dataset this will take some time.

One simple but general solution to your problem would be to leave out the min.per.group=1L argument in the unite function, so that only bases that have been called in all of your samples are considered. While this will lead to the loss of some bases, the remaining ones are more reliable, also in terms of differential analysis.
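As a rough illustration of this trade-off, the sketch below uses the example data bundled with methylKit (data(methylKit), which provides methylRawList.obj), not the data from this issue. It compares the default unite() with min.per.group = 1L and counts how many rows of the relaxed result contain NA. The variable names are made up for illustration, and the block does nothing if methylKit is not installed.

```r
# Minimal sketch on methylKit's bundled example data (not the data from this issue).
counts <- NULL
if (requireNamespace("methylKit", quietly = TRUE)) {
  library(methylKit)
  data(methylKit)  # example objects shipped with the package, e.g. methylRawList.obj

  # Default: keep only bases covered in every sample
  strict <- unite(methylRawList.obj, destrand = FALSE)
  # Relaxed: keep bases covered in at least one sample per group
  relaxed <- unite(methylRawList.obj, destrand = FALSE, min.per.group = 1L)

  counts <- c(
    strict_rows = nrow(getData(strict)),
    relaxed_rows = nrow(getData(relaxed)),
    relaxed_rows_with_na = sum(!complete.cases(getData(relaxed)))
  )
  print(counts)
}
```

The relaxed table should never have fewer rows than the strict one; the extra rows are the ones that carry NA for at least one sample.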

Best, Alex

alexg9010 avatar Jun 23 '20 13:06 alexg9010

@pooja19862 please share the full code, including the differential methylation call. This error occurs when one or all of the groups contain only NA values. This wouldn't normally happen even if you use min.per.group=1L. You may have altered the treatment vector after a unite() operation.
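For reference, the same message can be produced in plain R, independent of methylKit. The sketch below is only an illustration with a made-up toy data frame: when every response value is NA, lm() drops all rows and lm.fit() stops because nothing is left to fit. It does not show how methylKit reaches that state internally.

```r
# Toy data: the response is NA for every sample (not real methylation data)
toy <- data.frame(
  y = c(NA_real_, NA_real_, NA_real_, NA_real_),
  treatment = c(0, 0, 1, 1)
)

# lm() omits rows containing NA, so no rows remain and lm.fit() raises an error
msg <- tryCatch(
  {
    lm(y ~ treatment, data = toy)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```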

@alexg9010 I would wait to see the code before I spend more time on it :)

al2na avatar Jun 23 '20 15:06 al2na

Hello @al2na @alexg9010

I have shared the whole code with you. No changes were made to the treatment vector after the unite() operation. As shown in the shared code, I ran calculateDiffMeth() directly after the unite() operation. I ran it with and without overdispersion and got the same error.

my_diff = calculateDiffMeth(data_obj, mc.cores = 16, overdispersion = "MN", test = "Chisq")
my_diff = calculateDiffMeth(data_obj, mc.cores = 16)

pooja19862 avatar Jun 23 '20 18:06 pooja19862

@al2na he shared the code and files in private, I downloaded them already.

alexg9010 avatar Jun 23 '20 18:06 alexg9010

Hello, I'm encountering a similar problem. Is there any update on this error?

roshmisarma avatar Sep 08 '20 02:09 roshmisarma

Hi @roshmisarma ,

Unfortunately, I do not have an update on this issue yet. To mitigate the problem until a fix is available, please see my previous message:

One simple but general solution to your problem would be to not set the min.per.group=1L argument in the unite function, which would only consider bases that have been called by all of your samples. While this will lead to loss of some bases, the remaining ones are more reliable, also in terms of differential analysis.

Best, Alex

alexg9010 avatar Oct 15 '20 14:10 alexg9010

One simple but general solution to your problem would be to not set the min.per.group=1L argument in the unite function, which would only consider bases that have been called by all of your samples. While this will lead to loss of some bases, the remaining ones are more reliable, also in terms of differential analysis.

I have the opposite problem. I'm looking at region counts of a subset of genes I'm interested in.

So I bring in cov files, filter by Coverage, normalise by Coverage and then unite (min.per.group=1L). Then calculate region counts. This works -- I can reorganise and do a contrast within a subset etc. eg. main effect is population 1 vs population 2, then can do pop1-treat vs pop1-control.

The idea is that even though all sites are not represented there's enough within a region to make an estimate. However, there is a risk of low confidence in some of the samples because one sample could have only 1 CpG represented etc. So when I try to increase min.per.group that's when I start to get errors.

two groups detected:
 will calculate methylation difference as the difference of
treatment (group: 1) - control (group: 0)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  0 (non-NA) cases

I assume this is to do with too many NAs too?
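One way to check this assumption is to look for rows where an entire group has no observed value. The helper below is a hypothetical base-R sketch (group_all_na is not a methylKit function) that works on a plain coverage matrix with one column per sample; with a united methylKit object, the coverage columns from getData() could presumably be used as input.

```r
# Flag rows in which at least one treatment group has no observed (non-NA) sample.
# cov_mat: rows = bases/regions, columns = samples; treatment: group label per column.
group_all_na <- function(cov_mat, treatment) {
  stopifnot(ncol(cov_mat) == length(treatment))
  groups <- unique(treatment)
  # For each group, TRUE where every sample of that group is NA in the row
  per_group <- sapply(groups, function(g) {
    rowSums(!is.na(cov_mat[, treatment == g, drop = FALSE])) == 0
  })
  # sapply returns a plain vector when there is a single row, so coerce to a matrix
  per_group <- matrix(per_group, nrow = nrow(cov_mat))
  rowSums(per_group) > 0
}

# Toy coverage matrix (made-up numbers): samples 1-2 are controls, 3-4 are treated
toy_cov <- rbind(
  c(10, 12, 9, 11),  # all samples observed
  c(NA, NA, 8, 7),   # control group entirely NA
  c(10, NA, NA, 6)   # one sample per group observed
)
toy_treatment <- c(0, 0, 1, 1)
flagged <- group_all_na(toy_cov, toy_treatment)
print(which(flagged))
```

Rows flagged this way are candidates for the "0 (non-NA) cases" error, since one group contributes no data to the fit.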

r-mashoodh avatar Jan 29 '21 16:01 r-mashoodh

I did not set min.per.group when using unite, but I also encountered the same error when running calculateDiffMeth with version 1.17.4 while also specifying a covariate:

two groups detected:
 will calculate methylation difference as the difference of
treatment (group: 1) - control (group: 0)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  0 (non-NA) cases

My script, which includes instructions to download data, can be found here (https://github.com/RobertsLab/project-gigas-oa-meth/blob/master/code/06-methylKit.Rmd#L200). Because I did not set min.per.group when running unite, I'm not sure if the issue is related to NAs in my dataset. Is there anything I should be doing differently?

yaaminiv avatar Mar 12 '21 00:03 yaaminiv

Can you send a reproducible example with the smallest dataset that reproduces the error? The link you sent contains all of your scripts and data; we just need the subset of both data and script that reproduces the error.

al2na avatar Mar 12 '21 06:03 al2na

I haven't figured out the smallest dataset that reproduces the error yet, and will send the data and code along when I do so.
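One possible way to narrow down such a minimal subset is to run the failing call on consecutive chunks of rows and record which chunks throw an error. The sketch below is a hypothetical base-R helper: find_failing_chunks and toy_fit are made-up names, and fit_fun stands in for whatever call fails (for example, a wrapper that runs calculateDiffMeth on the selected rows). It assumes each row can be tested independently of the others.

```r
# Split row indices into chunks and report which chunks make fit_fun throw an error.
# fit_fun is a placeholder for the failing call applied to the given row indices.
find_failing_chunks <- function(n_rows, chunk_size, fit_fun) {
  starts <- seq(1, n_rows, by = chunk_size)
  failing <- list()
  for (s in starts) {
    idx <- s:min(s + chunk_size - 1, n_rows)
    ok <- tryCatch({
      fit_fun(idx)
      TRUE
    }, error = function(e) FALSE)
    # Store the first and last row index of each failing chunk
    if (!ok) failing[[length(failing) + 1]] <- range(idx)
  }
  failing
}

# Toy stand-in: pretend that row 37 is the only row that breaks the model fit
toy_fit <- function(idx) {
  if (37 %in% idx) stop("0 (non-NA) cases")
  invisible(NULL)
}
bad <- find_failing_chunks(n_rows = 100, chunk_size = 10, fit_fun = toy_fit)
print(bad)
```

Re-running the helper on a failing chunk with a smaller chunk_size would shrink the subset further.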

However, I think the error is actually related to mc.cores. I tried

differentialMethylationStatsTreatment <- methylKit::calculateDiffMeth(methylationInformationFilteredCov5, overdispersion = "MN", test = "Chisq", mc.cores = 2)

and got the same lm.fit error.

I ran

differentialMethylationStatsTreatment <- methylKit::calculateDiffMeth(methylationInformationFilteredCov5, overdispersion = "MN", test = "Chisq")

without the lm.fit error.

I am now running the code below on all of my samples and have not encountered an lm.fit error.

differentialMethylationStatsTreatment <- methylKit::calculateDiffMeth(methylationInformationFilteredCov5, covariates = covariateMetadata, overdispersion = "MN", test = "Chisq")

yaaminiv avatar Mar 17 '21 16:03 yaaminiv

This was helpful: without setting mc.cores, no lm.fit error occurred.

jadonWong avatar Jun 20 '21 08:06 jadonWong

Hello, I had the same issue, which was resolved by removing the NAs from the dataset before running calculateDiffMeth. I tested this by removing the NAs after running unite with a 0.75 overlap allowed to retain loci. Posting just FYI.

library(methylKit)
library(dplyr)

# Row indices of loci where any column whose name contains "num" is NA
subset.drop = getData(meth.united.db) %>%
  as.data.frame %>%
  dplyr::select(contains("num")) %>%
  apply(., 1, function(x){is.na(x) %>% any}) %>%
  which
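The indices above still have to be used to drop the affected rows. The sketch below shows that step on a made-up data frame that only mimics the numCs/numTs columns returned by getData(); the table and variable names are assumptions for illustration, not part of the workflow above.

```r
# Toy table mimicking the numCs/numTs columns returned by getData() (made-up values)
toy <- data.frame(
  chr = c("chr1", "chr1", "chr2"),
  start = c(100L, 200L, 300L),
  numCs1 = c(5L, NA, 7L),
  numTs1 = c(3L, NA, 1L),
  numCs2 = c(4L, 6L, 8L),
  numTs2 = c(2L, 2L, 0L)
)

# Logical mask of rows with no NA in any count column
count_cols <- grep("^num", names(toy))
keep <- complete.cases(toy[, count_cols, drop = FALSE])

# Indexing with a logical mask stays correct even when no row needs dropping,
# whereas toy[-which(!keep), ] would return zero rows in that case
toy_clean <- toy[keep, ]
print(toy_clean)
```

With a methylKit object, the positions of the rows to keep could presumably be passed to methylKit's select(); that step was not tested here.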

RJDan avatar Nov 04 '22 10:11 RJDan