Memory Error with many traits

Open Robbie90 opened this issue 6 years ago • 3 comments

Hi I'm trying to run MTAG on 93 traits simultaneously...

After Munging and Merging the log was the following:

.....
2018/10/02/02:28:26 PM Dropped 0 SNPs due to strand ambiguity, 1915103 SNPs remain in intersection after merging trait93
2018/10/02/02:28:26 PM ... Merge of GWAS summary statistics complete. Number of SNPs: 2258212
2018/10/02/02:30:42 PM Using 1915103 SNPs to estimate Omega (0 SNPs excluded due to strand ambiguity)
2018/10/02/02:30:42 PM Estimating sigma..
2018/10/03/12:43:54 AM Checking for positive definiteness ..
2018/10/03/12:43:55 AM Sigma hat:
[[ 1.052 0.41 0.022 ... 0.039 -0.024 -0.135]
 [ 0.41 1.048 0.034 ... 0.102 -0.064 -0.048]
 [ 0.022 0.034 1.043 ... -0.025 0.801 0.059]
 ...
 [ 0.039 0.102 -0.025 ... 1.018 -0.006 0.055]
 [-0.024 -0.064 0.801 ... -0.006 1.045 0.031]
 [-0.135 -0.048 0.059 ... 0.055 0.031 1.003]]
2018/10/03/12:43:55 AM Mean chi^2 of SNPs used to estimate Omega is low for some SNPsMTAG may not perform well in this situation.
2018/10/03/12:43:56 AM Beginning estimation of Omega ...
2018/10/03/12:50:34 AM Using GMM estimator of Omega ..
2018/10/03/12:51:30 AM Traceback (most recent call last):
  File "/home/project/mtag/mtag.py", line 1514, in <module>
    mtag(args)
  File "/home/project/mtag/mtag.py", line 1307, in mtag
    args.omega_hat = estimate_omega(args, Zs[not_SA], Ns[not_SA], args.sigma_hat)
  File "/home/project/mtag/mtag.py", line 715, in estimate_omega
    return _posDef_adjustment(gmm_omega(Zs,Ns,sigma_LD))
  File "/home/project/mtag/mtag.py", line 607, in gmm_omega
    N_mats = np.sqrt(np.einsum('mp,mq->mpq', Ns,Ns))
MemoryError
2018/10/03/12:51:30 AM Analysis terminated from error at Wed Oct 3 00:51:30 2018
2018/10/03/12:51:30 AM Total time elapsed: 12.0h:51.0m:0.25s

So I suppose some of the matrices used in the GMM calculation of Omega become incredibly big. The failure also happens quite fast (about 1 minute between "Using GMM estimator of Omega .." and the error message).
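A rough size check is consistent with that guess. The failing line builds an array with one T x T matrix per SNP, so its size grows with the square of the number of traits. The sketch below is only arithmetic on the numbers in the log; the 8 bytes per element is an assumption (NumPy's default float64), and `outer_stack_bytes` is a hypothetical helper name, not part of MTAG.

```python
# Back-of-the-envelope size of one (M, T, T) array, the shape produced by
# np.einsum('mp,mq->mpq', Ns, Ns) in the traceback.
# ASSUMPTION: elements are 8-byte float64 values (NumPy's default float type).

def outer_stack_bytes(n_snps: int, n_traits: int, bytes_per_element: int = 8) -> int:
    """Bytes needed to hold one n_snps x n_traits x n_traits array."""
    return n_snps * n_traits * n_traits * bytes_per_element

M = 1915103  # SNPs used to estimate Omega, taken from the log
T = 93       # number of traits

size = outer_stack_bytes(M, T)
print(f"{size / 1e9:.1f} GB ({size / 2**30:.1f} GiB) for a single array")
```

Under that assumption a single such array needs about 132 GB, before counting any temporaries, which would explain a `MemoryError` on most machines.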

Do you have any suggestions on how to proceed from this?

Would it make sense to first estimate the genetic correlations, divide the traits into correlation blocks, and then run MTAG including only the traits within each block? If so, is there a tool you would suggest for that?
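One simple way to form such blocks, sketched here purely as an illustration, is to treat traits as nodes, link any pair whose absolute genetic correlation exceeds a threshold, and take the connected components. The correlation matrix has to come from elsewhere (bivariate LD score regression is one option, offered as a suggestion rather than something this thread recommends). The trait names, the matrix values, and the 0.5 threshold below are made-up assumptions.

```python
def correlation_blocks(names, rg, threshold=0.5):
    """Group traits into blocks: two traits land in the same block when they
    are joined by a chain of pairs with |rg| >= threshold."""
    n = len(names)
    seen = [False] * n
    blocks = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, block = [start], []
        while stack:  # depth-first walk over one connected component
            i = stack.pop()
            block.append(names[i])
            for j in range(n):
                if not seen[j] and abs(rg[i][j]) >= threshold:
                    seen[j] = True
                    stack.append(j)
        blocks.append(sorted(block))
    return blocks

# Toy input (hypothetical values, not estimates from any real data).
names = ["trait_a", "trait_b", "trait_c", "trait_d"]
rg = [
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, -0.7],
    [0.1, 0.1, -0.7, 1.0],
]
blocks = correlation_blocks(names, rg)
print(blocks)
```

Connected components can chain loosely related traits into one large block, so a stricter threshold or a proper clustering method may be preferable for 93 traits.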

Thank you so much for your help!

Cheers, Robbie

Robbie90 avatar Nov 06 '18 07:11 Robbie90

Hi Robbie,

Wow! I don't think I ever envisioned running MTAG on 93 traits simultaneously. I do worry a bit about inflation of the test statistics if you include that many (see Figure 1 of the manuscript). Are these traits that you expect to be highly correlated genetically, or just a random set of 93?

In terms of where things break down, my guess is that it's in the storage of the Omega and Sigma matrices. If I recall correctly, MTAG stores a TxTxM hypermatrix of the Sigma matrices for each SNP, where T is the number of traits and M is the number of SNPs. That would be a very large object in your case. Unless you want to dig into the guts of the software and recode it so that it calculates each matrix serially rather than all at once, I'm not sure of a great way around this. Your idea of splitting traits into sets of closely related outcomes doesn't sound bad, especially if there are some logical splits. How many traits can you have before you run into the memory problem?
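The "serially rather than all at once" idea could look roughly like the sketch below, which processes SNPs in chunks and keeps only a running T x T sum. This is not MTAG's code: the only line taken from the traceback is the `np.sqrt(np.einsum(...))` term, and the per-SNP expression being averaged, the function name, and the chunk size are assumptions for illustration.

```python
import numpy as np

def mean_scaled_outer_chunked(Zs, Ns, sigma, chunk_size=1_000):
    """Mean over SNPs of (z z^T - sigma) / sqrt(n n^T), computed chunk by chunk.

    ASSUMPTION: this per-SNP expression is illustrative only; the thread shows
    just the sqrt(einsum(...)) term, not the rest of the estimator.
    """
    n_snps, n_traits = Zs.shape
    total = np.zeros((n_traits, n_traits))
    for start in range(0, n_snps, chunk_size):
        z = Zs[start:start + chunk_size]
        n = Ns[start:start + chunk_size]
        # Only (chunk, T, T) arrays exist at any time, never (M, T, T).
        n_mats = np.sqrt(np.einsum('mp,mq->mpq', n, n))
        z_outer = np.einsum('mp,mq->mpq', z, z)
        total += ((z_outer - sigma) / n_mats).sum(axis=0)
    return total / n_snps

# Small synthetic check (random data, not real summary statistics).
rng = np.random.default_rng(0)
Zs = rng.normal(size=(5_000, 4))
Ns = rng.uniform(1e4, 1e5, size=(5_000, 4))
sigma = np.eye(4)
result = mean_scaled_outer_chunked(Zs, Ns, sigma, chunk_size=512)
print(result.shape)
```

With 93 traits and float64 values, each temporary for a 1,000-SNP chunk is about 69 MB, so peak memory depends on the chunk size rather than on the total number of SNPs.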

paturley avatar Nov 07 '18 15:11 paturley

Hi Patrick, I don't think I will be the only one trying something on this scale, especially now in the UK Biobank era :) I also thought it might create a problem, especially since the average chi^2 for each study is not that big. However, I still wanted to give it a go. I expect them all to be correlated: every trait should share at least some genetics with another trait.

I'm not sure how many I can try before crashing it. I'll try to run a set of parallel analyses with, say, 20, 40, 60, and 80 traits and let you know which one stops first.
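Before launching those runs, a rough prediction is possible from the array shape in the traceback alone. The sketch below asks how many traits fit if a single M x T x T float64 array must stay within a memory budget. The 64 GiB budget is a hypothetical figure (the thread does not state the machine's RAM), `max_traits_for_budget` is a made-up helper name, and real usage will be higher because more than one such array may be alive at once.

```python
import math

def max_traits_for_budget(n_snps, budget_bytes, bytes_per_element=8, n_arrays=1):
    """Largest T such that n_arrays * n_snps * T**2 * bytes_per_element
    does not exceed budget_bytes."""
    bytes_per_t_squared = n_snps * bytes_per_element * n_arrays
    return math.isqrt(budget_bytes // bytes_per_t_squared)

M = 1915103            # SNPs used to estimate Omega, taken from the log
budget = 64 * 2**30    # ASSUMPTION: a hypothetical 64 GiB of usable RAM

limit = max_traits_for_budget(M, budget)
print(f"about {limit} traits fit in the budget with one array")

# Predicted size of one array for the planned trait counts.
for t in (20, 40, 60, 80):
    print(t, f"{M * t * t * 8 / 1e9:.1f} GB")
```

Passing `n_arrays=2` or `n_arrays=3` gives a more conservative limit if several arrays of that shape are held at the same time.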

Cheers, Robbie

Robbie90 avatar Nov 10 '18 01:11 Robbie90

Hi Robbie, I also hit the MemoryError when running with 78 traits. Did you manage to solve this problem, and if so, how? Thanks.

Cheers, Yan

chenyan53535 avatar Apr 08 '19 10:04 chenyan53535