
IOError: --maf_min

Open • AI-10 opened this issue on Jun 28, 2021 • 8 comments

Hi all,

What is the relationship between freq and MAF in MTAG analysis?

It seems that what MTAG needs is freq (the effect allele frequency, that is, the frequency of a1, used to filter rare variants and to convert MTAG results to unstandardized effect sizes). However, the first step of the MTAG analysis reads the GWAS summary statistics and filters SNPs by minor allele frequency (MAF) >= 0.01. So are freq and MAF treated as the same thing in MTAG?
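Here is a minimal sketch of how the two quantities relate. It assumes MAF is simply the frequency of whichever allele is rarer. Under that assumption, MAF can always be derived from freq (the frequency of a1), but freq cannot be recovered from MAF unless you know which allele is the minor one. The function names and the 0.01 default are illustrative and are not taken from MTAG's source.

```python
def maf_from_eaf(freq: float) -> float:
    """Minor allele frequency from the effect-allele (a1) frequency."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError(f"allele frequency must be in [0, 1], got {freq}")
    # The minor allele is whichever of a1/a2 is rarer.
    return min(freq, 1.0 - freq)


def passes_maf_filter(freq: float, maf_min: float = 0.01) -> bool:
    """Hypothetical filter: keep a SNP only if its MAF is at least maf_min."""
    return maf_from_eaf(freq) >= maf_min


# freq = 0.995: a1 is common, but the minor allele (a2) has frequency 0.005,
# so a 0.01 MAF threshold removes this SNP.
print(round(maf_from_eaf(0.995), 3))   # 0.005
print(passes_maf_filter(0.995))        # False
print(passes_maf_filter(0.30))         # True
```

A threshold applied to freq directly (freq >= 0.01) would keep the first SNP. The two filters therefore disagree only for SNPs where a1 is the major allele.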

When I tried to perform MTAG analysis, the following error was reported:

raise IOError("--maf_min is specified but maf column is not present in sumstats {}".format(p+1))
IOError: --maf_min is specified but maf column is not present in sumstats 1

This error occurred because my GWAS summary statistics do not have a freq column. Can I use MAF from a reference panel in place of freq?
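Frequencies taken from a reference panel must be expressed relative to a1 before they go into a freq column. Below is a minimal standard-library sketch. It assumes tab-separated summary statistics with snpid, a1 and a2 columns, and a hypothetical panel lookup that maps each SNP to (allele the frequency refers to, frequency). The column names are assumptions about your file. SNPs that are missing from the panel, or whose alleles do not match, are dropped. Strand-ambiguous A/T and C/G SNPs are not handled here and would need extra care.

```python
import csv
import io

# Hypothetical reference-panel lookup: SNP id -> (allele the frequency refers to, frequency).
REF_FREQ = {
    "rs1": ("A", 0.20),
    "rs2": ("C", 0.70),
    "rs3": ("T", 0.40),
}

# Toy summary statistics (tab-separated); column names are assumptions.
SUMSTATS = (
    "snpid\ta1\ta2\tz\tn\n"
    "rs1\tA\tG\t2.1\t7100\n"
    "rs2\tT\tC\t-0.5\t7100\n"
    "rs3\tA\tG\t1.0\t7100\n"
    "rs4\tC\tT\t0.3\t7100\n"
)


def add_freq_column(sumstats_tsv: str, ref_freq: dict) -> str:
    """Return the summary statistics with a freq column (frequency of a1) taken from ref_freq."""
    reader = csv.DictReader(io.StringIO(sumstats_tsv), delimiter="\t")
    fieldnames = list(reader.fieldnames) + ["freq"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        entry = ref_freq.get(row["snpid"])
        if entry is None:
            continue  # SNP not in the panel: drop it
        allele, f = entry
        if allele == row["a1"]:
            eaf = f            # panel frequency already refers to a1
        elif allele == row["a2"]:
            eaf = 1.0 - f      # panel frequency refers to a2, so flip it
        else:
            continue           # alleles disagree (e.g. strand or build issue): drop it
        row["freq"] = f"{eaf:.6g}"
        writer.writerow(row)
    return out.getvalue()


print(add_freq_column(SUMSTATS, REF_FREQ))
```

In this toy input, rs1 keeps the panel frequency (0.2), rs2 is flipped to 0.3, rs3 is dropped for an allele mismatch, and rs4 is dropped because it is absent from the panel.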

AI-10 · Jun 28 '21 08:06

The only place allele frequency is used is in transforming the summary statistics into standardized or unstandardized units, so either the effect allele frequency or the MAF should work fine. Taking them from a reference panel should also be fine, as long as the panel is a close enough match to your data.
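One way to see why either frequency works: under the usual additive coding (0, 1 or 2 copies of an allele) and Hardy-Weinberg equilibrium, the genotype standard deviation is sqrt(2p(1 - p)). That quantity does not change when p is replaced by 1 - p. The sketch below assumes the standardized/unstandardized conversion rescales by this quantity, which is a common convention. MTAG's exact code is not reproduced here, and the function names are hypothetical.

```python
import math


def genotype_sd(p: float) -> float:
    """SD of an additively coded SNP (0/1/2 copies) under Hardy-Weinberg equilibrium."""
    return math.sqrt(2.0 * p * (1.0 - p))


def to_per_allele(beta_std: float, p: float) -> float:
    """Hypothetical conversion: effect per SD of genotype -> effect per copy of the allele."""
    return beta_std / genotype_sd(p)


# p = 0.2 (effect-allele frequency) and p = 0.8 (its complement) give the same scaling,
# so in this sketch it does not matter whether the column holds freq of a1 or the MAF.
print(math.isclose(genotype_sd(0.2), genotype_sd(0.8)))                   # True
print(math.isclose(to_per_allele(0.01, 0.2), to_per_allele(0.01, 0.8)))   # True
```

In this sketch the sign of the effect comes from the a1-aligned z-score, not from the frequency. Only the magnitude of the rescaling depends on p.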

Best, Patrick


paturley · Jun 28 '21 13:06

Thanks, Patrick

With your help, I have solved the problem and obtained the MTAG results, but I have another question about them. Although a similar question was raised in a previous issue (https://github.com/JonJala/mtag/issues/39#issue-351563128), I would still like your opinion on my results.

Here is the result of the MTAG analysis:

| Trait | # SNPs used | N (max) | N (mean) | GWAS mean chi^2 | MTAG mean chi^2 | GWAS equiv. (max) N |
|---|---|---|---|---|---|---|
| 1 (.../MTAG/Trait1) | 5402336 | 7100 | 7100 | 1.009 | 1.148 | 116755 |
| 2 (.../MTAG/Trait2) | 5402336 | 501023 | 501023 | 1.143 | 1.153 | 534965 |

Estimated Omega: [[1.421e-06 5.547e-07] [5.547e-07 2.272e-07]]

(Correlation): [[1. 0.976] [0.976 1. ]]

Estimated Sigma: [[1.005 0.014] [0.014 0.988]]

(Correlation): [[1. 0.014] [0.014 1. ]]
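The (Correlation) lines can be derived from the covariance matrices. Each entry is divided by the product of the corresponding standard deviations, for example rho_12 = Omega_12 / sqrt(Omega_11 * Omega_22). Here is a quick standard-library check using the printed values. As I understand MTAG's terminology, Omega is the estimated covariance of SNP effects across traits and Sigma is the covariance of the estimation error.

```python
import math


def cov_to_corr(cov):
    """Convert a square covariance matrix (nested lists) into a correlation matrix."""
    n = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


omega = [[1.421e-06, 5.547e-07], [5.547e-07, 2.272e-07]]  # Estimated Omega from the output
sigma = [[1.005, 0.014], [0.014, 0.988]]                  # Estimated Sigma from the output

print(round(cov_to_corr(omega)[0][1], 3))  # 0.976
print(round(cov_to_corr(sigma)[0][1], 3))  # 0.014
```

The near-zero Sigma correlation is consistent with little sample overlap between the two GWAS.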

As the results show, the sample size of trait 1 is only about 7K, while that of trait 2 is about 500K. The GWAS mean chi^2 of trait 1 is also below 1.02, so I used the --force option in the MTAG analysis. My research focuses on trait 1, so I would like to know whether this result is credible.
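The "GWAS equiv. (max) N" column appears to be N scaled by the ratio of (mean chi^2 - 1) after MTAG to (mean chi^2 - 1) before it. With the values from the table, this reproduces the reported numbers up to the 3-decimal rounding of the chi^2 columns. This is a sketch under that assumption, with a hypothetical function name, not MTAG's code.

```python
def gwas_equivalent_n(n_gwas: float, chi2_gwas: float, chi2_mtag: float) -> float:
    """N a single-trait GWAS would need to reach the MTAG mean chi^2,
    assuming (mean chi^2 - 1) grows in proportion to N."""
    return n_gwas * (chi2_mtag - 1.0) / (chi2_gwas - 1.0)


# Inputs copied from the MTAG output table (chi^2 values are rounded to 3 decimals there).
trait1 = gwas_equivalent_n(7100, 1.009, 1.148)      # about 116,756; the output reports 116755
trait2 = gwas_equivalent_n(501023, 1.143, 1.153)    # about 536,060; the output reports 534965
print(round(trait1), round(trait2))
```

Trait 1's GWAS mean chi^2 is only 0.009 above 1, so the denominator is tiny and small errors in it move the result a lot. With 1.008 or 1.010 instead of 1.009, the same formula gives roughly 131,000 or 105,000. This is one reason to interpret the apparent gain for the low-power trait with some caution.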

Many thanks,

AI-10 · Jun 28 '21 15:06

It's hard to say. With such a high genetic correlation, I think it's likely fine, but have you done maxFDR calculations yet?


paturley · Jun 28 '21 15:06

Yes, I have just run the maxFDR calculations with the following command:

Here is the result:

LOG:
2021/06/28/09:05:44 AM Beginning maxFDR calculations. Depending on the number of grid points specified, this might take some time...
2021/06/28/09:05:44 AM T=2
2021/06/28/09:05:44 AM Number of gridpoints to search: 10
2021/06/28/09:05:44 AM Performing grid search using 1 cores.
2021/06/28/09:05:44 AM Grid search: 10.0 percent finished for . Time: 0.001 min
2021/06/28/09:05:44 AM Grid search: 20.0 percent finished for . Time: 0.002 min
2021/06/28/09:05:44 AM Grid search: 30.0 percent finished for . Time: 0.002 min
2021/06/28/09:05:44 AM Grid search: 40.0 percent finished for . Time: 0.002 min
2021/06/28/09:05:44 AM Grid search: 50.0 percent finished for . Time: 0.003 min
2021/06/28/09:05:44 AM Grid search: 60.0 percent finished for . Time: 0.003 min
2021/06/28/09:05:44 AM Grid search: 70.0 percent finished for . Time: 0.003 min
2021/06/28/09:05:44 AM Grid search: 80.0 percent finished for . Time: 0.003 min
2021/06/28/09:05:44 AM Grid search: 90.0 percent finished for . Time: 0.004 min
2021/06/28/09:05:44 AM Grid search: 100.0 percent finished for . Time: 0.004 min
2021/06/28/09:05:44 AM Saved calculations of fdr over grid points in ./MTAG_fdr_mat.txt
2021/06/28/09:05:44 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2021/06/28/09:05:44 AM grid point indices for max FDR for each trait: [5 5]
2021/06/28/09:05:44 AM Maximum FDR
2021/06/28/09:05:44 AM Max FDR of Trait 1: 0.0580646769506 at probs = [0.5 0. 0. 0.5]
2021/06/28/09:05:44 AM Max FDR of Trait 2: 0.0586834661076 at probs = [0.5 0. 0. 0.5]
2021/06/28/09:05:44 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2021/06/28/09:05:44 AM Completed FDR calculations.

MTAG_fdr_mat (one row per grid point, one column per trait):

| Grid point | Trait 1 | Trait 2 |
|---|---|---|
| 0 | 0.000000000000000000e+00 | 0.000000000000000000e+00 |
| 1 | 2.027102050889234564e-02 | 2.040604456678559928e-02 |
| 2 | 3.724722277947830479e-02 | 3.751802964846761429e-02 |
| 3 | 5.001644805317329717e-02 | 5.041997992086438146e-02 |
| 4 | 5.741357429073161434e-02 | 5.793862398909319089e-02 |
| 5 | 5.806467695058817324e-02 | 5.868346610757765869e-02 |
| 6 | 5.071930707942827465e-02 | 5.137262321522154812e-02 |
| 7 | 3.539335191114843793e-02 | 3.597203875017745273e-02 |
| 8 | 1.604511704008450626e-02 | 1.640155411913529412e-02 |
| 9 | 2.561442979913771366e-03 | 2.647780272021536124e-03 |
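The log's "grid point indices for max FDR" can be recovered from MTAG_fdr_mat.txt by taking the largest value in each trait's column. The sketch below assumes the file holds one whitespace-separated row per grid point and one column per trait. That layout is inferred from how the values pair up and from the [5 5] indices in the log. The values are copied from the output above; with the real file you would read the text via `open("MTAG_fdr_mat.txt").read()` instead.

```python
FDR_MAT_TEXT = """\
0.000000000000000000e+00 0.000000000000000000e+00
2.027102050889234564e-02 2.040604456678559928e-02
3.724722277947830479e-02 3.751802964846761429e-02
5.001644805317329717e-02 5.041997992086438146e-02
5.741357429073161434e-02 5.793862398909319089e-02
5.806467695058817324e-02 5.868346610757765869e-02
5.071930707942827465e-02 5.137262321522154812e-02
3.539335191114843793e-02 3.597203875017745273e-02
1.604511704008450626e-02 1.640155411913529412e-02
2.561442979913771366e-03 2.647780272021536124e-03
"""


def max_fdr_per_trait(text: str):
    """Return (grid-point index, max FDR) for each trait column."""
    rows = [[float(x) for x in line.split()] for line in text.splitlines() if line.strip()]
    result = []
    for t in range(len(rows[0])):
        column = [row[t] for row in rows]
        best = max(range(len(column)), key=column.__getitem__)
        result.append((best, column[best]))
    return result


for trait, (idx, fdr) in enumerate(max_fdr_per_trait(FDR_MAT_TEXT), start=1):
    print(f"Trait {trait}: max FDR {fdr:.4f} at grid point {idx}")
```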

AI-10 · Jun 28 '21 16:06

It looks like you didn't paste in the command you used. Assuming you ran it correctly, though, the implication is that, as long as the true effect-size distribution follows a spike-and-slab distribution, the false discovery rate of MTAG in your setting is at most about 5.8%.


paturley · Jun 28 '21 16:06

Sorry,

Here is the command I used for the maxFDR calculations: python ../MTAG/mtag.py --skip_mtag --out ./MTAG

So do the results look reliable?

Many thanks, weiming

AI-10 · Jun 28 '21 16:06

I would probably use more than 10 grid points if you have the computing power. If you can get away with 100, that's great; 50 is probably fine.
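Here is why more grid points can matter. A grid search only evaluates the FDR at the grid points, so the reported maximum is a lower bound on the maximum over all configurations. A finer grid that contains the coarse one can only raise that maximum or leave it unchanged. The sketch replaces MTAG's FDR calculation with a hypothetical one-parameter toy function, purely to show the effect of grid resolution.

```python
import math


def toy_fdr(p: float) -> float:
    """Hypothetical stand-in for FDR as a function of one mixing probability (NOT MTAG's formula)."""
    return p * math.exp(-3.0 * p)


def grid_max(f, n_intervals: int) -> float:
    """Evaluate f on n_intervals + 1 evenly spaced points in [0, 1] and return the largest value."""
    return max(f(i / n_intervals) for i in range(n_intervals + 1))


coarse = grid_max(toy_fdr, 10)     # 11 points: 0.0, 0.1, ..., 1.0
fine = grid_max(toy_fdr, 100)      # 101 points, including every coarse point
true_max = toy_fdr(1.0 / 3.0)      # analytic maximizer of p * exp(-3p) is p = 1/3

print(coarse <= fine <= true_max)  # True
```

In this toy case the true maximum lies between two coarse grid points. The 10-interval grid therefore understates it slightly more than the 100-interval grid does.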

Whether the results are reliable depends on whether you consider an FDR of almost 6% acceptable. It kind of comes down to the application and to what you and your reviewers think.


paturley · Jun 28 '21 17:06

Thanks, Patrick

I will follow your advice to have a try.

Best wishes, weiming

AI-10 · Jun 28 '21 17:06