mtag
results quite different from plink results
I am analyzing several related sub-phenotypes in one GWAS cohort, with substantial sample overlap between them. I first ran the GWAS analyses in PLINK and then ran MTAG on the PLINK summary statistics for these sub-phenotypes. At least for the top loci, the MTAG results were quite different from the PLINK results. For example, SNPs that reached genome-wide significance in the MTAG result for one sub-phenotype had p-values of only about 10^-3 or 10^-2 in the PLINK result for each phenotype, and SNPs that were close to genome-wide significance in the PLINK results were only at about p = 10^-3 in MTAG. What might have gone wrong in my analyses? Thank you very much!
Hello! Can you post the log file from your analysis? It's hard to diagnose what may be the problem without it.
2018/03/22/09:19:30 AM
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<>
<> MTAG: Multi-trait Analysis of GWAS
<> Version: 1.0.7
<> (C) 2017 Omeed Maghzian, Raymond Walters, and Patrick Turley
<> Harvard University Department of Economics / Broad Institute of MIT and Harvard
<> GNU General Public License v3
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<> Note: It is recommended to run your own QC on the input before using this program.
<> Software-related correspondence: [email protected]
<> All other correspondence: [email protected]
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Calling ./mtag.py
--n-min 0.0
--sumstats test_cc_CO2cohorts_testall_formtagin.txt,test_cc_CO2cohorts_testA_formtagin.txt,test_cc_CO2cohorts_testE_formtagin.txt,test_cc_CO2cohorts_rtestA_formtagin.txt,test_cc_3cohorts_operation_formtagin.txt
--out ./test_cc_CO2cohorts_mtag
2018/03/22/09:19:30 AM Beginning MTAG analysis...
2018/03/22/09:19:41 AM Read in Trait 1 summary statistics (7085066 SNPs) from test_cc_CO2cohorts_testall_formtagin.txt ...
2018/03/22/09:19:41 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/03/22/09:19:41 AM Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
2018/03/22/09:19:41 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/03/22/09:19:41 AM Interpreting column names as follows:
2018/03/22/09:19:41 AM snpid: Variant ID (e.g., rs number)
n: Sample size
a1: Allele 1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: Allele 2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.
2018/03/22/09:19:42 AM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
2018/03/22/09:19:52 AM Read 7085066 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
7085066 SNPs remain.
2018/03/22/09:19:58 AM Removed 3857 SNPs with duplicated rs numbers (7081209 SNPs remain).
2018/03/22/09:19:59 AM Removed 0 SNPs with N < 8124.66666667 (7081209 SNPs remain).
2018/03/22/09:23:39 AM Median value of SIGNED_SUMSTATS was -0.001148, which seems sensible.
2018/03/22/09:23:40 AM Dropping snps with null values
2018/03/22/09:23:40 AM Metadata:
2018/03/22/09:23:40 AM Mean chi^2 = 1.003
2018/03/22/09:23:40 AM WARNING: mean chi^2 may be too small.
2018/03/22/09:23:40 AM Lambda GC = 1.009
2018/03/22/09:23:40 AM Max chi^2 = 25.447
2018/03/22/09:23:40 AM 0 Genome-wide significant SNPs (some may have been removed by filtering).
2018/03/22/09:23:40 AM Conversion finished at Thu Mar 22 09:23:40 2018
2018/03/22/09:23:40 AM Total time elapsed: 3.0m:59.18s
2018/03/22/09:23:51 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/03/22/09:23:51 AM Munging of Trait 1 complete. SNPs remaining: 7085066
2018/03/22/09:23:51 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/03/22/09:24:03 AM Trait 1: Dropped 3857 SNPs for duplicate values in the "snp_name" column
2018/03/22/09:24:13 AM Read in Trait 2 summary statistics (7082727 SNPs) from test_cc_CO2cohorts_testA_formtagin.txt ...
2018/03/22/09:24:13 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/03/22/09:24:13 AM Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
2018/03/22/09:24:13 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/03/22/09:24:13 AM Interpreting column names as follows:
2018/03/22/09:24:13 AM snpid: Variant ID (e.g., rs number)
n: Sample size
a1: Allele 1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: Allele 2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.
2018/03/22/09:24:13 AM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
2018/03/22/09:24:28 AM Read 7082727 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
7082727 SNPs remain.
2018/03/22/09:24:37 AM Removed 3854 SNPs with duplicated rs numbers (7078873 SNPs remain).
2018/03/22/09:24:39 AM Removed 0 SNPs with N < 7299.33333333 (7078873 SNPs remain).
2018/03/22/09:28:19 AM Median value of SIGNED_SUMSTATS was -0.006717, which seems sensible.
2018/03/22/09:28:20 AM Dropping snps with null values
2018/03/22/09:28:21 AM Metadata:
2018/03/22/09:28:21 AM Mean chi^2 = 0.997
2018/03/22/09:28:21 AM WARNING: mean chi^2 may be too small.
2018/03/22/09:28:22 AM Lambda GC = 1.001
2018/03/22/09:28:22 AM Max chi^2 = 23.608
2018/03/22/09:28:22 AM 0 Genome-wide significant SNPs (some may have been removed by filtering).
2018/03/22/09:28:22 AM Conversion finished at Thu Mar 22 09:28:22 2018
2018/03/22/09:28:22 AM Total time elapsed: 4.0m:9.34s
2018/03/22/09:28:40 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/03/22/09:28:40 AM Munging of Trait 2 complete. SNPs remaining: 7082727
2018/03/22/09:28:40 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/03/22/09:28:55 AM Trait 2: Dropped 3854 SNPs for duplicate values in the "snp_name" column
2018/03/22/09:29:07 AM Read in Trait 3 summary statistics (7070252 SNPs) from test_cc_CO2cohorts_testE_formtagin.txt ...
2018/03/22/09:29:07 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/03/22/09:29:07 AM Munging Trait 3 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
2018/03/22/09:29:07 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/03/22/09:29:07 AM Interpreting column names as follows:
2018/03/22/09:29:07 AM snpid: Variant ID (e.g., rs number)
n: Sample size
a1: Allele 1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: Allele 2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.
2018/03/22/09:29:08 AM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
2018/03/22/09:29:25 AM Read 7070252 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
7070252 SNPs remain.
2018/03/22/09:29:36 AM Removed 3839 SNPs with duplicated rs numbers (7066413 SNPs remain).
2018/03/22/09:29:38 AM Removed 0 SNPs with N < 7140.66666667 (7066413 SNPs remain).
2018/03/22/09:33:20 AM Median value of SIGNED_SUMSTATS was -0.003525, which seems sensible.
2018/03/22/09:33:22 AM Dropping snps with null values
2018/03/22/09:33:23 AM Metadata:
2018/03/22/09:33:24 AM Mean chi^2 = 1.005
2018/03/22/09:33:24 AM WARNING: mean chi^2 may be too small.
2018/03/22/09:33:24 AM Lambda GC = 1.015
2018/03/22/09:33:24 AM Max chi^2 = 23.417
2018/03/22/09:33:24 AM 0 Genome-wide significant SNPs (some may have been removed by filtering).
2018/03/22/09:33:24 AM Conversion finished at Thu Mar 22 09:33:24 2018
2018/03/22/09:33:24 AM Total time elapsed: 4.0m:17.41s
2018/03/22/09:33:42 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/03/22/09:33:42 AM Munging of Trait 3 complete. SNPs remaining: 7070252
2018/03/22/09:33:42 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/03/22/09:33:57 AM Trait 3: Dropped 3839 SNPs for duplicate values in the "snp_name" column
2018/03/22/09:34:09 AM Read in Trait 4 summary statistics (7080087 SNPs) from test_cc_CO2cohorts_rtestA_formtagin.txt ...
2018/03/22/09:34:09 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/03/22/09:34:09 AM Munging Trait 4 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
2018/03/22/09:34:09 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/03/22/09:34:09 AM Interpreting column names as follows:
2018/03/22/09:34:09 AM snpid: Variant ID (e.g., rs number)
n: Sample size
a1: Allele 1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: Allele 2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.
2018/03/22/09:34:10 AM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
2018/03/22/09:34:27 AM Read 7080087 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
7080087 SNPs remain.
2018/03/22/09:34:37 AM Removed 3854 SNPs with duplicated rs numbers (7076233 SNPs remain).
2018/03/22/09:34:40 AM Removed 0 SNPs with N < 6470.66666667 (7076233 SNPs remain).
2018/03/22/09:38:23 AM Median value of SIGNED_SUMSTATS was -0.002068, which seems sensible.
2018/03/22/09:38:24 AM Dropping snps with null values
2018/03/22/09:38:26 AM Metadata:
2018/03/22/09:38:26 AM Mean chi^2 = 0.983
2018/03/22/09:38:26 AM WARNING: mean chi^2 may be too small.
2018/03/22/09:38:27 AM Lambda GC = 0.982
2018/03/22/09:38:27 AM Max chi^2 = 28.734
2018/03/22/09:38:27 AM 0 Genome-wide significant SNPs (some may have been removed by filtering).
2018/03/22/09:38:27 AM Conversion finished at Thu Mar 22 09:38:27 2018
2018/03/22/09:38:27 AM Total time elapsed: 4.0m:17.78s
2018/03/22/09:38:46 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/03/22/09:38:46 AM Munging of Trait 4 complete. SNPs remaining: 7080087
2018/03/22/09:38:46 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/03/22/09:39:04 AM Trait 4: Dropped 3854 SNPs for duplicate values in the "snp_name" column
2018/03/22/09:39:14 AM Read in Trait 5 summary statistics (5609740 SNPs) from test_cc_3cohorts_operation_formtagin.txt ...
2018/03/22/09:39:14 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/03/22/09:39:14 AM Munging Trait 5 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
2018/03/22/09:39:14 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/03/22/09:39:14 AM Interpreting column names as follows:
2018/03/22/09:39:14 AM snpid: Variant ID (e.g., rs number)
n: Sample size
a1: Allele 1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: Allele 2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.
2018/03/22/09:39:14 AM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
2018/03/22/09:39:28 AM Read 5609740 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
5609740 SNPs remain.
2018/03/22/09:39:35 AM Removed 2425 SNPs with duplicated rs numbers (5607315 SNPs remain).
2018/03/22/09:39:37 AM Removed 0 SNPs with N < 7140.66666667 (5607315 SNPs remain).
2018/03/22/09:42:33 AM Median value of SIGNED_SUMSTATS was -0.002329, which seems sensible.
2018/03/22/09:42:34 AM Dropping snps with null values
2018/03/22/09:42:35 AM Metadata:
2018/03/22/09:42:35 AM Mean chi^2 = 1.01
2018/03/22/09:42:35 AM WARNING: mean chi^2 may be too small.
2018/03/22/09:42:36 AM Lambda GC = 1.012
2018/03/22/09:42:36 AM Max chi^2 = 24.625
2018/03/22/09:42:36 AM 0 Genome-wide significant SNPs (some may have been removed by filtering).
2018/03/22/09:42:36 AM Conversion finished at Thu Mar 22 09:42:36 2018
2018/03/22/09:42:36 AM Total time elapsed: 3.0m:22.24s
2018/03/22/09:42:50 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/03/22/09:42:50 AM Munging of Trait 5 complete. SNPs remaining: 5609740
2018/03/22/09:42:50 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/03/22/09:43:02 AM Trait 5: Dropped 2425 SNPs for duplicate values in the "snp_name" column
2018/03/22/09:43:23 AM Trait 2 summary statistics: 7077884 SNPs remaining merging with previous traits.
2018/03/22/09:44:00 AM Trait 3 summary statistics: 7062899 SNPs remaining merging with previous traits.
2018/03/22/09:44:43 AM Trait 4 summary statistics: 7058639 SNPs remaining merging with previous traits.
2018/03/22/09:45:31 AM Trait 5 summary statistics: 5591946 SNPs remaining merging with previous traits.
2018/03/22/09:45:46 AM Dropped 57 SNPs due to inconsistent allele pairs from phenotype 5. 5591889 SNPs remain.
2018/03/22/09:45:59 AM Flipped the signs of of 11 SNPs to make them consistent with the effect allele orderings of the first trait.
2018/03/22/09:46:02 AM ... Merge of GWAS summary statistics complete. Number of SNPs: 5591889
2018/03/22/09:46:16 AM Using 4742913 SNPs to estimate Omega (848976 SNPs excluded due to strand ambiguity)
2018/03/22/09:46:16 AM Estimating sigma..
2018/03/22/09:50:47 AM Checking for positive definiteness ..
2018/03/22/09:50:47 AM Sigma hat:
[[1.006 0.805 0.76 0.417 0.523]
 [0.805 0.992 0.541 0.519 0.399]
 [0.76 0.541 1.001 0.363 0.625]
 [0.417 0.519 0.363 0.979 0.243]
 [0.523 0.399 0.625 0.243 0.989]]
2018/03/22/09:50:47 AM Mean chi^2 of SNPs used to estimate Omega is low for some SNPsMTAG may not perform well in this situation.
2018/03/22/09:50:47 AM Beginning estimation of Omega ...
2018/03/22/09:50:52 AM Using GMM estimator of Omega ..
2018/03/22/09:51:01 AM Checking for positive definiteness ..
2018/03/22/09:51:01 AM matrix is not positive definite, performing adjustment..
2018/03/22/09:51:01 AM Warning: max number of iterations reached in adjustment procedure. Sigma matrix used is still non-positive-definite.
2018/03/22/09:51:01 AM Completed estimation of Omega ...
2018/03/22/09:51:01 AM Beginning MTAG calculations...
2018/03/22/09:56:40 AM ... Completed MTAG calculations.
2018/03/22/09:56:41 AM Writing Phenotype 1 to file ...
2018/03/22/09:57:36 AM Writing Phenotype 2 to file ...
2018/03/22/09:58:31 AM Writing Phenotype 3 to file ...
2018/03/22/09:59:26 AM Writing Phenotype 4 to file ...
2018/03/22/10:00:22 AM Writing Phenotype 5 to file ...
It looks to me like your sample sizes may be too small for MTAG. In general, I recommend that the mean chi2 statistic of the GWAS summary statistics be at least 1.1. In your case, the mean chi2 is about one for every trait (0.983 to 1.01 in the log) and below one for two of them. As a result, the estimate of Omega is not positive definite, which makes the theory fall apart.
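To illustrate the reasoning in the reply above: in a simplified method-of-moments picture, each diagonal entry of the estimated Omega is proportional to a trait's mean chi2 minus the matching diagonal entry of Sigma hat. The Python sketch below plugs in the values printed in the log. It is a rough illustration only, not MTAG's actual GMM estimator; the helper name `is_positive_definite` is made up here, and the per-trait mean chi2 values in the log were computed before the traits were merged, so the two sets of numbers are only approximately comparable.

```python
import numpy as np


def is_positive_definite(matrix):
    """Return True if the symmetrized matrix has strictly positive eigenvalues."""
    sym = (matrix + matrix.T) / 2.0
    return bool(np.all(np.linalg.eigvalsh(sym) > 0.0))


# Per-trait mean chi^2 values, copied from the munging sections of the log.
mean_chi2 = np.array([1.003, 0.997, 1.005, 0.983, 1.01])

# Diagonal of "Sigma hat", copied from the log.
sigma_diag = np.array([1.006, 0.992, 1.001, 0.979, 0.989])

# Assumption for illustration: the genetic signal on the diagonal of Omega
# scales with (mean chi^2 - Sigma_jj). This is NOT MTAG's GMM estimator.
omega_diag_proxy = mean_chi2 - sigma_diag

for trait, value in enumerate(omega_diag_proxy, start=1):
    print(f"Trait {trait}: mean chi^2 - Sigma_jj = {value:+.3f}")

# A symmetric matrix with a negative diagonal entry cannot be positive
# definite, whatever its off-diagonal entries are.
print("Any negative diagonal entry:", bool(np.any(omega_diag_proxy < 0)))
print("diag-only proxy is PD:", is_positive_definite(np.diag(omega_diag_proxy)))
```

With these numbers the difference for trait 1 is slightly negative (1.003 versus 1.006), and the other four are positive but tiny, so there is very little signal above the noise for Omega to be estimated from.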
Really the software should produce more obvious errors (and possibly not even generate summary statistics) in situations where the mean chi2 is less than one. I'll try to incorporate that into future versions. Can you make a note of this @huilisabrina ?
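A minimal sketch of the kind of guard described above, assuming the per-SNP z-scores of one trait are available as an array. The function name `check_mean_chi2` and its parameters are hypothetical; only the 1.1 recommendation and the "less than one" failure condition come from this thread, and this is not the check that was later added to MTAG.

```python
import warnings

import numpy as np


def check_mean_chi2(z_scores, hard_min=1.0, recommended_min=1.1):
    """Return the mean chi^2 of a trait, failing loudly if it is too low.

    Raises ValueError when the mean chi^2 is below `hard_min`, and emits a
    warning when it is below `recommended_min`.
    """
    z = np.asarray(z_scores, dtype=float)
    z = z[~np.isnan(z)]  # ignore SNPs with a missing test statistic
    if z.size == 0:
        raise ValueError("no non-missing z-scores supplied")

    # For a 1-degree-of-freedom test, chi^2 is the squared z-score.
    mean_chi2 = float(np.mean(z ** 2))

    if mean_chi2 < hard_min:
        raise ValueError(
            f"mean chi^2 = {mean_chi2:.3f} is below {hard_min}; "
            "refusing to continue"
        )
    if mean_chi2 < recommended_min:
        warnings.warn(
            f"mean chi^2 = {mean_chi2:.3f} is below the recommended "
            f"{recommended_min}; results may be unreliable"
        )
    return mean_chi2
```

Called on each trait before the multi-trait step, a check like this would stop a run such as the one in the log early rather than writing summary statistics that look valid.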
@paturley Yes I'll add this in the next round of revision, which will happen soon!
Hi @jineexia, I've added a check on the mean chi2 to the updated version of MTAG. Please re-pull the master branch for future runs. If your question has been resolved, feel free to close the issue!
Thanks, Hui