mtag
ValueError: cannot reindex from a duplicate axis
Not sure if this is on my end or if this (perhaps?) has to do with the new n_value or p_name options. This is my first time not adding an N column or renaming the P column from the BOLT-LMM output. After the 3 trait files get munged, emitting mean chi^2, GC estimates, etc., the script panics due to ValueError: cannot reindex from a duplicate axis.
Does any clear issue stand out? If not, I can paste the full log and start manipulating columns to try to narrow down whether this is actually related to the new n_value or p_name settings.
Error:
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging of Trait 3 complete. SNPs remaining: 8647669
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Trait 3: Dropped 9225 SNPs for duplicate values in the "snp_name" column
Dropped 1351905 SNPs due to strand ambiguity, 7286539 SNPs remain in intersection after merging trait1
Dropped 0 SNPs due to strand ambiguity, 7286539 SNPs remain in intersection after merging trait2
Dropped 0 SNPs due to strand ambiguity, 7286539 SNPs remain in intersection after merging trait3
... Merge of GWAS summary statistics complete. Number of SNPs: 7286539
cannot reindex from a duplicate axis
Traceback (most recent call last):
File "mtag.py", line 1557, in <module>
mtag(args)
File "mtag.py", line 1330, in mtag
Zs , Ns ,Fs, res_temp, DATA, N_raw = extract_gwas_sumstats(DATA,args,list(np.arange(args.P)))
File "mtag.py", line 526, in extract_gwas_sumstats
Ns = DATA.filter(items=n_cols).as_matrix()
File "/Users/jamesp/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/generic.py", line 3900, in filter
**{name: [r for r in items if r in labels]})
File "/Users/jamesp/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/util/_decorators.py", line 187, in wrapper
return func(*args, **kwargs)
File "/Users/jamesp/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/frame.py", line 3566, in reindex
return super(DataFrame, self).reindex(**kwargs)
File "/Users/jamesp/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/generic.py", line 3689, in reindex
fill_value, copy).__finalize__(self)
File "/Users/jamesp/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/frame.py", line 3496, in _reindex_axes
fill_value, limit, tolerance)
File "/Users/jamesp/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/frame.py", line 3521, in _reindex_columns
allow_dups=False)
File "/Users/jamesp/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/generic.py", line 3810, in _reindex_with_indexers
copy=copy)
File "/Users/jamesp/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/internals.py", line 4414, in reindex_indexer
self.axes[axis]._can_reindex(indexer)
File "/Users/jamesp/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 3576, in _can_reindex
raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis
Analysis terminated from error at Thu Nov 29 20:44:11 2018
Total time elapsed: 12.0m:16.14s
Command:
python mtag.py \
--sumstats trait1,trait2, trait3 \
--out mtag.out \
--n_value 100,100,100 \
--p_name P_BOLT_LMM \
--snp_name SNP \
--chr_name CHR \
--bpos_name BP \
--beta_name BETA \
--se_name SE \
--a1_name ALLELE1 \
--a2_name ALLELE0 \
--eaf_name A1FREQ \
--n_min 0.0 \
--info_min 0.3 \
--cores 1 \
--use_beta_se \
--n_approx \
--stream_stdout \
--fdr
Hi @carbocation,
Do you still have the N column present in the input sumstats even though you're using --n_value?
I will tweak the code so that it prioritizes the --n_value flag and ignores any existing N column in the input. But I just wanted to make sure this is indeed the issue. Let me know if removing the N column solves the problem (temporarily).
Thanks, Hui
In the above code, the input file had no N column (it was pure BOLT-LMM output).
Could you quickly check and send the first few lines of each sumstats file you're using (just via head in bash)?
Thanks, Hui
Sure thing. Each file:
1
SNP CHR BP GENPOS ALLELE1 ALLELE0 A1FREQ INFO CHISQ_LINREG P_LINREG BETA SE CHISQ_BOLT_LMM_INF P_BOLT_LMM_INF CHISQ_BOLT_LMM P_BOLT_LMM
rs367896724 1 10177 0 A AC 0.599484 0.467935 0.0815533 7.8E-01 0.146417 0.417527 0.122974 7.3E-01 0.187453 6.7E-01
rs201106462 1 10352 0 T TA 0.607518 0.447895 2.38133 1.2E-01 0.660126 0.430198 2.35459 1.2E-01 2.37463 1.2E-01
1:10616_CCGCCGTTGCAAAGGCGCGCCG_C 1 10616 0 CCGCCGTTGCAAAGGCGCGCCG C 0.00537706 0.468098 1.22665 2.7E-01 3.37521 2.94865 1.31025 2.5E-01 1.29 2.6E-01
2
SNP CHR BP GENPOS ALLELE1 ALLELE0 A1FREQ INFO CHISQ_LINREG P_LINREG BETA SE CHISQ_BOLT_LMM_INF P_BOLT_LMM_INF CHISQ_BOLT_LMM P_BOLT_LMM
rs367896724 1 10177 0 A AC 0.599484 0.467935 0.0661935 8.0E-01 -0.0153067 0.245892 0.00387504 9.5E-01 0.0001763 9.9E-01
rs201106462 1 10352 0 T TA 0.607518 0.447895 3.64409 5.6E-02 0.516429 0.253354 4.15493 4.2E-02 3.98652 4.6E-02
1:10616_CCGCCGTTGCAAAGGCGCGCCG_C 1 10616 0 CCGCCGTTGCAAAGGCGCGCCG C 0.00537706 0.468098 0.186749 6.7E-01 1.13393 1.73653 0.426389 5.1E-01 0.389551 5.3E-01
3
SNP CHR BP GENPOS ALLELE1 ALLELE0 A1FREQ INFO CHISQ_LINREG P_LINREG BETA SE CHISQ_BOLT_LMM_INF P_BOLT_LMM_INF CHISQ_BOLT_LMM P_BOLT_LMM
rs367896724 1 10177 0 A AC 0.599484 0.467935 0.641061 4.2E-01 0.000588464 0.000945475 0.387382 5.3E-01 0.379562 5.4E-01
rs201106462 1 10352 0 T TA 0.607518 0.447895 0.663884 4.2E-01 -0.000941531 0.000974169 0.934116 3.3E-01 0.907045 3.4E-01
1:10616_CCGCCGTTGCAAAGGCGCGCCG_C 1 10616 0 CCGCCGTTGCAAAGGCGCGCCG C 0.00537706 0.468098 0.20568 6.5E-01 0.00148604 0.00667712 0.0495318.2E-01 0.0552681 8.1E-01
Thanks! Hmm, I still can't replicate your problem. I'll track it down eventually, but if this is urgent, could you try removing all the unused columns in the input (e.g. P_BOLT_LMM_INF, CHISQ_BOLT_LMM_INF, CHISQ_BOLT_LMM, CHISQ_LINREG, P_LINREG)? I'm sure this will work. The --p_name and --n_value flags function fine as long as the inputs are well formatted. I'll let you know once I've tried more things.
Thanks for your help! For now, I've just gone back to the old approach which still works fine (adding an N column, renaming the P_BOLT_LMM column to P, and discarding irrelevant fields).
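The older workaround described above (add an N column, rename P_BOLT_LMM to P, discard irrelevant fields) can be scripted. Below is a minimal pandas sketch, assuming whitespace-delimited BOLT-LMM output with the header shown earlier in this thread. The helper name `prepare_bolt_sumstats` and the exact list of kept columns are illustrative assumptions, not part of MTAG.

```python
import pandas as pd

# Columns kept from the BOLT-LMM output. This list is an assumption based on
# the flags in the command above; adjust it to your own files.
KEEP = ["SNP", "CHR", "BP", "ALLELE1", "ALLELE0", "A1FREQ", "INFO",
        "BETA", "SE", "P_BOLT_LMM"]


def prepare_bolt_sumstats(source, n):
    """Return BOLT-LMM results with a single P column and a uniform N column.

    `source` is a path or file-like object; `n` is the sample size to add.
    """
    # BOLT-LMM output is whitespace-delimited. Reading only the KEEP columns
    # means P_BOLT_LMM_INF, P_LINREG and the CHISQ_* columns are never loaded.
    df = pd.read_csv(source, sep=r"\s+", usecols=KEEP)
    # Rename by exact name so that exactly one column ends up called "P".
    df = df.rename(columns={"P_BOLT_LMM": "P"})
    # Uniform sample size, the same information --n_value is meant to supply.
    df["N"] = n
    return df
```

For example, `prepare_bolt_sumstats("trait1", n=100).to_csv("trait1.prepared.txt", sep="\t", index=False)` would write a tab-delimited file (the output name is hypothetical) to pass to --sumstats. Reading with `usecols` keeps the extra p-value columns out of the DataFrame entirely, so no header matcher downstream has a second P column to pick up.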
I seem to recall an issue previously where the function sometimes mis-identifies which column is which when they are not specified and when there are extraneous columns in the file. (e.g., if there is a column that has an N in it but is not the sample size column.) I don't remember if we ever resolved this.
@paturley This hasn't come up during my time with mtag... but I think I just replicated the error that @carbocation is seeing. The problem has to do with the presence of multiple p-value columns: in addition to the target P_BOLT_LMM, there is also P_BOLT_LMM_INF in the input, so both of them are identified as p-value columns and the software can't tell which one to use. I will try to see how to resolve this. I think this didn't pop up sooner because input files usually have more distinguishable column names. A general enhancement I should consider from now on is how to make bolt and mtag work together more smoothly!
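To see why two p-value columns surface as this particular error, here is a small pandas reproduction. It assumes, purely for illustration, a loose header matcher that maps every column starting with P_BOLT to the standard name P; MTAG's real matching logic is not reproduced here. Once two columns share a label, `DataFrame.filter(items=...)` (the call at mtag.py line 526 in the traceback) raises ValueError, even when the requested column is not one of the duplicates.

```python
import pandas as pd


def toy_bolt_frame():
    """Two SNPs carrying both BOLT-LMM p-value columns, as in the files above."""
    return pd.DataFrame({
        "SNP": ["rs367896724", "rs201106462"],
        "N": [100, 100],
        "P_BOLT_LMM_INF": [7.3e-01, 1.2e-01],
        "P_BOLT_LMM": [6.7e-01, 1.2e-01],
    })


def loose_rename(df, prefix="P_BOLT"):
    """Hypothetical loose matcher: every column starting with `prefix` is
    mapped to the standard name "P", which silently creates duplicate labels."""
    return df.rename(columns={c: "P" for c in df.columns if c.startswith(prefix)})


renamed = loose_rename(toy_bolt_frame())
print(list(renamed.columns))  # ['SNP', 'N', 'P', 'P']

try:
    # Same call pattern as the failing line in the traceback, minus as_matrix().
    renamed.filter(items=["N"])
except ValueError as err:
    # Old pandas says "cannot reindex from a duplicate axis"; newer releases
    # word the message differently but raise the same exception type.
    print("ValueError:", err)

# Mapping by exact name keeps every label unique, so filter() works again.
exact = toy_bolt_frame().rename(columns={"P_BOLT_LMM": "P"})
print(exact.filter(items=["N", "P"]))
```

The failure comes from the column axis being non-unique, not from the values, which is consistent with the crash appearing only after munging and merging finished without complaint.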
Hello,
Need help for a similar problem.
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging of Trait 2 complete. SNPs remaining: 921802
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Dropped 4 SNPs due to strand ambiguity, 933905 SNPs remain in intersection after merging trait1
Dropped 0 SNPs due to strand ambiguity, 908163 SNPs remain in intersection after merging trait2
... Merge of GWAS summary statistics complete. Number of SNPs: 908163
cannot reindex from a duplicate axis
Traceback (most recent call last):
File "/shares/compbio/Group-Wray/restuadi/project/Multitraits_prediction/Jan2019run/software/mtag/mtag.py", line 1557, in
As far as I can tell, my sumstats look fine (I've formatted them according to the mtag GitHub page):
sumstats 1
snpid chr bpos a1 a2 freq z pval n
rs3094315 1 752566 A G 0.8313 -0.912087912088 0.3617 80610
rs3131972 1 752721 A G 0.1712 1.01470588235 0.3102 80610
rs3131969 1 754182 A G 0.1456 1.0635451505 0.2875 80610
rs1048488 1 760912 T C 0.8237 -0.741007194245 0.4587 80610
sumstats 2
snpid chr bpos a1 a2 freq z pval n
rs3094315 1 752566 A G 0.849630143319 -0.268320891765 0.787 150064
rs3131972 1 752721 A G 0.157940360610264 0.226337945774 0.8229 150064
rs3131969 1 754182 A G 0.143377253814147 -0.0127401275584 0.9903 150064
rs1048488 1 760912 T C 0.8275543227 -0.399863453576 0.6871 150064
At this stage, I have no idea where it went wrong, so I hope you can point me in the right direction. (At least, I don't think it's a column-name problem like the one discussed above.)
Thank you very much,
Restu
Hi @restuadi311,
The sumstats look fine to me, so as long as the column names implied by your command match the format of your data, mtag should run through. Can you attach the full log file or send the commands that you used?
Thanks, Hui
Hi Hui,
Thank you very much for your speedy reply. (If you need some dummy files to check, please let me know and I'll upload them to Google Drive.)
Here is the log file:
(virtualenv_mtag) (base) [r.restuadi@delta008 pipe_auto2]$ python "$software_path"/mtag/mtag.py --sumstats "$inputs_dir"/"$ref_trait"_mtag_hm3_maf01.txt,"$inputs_dir"/"$i"_mtag_hm3_maf01.txt --out "$temp_dir"/"$i"_mtag_hm3_maf01 --n_min 0.0 --stream_stdout
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<>
<> MTAG: Multi-trait Analysis of GWAS
<> Version: 1.0.8
<> (C) 2017 Omeed Maghzian, Raymond Walters, and Patrick Turley
<> Harvard University Department of Economics / Broad Institute of MIT and Harvard
<> GNU General Public License v3
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<> Note: It is recommended to run your own QC on the input before using this program.
<> Software-related correspondence: [email protected]
<> All other correspondence: [email protected]
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Calling ./mtag.py
--p-name pval
--stream-stdout
--n-min 0.0
--n-value 80610,257828
--sumstats /shares/compbio/Group-Wray/restuadi/project/Multitraits_prediction/Jan2019run/pipe_auto2/als_mtag_hm3_maf01.txt,/shares/compbio/Group-Wray/restuadi/project/Multitraits_prediction/Jan2019run/pipe_auto2/CP_mtag_hm3_maf01.txt
--out /shares/compbio/Group-Wray/restuadi/project/Multitraits_prediction/Jan2019run/pipe_auto2/CP_mtag_hm3_maf01
Beginning MTAG analysis...
MTAG will use the Z column for analyses.
Read in Trait 1 summary statistics (933910 SNPs) from /shares/compbio/Group-Wray/restuadi/project/Multitraits_prediction/Jan2019run/pipe_auto2/als_mtag_hm3_maf01.txt ...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: a2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.
Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
Read 933910 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 1 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
933909 SNPs remain.
Adding uniform sample size 80610 to summary statistics.
Removed 0 SNPs with duplicated rs numbers (933909 SNPs remain).
Removed 0 SNPs with N < 0.0 (933909 SNPs remain).
Median value of SIGNED_SUMSTAT was 0.0, which seems sensible.
Dropping snps with null values
Metadata:
Mean chi^2 = 1.082
Lambda GC = 1.05
Max chi^2 = 130.165
42 Genome-wide significant SNPs (some may have been removed by filtering).
Conversion finished at Tue Apr 9 15:48:00 2019
Total time elapsed: 4.55s
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging of Trait 1 complete. SNPs remaining: 933909
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Read in Trait 2 summary statistics (921802 SNPs) from /shares/compbio/Group-Wray/restuadi/project/Multitraits_prediction/Jan2019run/pipe_auto2/CP_mtag_hm3_maf01.txt ...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: a2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.
Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
Read 921802 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
921802 SNPs remain.
Adding uniform sample size 257828 to summary statistics.
Removed 0 SNPs with duplicated rs numbers (921802 SNPs remain).
Removed 0 SNPs with N < 0.0 (921802 SNPs remain).
Median value of SIGNED_SUMSTAT was 0.0, which seems sensible.
Dropping snps with null values
Metadata:
Mean chi^2 = 2.175
Lambda GC = 1.827
Max chi^2 = 124.961
2868 Genome-wide significant SNPs (some may have been removed by filtering).
Conversion finished at Tue Apr 9 15:48:08 2019
Total time elapsed: 4.81s
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging of Trait 2 complete. SNPs remaining: 921802
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Dropped 4 SNPs due to strand ambiguity, 933905 SNPs remain in intersection after merging trait1
Dropped 0 SNPs due to strand ambiguity, 908163 SNPs remain in intersection after merging trait2
... Merge of GWAS summary statistics complete. Number of SNPs: 908163
cannot reindex from a duplicate axis
Traceback (most recent call last):
File "/shares/compbio/Group-Wray/restuadi/project/Multitraits_prediction/Jan2019run/software/mtag/mtag.py", line 1557, in
Thank you very much
Hi @restuadi311,
I tried but still could not replicate the error you're getting. Are you using the latest version of the software (have you re-pulled the repo, just to make sure)? If this still doesn't work, feel free to upload your data to a shareable location and I'll take a look.
Thanks, Hui
Hi @huilisabrina
Yep, I've tried the newest MTAG version and still get the same error. I've uploaded dummy files to try here: https://drive.google.com/open?id=1G9555DjSweGHw8yIVneDYh0KZePKI24m
Please let me know if there is a problem.
Thank you very much,
Restu
Hi @restuadi311,
Thanks for sharing your files! Sorry for the delay. This was due to a bug that was a bit hard to find. I just fixed it in the latest edits. Please re-pull the repo and try again. Let me know if this still doesn't work!
Best, Hui
Hi Hui,
Very sorry for the late reply; I've just come back to work from a nice, long sabbatical. Thank you very much for the fix; it's working well now.
Restu
Hi @huilisabrina,
I have also encountered a similar problem. I don't know why it happened.
Error:
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<>
<> MTAG: Multi-trait Analysis of GWAS
<> Version: 1.0.8
<> (C) 2017 Omeed Maghzian, Raymond Walters, and Patrick Turley
<> Harvard University Department of Economics / Broad Institute of MIT and Harvard
<> GNU General Public License v3
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<> Note: It is recommended to run your own QC on the input before using this program.
<> Software-related correspondence: [email protected]
<> All other correspondence: [email protected]
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Calling ./mtag.py
--p-name pval
--stream-stdout
--n-min 0.0
--sumstats MAGIC1000G_FI_EUR_MTAG1.tsv,MAGIC1000G_FG_EUR_MTAG1.tsv
--out ./new
Beginning MTAG analysis...
MTAG will use the Z column for analyses.
Read in Trait 1 summary statistics (32635792 SNPs) from MAGIC1000G_FI_EUR_MTAG1.tsv ...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: a2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.
se: Standard errors of BETA coefficients
Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
Read 32635792 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 2398490 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
30237302 SNPs remain.
Removed 0 SNPs with duplicated rs numbers (30237302 SNPs remain).
Removed 0 SNPs with N < 0.0 (30237302 SNPs remain).
Median value of SIGNED_SUMSTAT was 0.0, which seems sensible.
Dropping snps with null values
Metadata:
Mean chi^2 = 1.032
Lambda GC = 0.998
Max chi^2 = 169.73
2179 Genome-wide significant SNPs (some may have been removed by filtering).
Conversion finished at Thu Sep 1 10:26:53 2022
Total time elapsed: 6.0m:43.97s
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging of Trait 1 complete. SNPs remaining: 30237332
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Trait 1: Dropped 30 SNPs for duplicate values in the "snp_name" column
Read in Trait 2 summary statistics (34064006 SNPs) from MAGIC1000G_FG_EUR_MTAG1.tsv ...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: a2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.
se: Standard errors of BETA coefficients
Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
WARNING: 6 SNPs had P outside of (0,1]. The P column may be mislabeled.
Read 34064006 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 6 SNPs with out-of-bounds p-values.
Removed 2442984 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
31621016 SNPs remain.
Removed 0 SNPs with duplicated rs numbers (31621016 SNPs remain).
Removed 0 SNPs with N < 0.0 (31621016 SNPs remain).
Median value of SIGNED_SUMSTAT was -0.00151976, which seems sensible.
Dropping snps with null values
Metadata:
Mean chi^2 = 1.044
Lambda GC = 0.998
Max chi^2 = 1477.304
6354 Genome-wide significant SNPs (some may have been removed by filtering).
Conversion finished at Thu Sep 1 10:37:45 2022
Total time elapsed: 7.0m:8.03s
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging of Trait 2 complete. SNPs remaining: 31621050
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Trait 2: Dropped 34 SNPs for duplicate values in the "snp_name" column
Dropped 4623481 SNPs due to strand ambiguity, 25613821 SNPs remain in intersection after merging trait1
Dropped 2 SNPs due to inconsistent allele pairs from phenotype 2. 25449696 SNPs remain.
Dropped 0 SNPs due to strand ambiguity, 25449696 SNPs remain in intersection after merging trait2
... Merge of GWAS summary statistics complete. Number of SNPs: 25449696
cannot reindex from a duplicate axis
Traceback (most recent call last):
File "/home/gyu/mtag/mtag.py", line 1577, in
My GWAS summary data look like this:
trait1
snpid chr bpos a1 a2 freq beta se pval sample_size n z
rs147324274 10 100000012 A G NA -0.0553 0.1863 0.8544 11047 196991 -0.296833
rs571272521 10 10000011 A G NA 0.14 0.1724 0.4606 7428 196991 0.812065
rs144804129 10 100000122 A T 0.003 -0.1099 0.1202 0.2684 28015 196991 -0.914309
rs6602381 10 10000018 A G 0.6 -0.0054 0.0022 0.03577 124123 196991 -2.45455
rs539340063 10 100000259 A G NA 0.6904 0.3295 0.03067 7428 196991 2.0953
rs147936544 10 100000274 A G 0.001 0.1089 0.1737 0.6802 8834.04 196991 0.626943
rs189891329 10 10000033 A G NA 0.872 0.6428 0.3275 444.998 196991 1.35657
rs188626770 10 100000430 T G NA 0.1228 0.2166 0.5612 9556.95 196991 0.566944
rs547178188 10 100000439 T C NA -0.0356 0.2591 0.9092 7428 196991 -0.137399
trait2
snpid chr bpos a1 a2 freq beta se pval sample_size n z
rs147324274 10 100000012 A G NA -0.0257 0.1854 0.9613 26201.1 196991 -0.138619
rs571272521 10 10000011 A G NA 0.032 0.1405 0.8781 8729 196991 0.227758
rs144804129 10 100000122 A T 0.003 -0.0077 0.1082 0.9647 30955 196991 -0.0711645
rs6602381 10 10000018 A G 0.6 1e-04 0.0019 0.6219 165515 196991 0.0526316
rs539340063 10 100000259 A G NA -0.0393 0.3007 0.9276 8729 196991 -0.130695
rs147936544 10 100000274 A G 0.001 -0.179 0.1683 0.5381 16684 196991 -1.06358
rs188626770 10 100000430 T G NA 0.4309 0.2266 0.1228 9556.95 196991 1.90159
rs547178188 10 100000439 T C NA -0.0388 0.2368 0.9578 8729 196991 -0.163851
10_100000554_D_I 10 100000554 D I NA -0.0041 0.0083 0.6165 8729 196991 -0.493976
I have also updated MTAG using git pull.
Thank you very much!
Judging from Stack Overflow, the most common cause of this error seems to be duplicate index values. Do you have duplicate rsIDs in your data?
MTAG does do some filtering and makes attempts at data cleaning, but it's not 100% comprehensive. If it's not duplicate rsIDs, it could also just be a need for some data QCing. It looks like you have a bunch of NA frequencies in the example data you included, and in the log file, it's dropping a lot of items that it's having a hard time with ("Removed 2442984 variants that were not SNPs", for example).
I'd try checking for duplicate rsIDs first, but if that doesn't work, then maybe some other data QC could help.
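Following the suggestion to check for duplicate rsIDs first, the sketch below reports duplicated IDs and per-column missing values (such as the NA frequencies in the example above) for a whitespace-delimited sumstats file. It is a minimal pre-flight check under stated assumptions: the ID column is named snpid, as in the files posted here, and `preflight_report` is a hypothetical helper, not an MTAG function.

```python
import pandas as pd


def preflight_report(source, id_col="snpid"):
    """Summarise duplicate IDs and missing values in a sumstats file.

    `source` is a path or file-like object holding whitespace-delimited text.
    """
    # "NA" entries (like the missing freq values above) are read as NaN.
    df = pd.read_csv(source, sep=r"\s+")
    # keep=False flags every copy of a repeated ID, not just the extras.
    dup_mask = df[id_col].duplicated(keep=False)
    return {
        "n_rows": len(df),
        "duplicate_ids": sorted(df.loc[dup_mask, id_col].unique()),
        # Only columns that actually contain missing values are listed.
        "missing_per_column": {
            col: int(count) for col, count in df.isna().sum().items() if count
        },
    }
```

If duplicated IDs do turn up, whether to drop every copy (for instance with `df.drop_duplicates(subset="snpid", keep=False)`) or keep one is a QC judgment that depends on why the duplicates are there.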
Many thanks!