mtag
Missing SNPs from mtag output
Hello,
I am running MTAG for three different traits using the EAS LD scores. Each trait is the meta-analysis output of several studies. I previously QCed the MTAG input files, and as a result some variants that are present in one trait's meta-analysis are absent from another input file. Is this why the variant is missing from the MTAG output? I initially thought these variants were missing because they are not in the LD score files, but then I read in another thread that this is not the case. Since more than 1 million variants were being excluded at the strand-ambiguity step, I then added "--incld_ambig_snps", and indeed I "recovered" some of the missing variants in the MTAG output, but not all of them. Do you have any other suggestions for overcoming this?
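Strand-ambiguous SNPs are those whose two alleles are complementary (A/T or C/G), so their strand cannot be resolved from the alleles alone; as we understand it, these are what `--incld_ambig_snps` stops MTAG from dropping. A minimal sketch for counting them in an input file before running MTAG, assuming a tab-delimited file with allele columns named `A1` and `A2` (hypothetical names; substitute your own headers):

```python
import csv
import io

# Complementary base pairs; a SNP whose alleles are complements of each other
# (A/T or C/G) cannot be strand-aligned from the alleles alone.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(a1: str, a2: str) -> bool:
    """Return True for A/T, T/A, C/G and G/C SNPs (case-insensitive)."""
    return COMPLEMENT.get(a1.upper()) == a2.upper()


def count_ambiguous(handle, a1_col="A1", a2_col="A2"):
    """Return (number of strand-ambiguous SNPs, total SNPs) for a tab-delimited file."""
    reader = csv.DictReader(handle, delimiter="\t")
    total = ambiguous = 0
    for row in reader:
        total += 1
        if is_strand_ambiguous(row[a1_col], row[a2_col]):
            ambiguous += 1
    return ambiguous, total


# Toy input standing in for a real summary-statistics file.
example = io.StringIO("SNP\tA1\tA2\nrs1\tA\tT\nrs2\tA\tG\nrs3\tC\tG\n")
print(count_ambiguous(example))  # (2, 3)
```

Comparing this count with the number of SNPs the log reports as removed at the ambiguity step can help check whether that filter explains most of the loss.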
Many thanks, Olga
Hi Olga,
I'm not sure I totally understand your question, but MTAG only produces results for SNPs that are in all of the sets of GWAS summary statistics. So if a SNP is missing from any of the sets, it will be dropped from the output. There are also a number of other filters that MTAG uses (e.g., a sample size filter) to remove SNPs that may violate MTAG's assumptions, so it's possible that your SNPs are being dropped due to those restrictions.
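Because MTAG only reports SNPs present in every input, a quick pre-check is to intersect the SNP IDs across the trait files. A minimal standard-library sketch; the tab-delimited format, the `SNP` column name and the toy data are assumptions, and this mimics only the intersection step, not MTAG's other filters:

```python
import csv
import io


def snp_ids(handle, snp_col="SNP"):
    """Read the set of SNP identifiers from a tab-delimited summary-statistics file."""
    return {row[snp_col] for row in csv.DictReader(handle, delimiter="\t")}


def shared_and_dropped(snp_sets):
    """Split SNPs into those present in every trait and those missing from at least one."""
    shared = set.intersection(*snp_sets)
    union = set.union(*snp_sets)
    return shared, union - shared


# Toy data for three traits; rs2 is missing from trait 2.
trait1 = io.StringIO("SNP\tZ\nrs1\t1.2\nrs2\t0.4\nrs3\t-2.1\n")
trait2 = io.StringIO("SNP\tZ\nrs1\t0.8\nrs3\t-1.7\n")
trait3 = io.StringIO("SNP\tZ\nrs1\t1.1\nrs2\t0.2\nrs3\t-2.5\n")
shared, dropped = shared_and_dropped([snp_ids(t) for t in (trait1, trait2, trait3)])
print(sorted(shared), sorted(dropped))  # ['rs1', 'rs3'] ['rs2']
```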
Best, Patrick
(Replying to gianolga's message of Fri, Sep 11, 2020, 8:50 AM)
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/JonJala/mtag/issues/111, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFBUB5ONPDCDE6B4NJMCI2DSFIMKLANCNFSM4RHNJD4Q .
Hello Patrick,
Many thanks for the swift response, and apologies for my delayed reply; I was on leave. You are right: I was initially using QCed datasets, so the variant of interest was not present in all the sets. I then checked with the original, unfiltered data (I have confirmed that this variant is present in all of them), but it is still missing from the MTAG outputs. I was wondering: a) whether I can check at which MTAG step this variant is being filtered out (this would at least help justify why it is missing from the output); and b) whether there are any other options, similar to --incld_ambig_snps, to skip the restrictions you mentioned.
Many thanks again for the great support! Olga
Hi Olga,
I believe that if you look through the log file, it will tell you how many SNPs are dropped at each filtering step, but I don't think there is a straightforward way to identify the step at which a particular SNP gets dropped. Maybe drop all but that SNP from your summary statistics files and pass them into MTAG? The software will likely fail before it produces results, but I think it will tell you the step at which that single SNP gets dropped. The easiest thing would probably be to examine that SNP closely and identify which filter it doesn't satisfy.
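Passing a single SNP through MTAG requires a copy of each summary-statistics file reduced to its header plus that SNP's row. A minimal standard-library sketch, assuming tab-delimited, uncompressed files with a `SNP` column (hypothetical column name):

```python
import csv


def extract_single_snp(in_path, out_path, snp_id, snp_col="SNP"):
    """Copy the header and the row(s) for one SNP into a new tab-delimited file.

    Returns the number of matching rows written (0 means the SNP is absent).
    """
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        writer = csv.DictWriter(
            dst, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        written = 0
        for row in reader:
            if row[snp_col] == snp_id:
                writer.writerow(row)
                written += 1
    return written
```

For example, one might call `extract_single_snp("trait1.txt", "trait1_single.txt", "rs12345")` for each trait (file names and rsID hypothetical) and then point MTAG at the reduced files; a return value of 0 already shows the SNP is missing from that input.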
Depending on the filter, some have options that you can adjust. I'd recommend looking at the list of options in the software to see whether it has what you need. If not, you would probably need to identify the line of code that applies the filter and make your own personal copy of the software with that line edited out.
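Examining the SNP closely amounts to comparing it against each filter in turn. The sketch below checks one summary-statistics row against four candidate filters: strand ambiguity, a minor-allele-frequency cutoff, a sample-size cutoff, and presence in the LD score files. The thresholds (MAF below 0.01; N below two-thirds of the trait's 90th-percentile N) are our assumption about MTAG's defaults rather than something stated in this thread, so verify them against the software's option list; the column names `SNP`, `A1`, `A2`, `FREQ`, `N` are hypothetical.

```python
import statistics

# Complementary bases; A/T and C/G SNPs are strand-ambiguous.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def diagnose_snp(row, all_n, ld_snps, maf_min=0.01, n_frac=2 / 3):
    """List the (assumed) filters that one summary-statistics row would fail.

    row     -- dict with hypothetical keys SNP, A1, A2, FREQ, N
    all_n   -- sample sizes of every SNP in the same trait's file (at least 2)
    ld_snps -- set of SNP IDs present in the LD score files
    """
    reasons = []
    if COMPLEMENT.get(row["A1"].upper()) == row["A2"].upper():
        reasons.append("strand-ambiguous")
    freq = float(row["FREQ"])
    if min(freq, 1 - freq) < maf_min:
        reasons.append("MAF below threshold")
    # 90th percentile with linear interpolation (same convention as numpy's default).
    p90 = statistics.quantiles(all_n, n=10, method="inclusive")[8]
    if float(row["N"]) < n_frac * p90:
        reasons.append("sample size below threshold")
    if row["SNP"] not in ld_snps:
        reasons.append("absent from LD score files")
    return reasons


row = {"SNP": "rs42", "A1": "A", "A2": "G", "FREQ": "0.3", "N": "5000"}
print(diagnose_snp(row, [10000] * 9 + [5000], {"rs1"}))
# ['sample size below threshold', 'absent from LD score files']
```

An empty list means the row passes all four assumed checks, which would point to some other step (for example, the cross-trait intersection).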
Best, Patrick
(Replying to gianolga's message of Wed, Sep 23, 2020, 1:29 PM)
Hi Patrick,
I followed your suggestion to run MTAG using only this SNP from each of the three sets. It was indeed very helpful for identifying the reason!
The error that I am getting has to do with the reference panel: "ValueError: After merging with reference panel LD, 0 SNPs remain."
I am working with East Asian traits, so I downloaded the 1000 Genomes LD scores as suggested in one of the issue threads. I have checked that both the traits and the LD scores use rsIDs. The LD score files appear to be restricted to a limited subset of SNPs, and the "SNP of interest" is not included. However, in another issue thread (https://github.com/JonJala/mtag/issues/56) it is mentioned that "SNPs that are not in the LD score panel should still be used by MTAG", so I am not sure why I am getting this error.
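The error says the merge with the LD score SNP list left nothing, which is what one would expect if the only SNP passed in is absent from every LD score file. A minimal sketch for checking that directly, assuming LDSC-style per-chromosome `*.l2.ldscore.gz` files that are whitespace-delimited with a header containing a `SNP` column:

```python
import gzip
from pathlib import Path


def snp_in_ldscores(snp_id, ld_dir, pattern="*.l2.ldscore.gz"):
    """Return names of LD score files (matching pattern) whose SNP column contains snp_id."""
    hits = []
    for path in sorted(Path(ld_dir).glob(pattern)):
        with gzip.open(path, "rt") as fh:
            header = fh.readline().split()
            snp_col = header.index("SNP")  # raises ValueError if there is no SNP column
            for line in fh:
                fields = line.split()
                if len(fields) > snp_col and fields[snp_col] == snp_id:
                    hits.append(path.name)
                    break  # one hit per file is enough
    return hits
```

An empty result from, say, `snp_in_ldscores("rs12345", "eas_ldscores/")` (hypothetical rsID and directory) would be consistent with the 0-SNP error above.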
Many thanks, Olga
Hi Olga,
Sorry for the slow response here. It's theoretically possible to calculate MTAG estimates for a SNP without its LD score, but I don't think we allowed for this in the software. If you really want MTAG estimates for that particular SNP, it may be possible to use the option to supply Omega and Sigma estimates to MTAG directly, so that it skips the LD score step; that might keep the SNP you are interested in. You'd need to create, by hand, files that the software can read in. (I'm not 100% sure this will work, but it might be worth trying. The alternative would be to edit the MTAG code yourself to do exactly what you'd like, and I don't know how comfortable you are with that.)
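If an Omega (genetic covariance) matrix is assembled by hand from separately estimated SNP heritabilities and genetic correlations, the standard identity cov_ij = r_ij · sqrt(h2_i · h2_j) applies. A sketch with NumPy using hypothetical values; whether MTAG expects these on a per-SNP scale, and the exact file format its Omega/Sigma options read, are assumptions to check in the option list before using anything like this.

```python
import numpy as np


def covariance_from_correlation(variances, corr):
    """Return the T x T matrix with entries corr[i][j] * sqrt(variances[i] * variances[j])."""
    sd = np.sqrt(np.asarray(variances, dtype=float))
    corr = np.asarray(corr, dtype=float)
    if not np.allclose(corr, corr.T):
        raise ValueError("correlation matrix must be symmetric")
    cov = corr * np.outer(sd, sd)
    # A valid covariance matrix must be positive semi-definite (no negative eigenvalues).
    if np.linalg.eigvalsh(cov).min() < -1e-10:
        raise ValueError("matrix is not positive semi-definite")
    return cov


h2 = [0.20, 0.10, 0.05]   # hypothetical SNP heritabilities (e.g. from ldsc)
rg = [[1.0, 0.5, 0.3],
      [0.5, 1.0, 0.4],
      [0.3, 0.4, 1.0]]    # hypothetical genetic correlations
omega = covariance_from_correlation(h2, rg)
print(omega.round(4))
```

The matrix could then be written with `np.save` or `np.savetxt`, whichever matches the format the software expects.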
Best, Patrick
(Replying to gianolga's message of Wed, Sep 23, 2020, 3:47 PM)
Hello Patrick,
No worries, you have been very helpful. I am more interested in understanding how MTAG works, so that I can use its full potential, than in "rescuing" this specific SNP. For this analysis I am using the 1000G EAS LD scores computed for ldsc, as suggested in another thread. These LD score files contain a total of 1.2 million variants, so if MTAG can only provide results for variants with an available LD score, I'm afraid this reduces the output considerably. I have gone through the code, but I'm not sure exactly which steps would need to be modified, so I will try the other suggestion (i.e. getting Omega and Sigma from ldsc and using these instead).
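To quantify how much the LD score panel limits the output, one can compare the SNP IDs in the summary statistics with those in the LD score files. A minimal sketch that takes any two iterables of SNP IDs (how they are loaded is left open; the toy IDs are hypothetical):

```python
def ld_coverage(sumstat_snps, ld_snps):
    """Return (#summary-statistics SNPs with an LD score, #summary-statistics SNPs, fraction)."""
    sumstat = set(sumstat_snps)
    kept = sumstat & set(ld_snps)
    fraction = len(kept) / len(sumstat) if sumstat else 0.0
    return len(kept), len(sumstat), fraction


print(ld_coverage(["rs1", "rs2", "rs3", "rs4"], ["rs1", "rs2", "rs9"]))  # (2, 4, 0.5)
```

Running this per trait gives an upper bound on how many SNPs can survive the LD score merge, before any other filter is applied.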
Many thanks, Olga
Hi Olga,
1.2 million does sound small. If you want to make sure that all of the SNPs in your summary statistics are included, you might consider using the LDSC software to generate your own LD scores for the set of SNPs you need. The 1000 Genomes data are publicly available, so I don't think you should have any data-access issues in getting them.
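Generating LD scores with LDSC for a chosen SNP set can be scripted per chromosome. A sketch that writes a one-ID-per-line SNP list and builds (without running) `ldsc.py --l2` commands using LDSC's `--bfile`, `--l2`, `--ld-wind-cm`, `--print-snps` and `--out` options; the PLINK file prefixes, output paths and interpreter name are hypothetical, and `--ld-wind-cm` assumes the .bim files carry genetic map positions (otherwise `--ld-wind-kb` is the usual alternative).

```python
from pathlib import Path


def write_snp_list(snps, path):
    """Write one SNP ID per line, the list format read by ldsc's --print-snps."""
    Path(path).write_text("".join(f"{snp}\n" for snp in sorted(snps)))


def ldsc_l2_commands(bfile_template, out_template, snp_list,
                     chroms=range(1, 23), ld_wind_cm=1.0):
    """Build, without running, one `ldsc.py --l2` command per chromosome."""
    commands = []
    for chrom in chroms:
        commands.append([
            "python", "ldsc.py",
            "--bfile", bfile_template.format(chrom=chrom),
            "--l2",
            "--ld-wind-cm", str(ld_wind_cm),
            "--print-snps", str(snp_list),
            "--out", out_template.format(chrom=chrom),
        ])
    return commands


for cmd in ldsc_l2_commands("1000G_EAS/chr{chrom}", "eas_ld/chr{chrom}",
                            "sumstat_snps.txt", chroms=[22]):
    print(" ".join(cmd))
```

Each command list could be passed to `subprocess.run(cmd, check=True)`, with the interpreter adjusted to whatever environment LDSC runs in. LD scores can still only be computed for SNPs that are genotyped in the reference panel.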
Best, Patrick
(Replying to gianolga's message of Tue, Sep 29, 2020, 4:58 AM)