mtag
Effect directions
Hello - I'm just flagging an issue with the coding of the effect / non-effect alleles. The Wiki states that A1 is considered the effect allele. However, according to the log files, whenever I read in data MTAG treats A2 as the effect (non-ref) allele and A1 as the reference allele. I have not specified A1 / A2 in my command file, and all the input stats have matched alleles (i.e. there is no need for allele flipping).
So, for the output files from our analysis, the correct way to read our MTAG results files would be:
A1 = REFERENCE allele
A2 = EFFECT (i.e. NON-REFERENCE) allele
Wiki basic tutorial: "a1/a2: Alleles observed at the particular locus. a1 is considered to be the effect allele, which should also be reflected in the signs of the Z-scores. These columns are also passed to the ldsc routine. mtag checks and flips the a1 and a2 alleles so that they are identical across input files. Other column names may be passed via the --a1_name and --a2_name options."
Log file from analysis:
Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
2020/10/12/09:36:52 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2020/10/12/09:36:52 AM Interpreting column names as follows:
2020/10/12/09:36:52 AM snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
beta: Directional summary statistic as specified by --signed-sumstats.
a2: a2, interpreted as non-ref allele for signed sumstat.
se: Standard errors of BETA coefficients
If the Wiki is wrong, it might be worth correcting as this has resulted in a lot of confusion with colleagues using these files for downstream analyses. Will be interested to hear what you think.
But otherwise, this is a great program so thanks for making it available - have used it to great effect with the UKBB data!
Hi, would you mind including what your MTAG command line looked like? (it will make it easier to dig into the flow of what's going on if we've got the values for the flags)
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/JonJala/mtag/issues/126, or unsubscribe https://github.com/notifications/unsubscribe-auth/APIOF54CJSOURPUCYKJKUZTTCTVJNANCNFSM4YZTR2AQ .
Hello,
Looking at your question more closely, I think I understand what the problem is. We use the PLINK convention of referring to the reference allele as the effect allele (with both being a1). I think VCF files reverse this notation, so that the alternate allele is the effect allele. But I agree that the lack of clarity here is confusing. We will update the Wiki and documentation to make it clear that a1 is supposed to refer to the effect allele.
Does that answer your question?
Thanks! Patrick
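For readers following along, here is a minimal sketch of what "a1 is the effect allele" implies for signed statistics. It is not MTAG's code. The function name and the dictionary keys (`a1`, `a2`, `beta`, `z`, `freq`) are illustrative assumptions, and the example values are made up. It only shows that re-expressing a result relative to the other allele means swapping the allele labels, reversing the sign of the effect, and (if an allele frequency is carried along, assumed here to be the frequency of a1) replacing it with its complement.

```python
def flip_effect_allele(row):
    """Return a copy of `row` expressed relative to the other allele.

    `row` is a plain dict with hypothetical keys: a1 (effect allele),
    a2 (other allele), and optionally beta, z, and freq (frequency of a1).
    """
    flipped = dict(row)
    # Swap the allele labels.
    flipped["a1"], flipped["a2"] = row["a2"], row["a1"]
    # Signed statistics change sign; standard errors and p-values do not.
    for key in ("beta", "z"):
        if key in flipped:
            flipped[key] = -flipped[key]
    # The frequency of the new a1 is the complement of the old one.
    if "freq" in flipped:
        flipped["freq"] = 1.0 - flipped["freq"]
    return flipped


# Made-up example record, for illustration only.
row = {"snpid": "rs0000001", "a1": "A", "a2": "G",
       "beta": 0.12, "se": 0.03, "freq": 0.25}
print(flip_effect_allele(row))
```

If downstream tools expect the effect allele in a different column, a conversion of this kind (applied to every row) keeps allele labels and effect signs consistent.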
Hello - thanks for looking into this and for your helpful and detailed responses.
I think I have solved the problem - it was to do with how I created the input files from BOLT.
It would help, though, if the ref / non-ref allele wording in the logs were clearer about which allele is the effect allele and which is the non-effect allele, as this threw me!
Many thanks.
Best wishes, Katie
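The thread does not say exactly what went wrong when the input files were built from BOLT output, so the following is only a sketch of one thing worth checking: which BOLT column ends up as a1. It assumes the BOLT-LMM column names `SNP`, `ALLELE1`, `ALLELE0`, `BETA`, `SE` and `P_BOLT_LMM_INF`, and assumes that `BETA` is reported per copy of `ALLELE1`; please verify both against your own file header and the BOLT-LMM manual. The function name, the single fixed sample size, and the example row are hypothetical.

```python
import csv
import io

# Assumed BOLT-LMM -> MTAG column mapping; check against your own header.
BOLT_TO_MTAG = {
    "SNP": "snpid",
    "ALLELE1": "a1",   # assumed effect allele (BETA is per copy of ALLELE1)
    "ALLELE0": "a2",   # the other allele
    "BETA": "beta",
    "SE": "se",
    "P_BOLT_LMM_INF": "pval",
}


def bolt_to_mtag(bolt_lines, sample_size):
    """Yield MTAG-style rows from tab-delimited BOLT-style lines.

    `sample_size` is a single N applied to every SNP (a simplification).
    Values are passed through as strings, unchanged.
    """
    reader = csv.DictReader(bolt_lines, delimiter="\t")
    for row in reader:
        out = {mtag: row[bolt] for bolt, mtag in BOLT_TO_MTAG.items()}
        out["n"] = sample_size
        yield out


# Made-up one-row input, for illustration only.
example = io.StringIO(
    "SNP\tALLELE1\tALLELE0\tBETA\tSE\tP_BOLT_LMM_INF\n"
    "rs0000001\tA\tG\t0.12\t0.03\t6.3e-05\n"
)
rows = list(bolt_to_mtag(example, sample_size=10000))
print(rows[0])
```

Keeping the mapping in one explicit table makes it easy to see, and to review with colleagues, which input column is being treated as the effect allele.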
My command line is pasted below. I have looked at your other reply re PLINK naming conventions.
MY COMMAND LINE:
echo "#PBS -lselect=1:ncpus=48:mem=124gb
#PBS -lwalltime=24:00:00
module load anaconda3/personal
source activate /rds/general/project/lms-ukbiobank-analysis/live/Katie/mtagenv1
python /rds/general/project/francis_ukbb/live/MTAG/software/mtag/mtag.py \
  --sumstats $input_dir/AAmax,$input_dir/AAmin,$input_dir/AAdis,$input_dir/DAmax,$input_dir/DAmin,$input_dir/DAdis \
  --out /rds/general/project/lms-ukbiobank-analysis/live/Katie/MTAG_X_results \
  --incld_ambig_snps \
  --use_beta_se \
  --n_min 0.0 \
  --stream_stdout" > /rds/general/project/lms-ukbiobank-analysis/live/Katie/MTAG_X_input_files/mtag_withX.cmd
qsub -o /rds/general/project/lms-ukbiobank-analysis/live/Katie/MTAG_X_input_files/mtag_withX.log -e /rds/general/project/lms-ukbiobank-analysis/live/Katie/MTAG_X_input_files/mtag_withX.err < /rds/general/project/lms-ukbiobank-analysis/live/Katie/MTAG_X_input_files/mtag_withX.cmd