pyseer
pyseer copied to clipboard
Suggestions needed for the whole-genome model output
Hi,
This is not an issue but more of a question from my side.
I used the LMM model implemented in Pyseer to perform my GWAS analysis and figured out some significant variants that plays a role in resistance. Next, I read few review papers where they benchmarked different GWAS methods and they showed that for moderate to high LD (which is the case with the bacteria I am working), whole-genome models implemented in Pyseer performs better as compared to LMM's. So I wanted to try those models also on my data. Now, I have two questions regarding this:
-
I saw that whole-genome models are usually used for prediction of phenotype, but is it okay to use them for performing GWAS and getting the significant variants?
-
Anyways, I tried to use whole-genome models to perform GWAS on my data and used
--distances
parameter to calculate an adjusted p-value. After performing the GWAS, a part of my output sorted on lrt-pvalue looks like below:
Variant | af | filter-pvalue | lrt-pvalue | beta | notes |
---|---|---|---|---|---|
1_7570_C_T | 0.161 | 3.67E-48 | 2.93E-47 | 4.24 | matrix-inversion-error |
1_7585_G_C | 0.764 | 6.97E-31 | 4.94E-36 | 4.12 | matrix-inversion-error |
1_1473246_A_G | 0.253 | 5.51E-33 | 3.47E-27 | 0.287 | matrix-inversion-error |
1_4269271_A_G | 0.0792 | 3.01E-24 | 1.21E-23 | 0.871 | bad-chisq |
1_2155168_C_G | 0.606 | 2.84E-20 | 1.32E-17 | -0.463 | matrix-inversion-error |
1_761155_C_T | 0.384 | 2.95E-07 | 2.14E-09 | 0.0511 | matrix-inversion-error |
1_2157481_C_T | 0.0453 | 1.11E-06 | 2.61E-08 | -0.64 | bad-chisq |
1_2157569_C_A | 0.0453 | 1.11E-06 | 2.61E-08 | -4.97E-15 | bad-chisq |
In this output, if I look at the 'notes' column, I can see 'matrix-inversion-error' and 'bad-chisq' for all printed variants. So I am not sure whether I should consider this as significant variant or not (Sorry if I this information is already present in documentation and I missed it)? To correct for multiple testing in the LMM model, I used the --output-patterns
parameter and then used the count_patterns.py
script to determine Bonferroni corrected p-value threshold. But when I use the whole-genome model, I cannot use the --output-patterns
parameter here. So what strategy can I use here to correct for multiple testing?
Thanks a lot in advance
When you get the p-values from the whole genome model, it is actually just running GWAS with the fixed effects model and appending these to the table. Here it looks like that model might not be working too well, given those error messages. Maybe try the LMM to get your p-values?
Closed due to inactivity