pyseer Suggestions needed for the whole-genome model output

Hi,

This is not an issue but more of a question from my side.

I used the LMM model implemented in Pyseer to perform my GWAS analysis and figured out some significant variants that plays a role in resistance. Next, I read few review papers where they benchmarked different GWAS methods and they showed that for moderate to high LD (which is the case with the bacteria I am working), whole-genome models implemented in Pyseer performs better as compared to LMM's. So I wanted to try those models also on my data. Now, I have two questions regarding this:

I saw that whole-genome models are usually used for prediction of phenotype, but is it okay to use them for performing GWAS and getting the significant variants?
Anyways, I tried to use whole-genome models to perform GWAS on my data and used --distances parameter to calculate an adjusted p-value. After performing the GWAS, a part of my output sorted on lrt-pvalue looks like below:

Variant	af	filter-pvalue	lrt-pvalue	beta	notes
1_7570_C_T	0.161	3.67E-48	2.93E-47	4.24	matrix-inversion-error
1_7585_G_C	0.764	6.97E-31	4.94E-36	4.12	matrix-inversion-error
1_1473246_A_G	0.253	5.51E-33	3.47E-27	0.287	matrix-inversion-error
1_4269271_A_G	0.0792	3.01E-24	1.21E-23	0.871	bad-chisq
1_2155168_C_G	0.606	2.84E-20	1.32E-17	-0.463	matrix-inversion-error
1_761155_C_T	0.384	2.95E-07	2.14E-09	0.0511	matrix-inversion-error
1_2157481_C_T	0.0453	1.11E-06	2.61E-08	-0.64	bad-chisq
1_2157569_C_A	0.0453	1.11E-06	2.61E-08	-4.97E-15	bad-chisq

In this output, if I look at the 'notes' column, I can see 'matrix-inversion-error' and 'bad-chisq' for all printed variants. So I am not sure whether I should consider this as significant variant or not (Sorry if I this information is already present in documentation and I missed it)? To correct for multiple testing in the LMM model, I used the --output-patterns parameter and then used the count_patterns.py script to determine Bonferroni corrected p-value threshold. But when I use the whole-genome model, I cannot use the --output-patterns parameter here. So what strategy can I use here to correct for multiple testing?

Thanks a lot in advance

Aug 02 '23 14:08 AmayAgrawal

When you get the p-values from the whole genome model, it is actually just running GWAS with the fixed effects model and appending these to the table. Here it looks like that model might not be working too well, given those error messages. Maybe try the LMM to get your p-values?

Aug 14 '23 11:08 johnlees

Closed due to inactivity

Jul 17 '24 08:07 mgalardini

pyseer pyseer copied to clipboard

Suggestions needed for the whole-genome model output

pyseer
pyseer copied to clipboard