PRSice
Is this overfitting? PRSice-2
DILIN.Sakaue2021.GCST90018798.noMHC.log
Hi, I'm using PRSice-2 to run hundreds to thousands of traits from various GWAS Catalog studies against my target data (PLINK format, ~12,000 samples with ~1,800 cases). For many traits, like the one attached, the best-fit PRS is achieved at P-threshold
Phenotype  Set   Threshold  PRS.R2     Full.R2   Null.R2    Prevalence  Coefficient  Standard.Error  P             Num_SNP  Empirical-P
-          Base  0.0871001  0.0861248  0.134254  0.0526646  -           -486.689     20.645          7.08941e-123  28988    0.000999001
Is this overfitting? The P-value for this trait is 7e-123; isn't that far too low, even though the empirical P is 0.0009? Also, ~30,000 SNPs were used. Of the 220 traits from this study (Sakaue et al. 2021), 74 have an empirical P of ~0.0009.
Attached is my log file and plots.
I'm wondering what I'm not understanding about the methodology and interpretation of PRS. What would you suggest I do?
Here are the top 10 P-thresholds from the .prsice output (2000 thresholds tested):
Pheno  Set   Threshold   R2           P             Coefficient  Standard.Error  Num_SNP
-      Base  5.005e-05   0.000143812  0.312603      -0.340045    0.336753        29
-      Base  0.00010005  0.000648264  0.0323814     -0.973314    0.45489         51
-      Base  0.00015005  0.000787986  0.0182923     -1.3745      0.582504        73
-      Base  0.00020005  0.000426456  0.0820578     -1.1973      0.688553        99
-      Base  0.00025005  0.000410892  0.0877659     -1.28292     0.751431        123
-      Base  0.00030005  0.000565421  0.0452448     -1.69571     0.846849        146
-      Base  0.00035005  0.000789217  0.0180115     -2.17015     0.917464        168
...
-      Base  0.0999001   0.0800213    8.1373e-121   -517.879     22.1576         32168
-      Base  0.0999501   0.0799776    9.57176e-121  -517.909     22.1655         32182
-      Base  0.1         0.0801112    6.39797e-121  -518.784     22.1866         32209
-      Base  1           0.00222882   6.82245e-05   -342.871     86.0967         165906
Thank you in advance for taking the time to look at this.
It is difficult to tell without the full context, but it does not seem like there is anything out of the ordinary.
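As a rough sanity check on the two numbers in question, here is a minimal Python sketch. It is not PRSice-2 code, and it rests on two assumptions: that the reported P is a two-sided Wald test on Coefficient / Standard.Error, and that the run used 1000 permutations (the log is not reproduced in this thread), with the empirical P computed in the usual way as (number of permuted fits at least as good as the observed one + 1) / (number of permutations + 1). The function names are made up for illustration.

```python
import math


def wald_p(coef, se):
    """Two-sided normal-approximation p-value for the statistic coef / se."""
    z = coef / se
    # erfc stays accurate far out in the tail, where 1 - cdf would underflow to 0
    return math.erfc(abs(z) / math.sqrt(2))


def empirical_p(n_null_as_good, n_perm):
    """Permutation p-value with the +1 correction in numerator and denominator."""
    return (n_null_as_good + 1) / (n_perm + 1)


# Coefficient and Standard.Error from the best-fit row of the summary output
best_p = wald_p(-486.689, 20.645)

# Assumed: 1000 permutations, none of which beat the observed fit
floor_p = empirical_p(0, 1000)

print(f"z = {-486.689 / 20.645:.2f}, Wald p = {best_p:.3g}")
print(f"smallest possible empirical P with 1000 permutations = {floor_p:.9f}")
```

Under those assumptions, a z-score of about -23.6 on ~12,000 samples gives a p-value on the order of 7e-123, which matches the reported P, so that value follows directly from the coefficient and its standard error. Likewise 1/1001 is about 0.000999001, so an empirical P of ~0.0009 would simply be the floor for 1000 permutations: no permuted phenotype did as well as the real one. If that is what happened, the 74 traits sharing that value all hit the same floor, and only more permutations could resolve them further.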