nimpress
culling by linkage
Select only one site from each linkage region for the PRS summation.
Any ideas on implementation?
I haven't thought about it much, as linkage pruning was a lower-priority feature for me. But locus substitution (i.e. finding good substitutes for PRS loci that are not well genotyped, rather than just giving up and imputing) would be a killer feature, and I think it could use the same backend.
I was thinking that we could calculate r2 from the observed genotypes, given a large enough cohort.
Then choose, for example, the single variant from each block with the highest allele frequency.
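To make the idea concrete, here is a rough Python sketch (not nimpress code, which has no such feature yet). It assumes genotypes are available as 0/1/2 dosages per sample, and it approximates "blocks" greedily: variants are visited in order of decreasing allele frequency, and a variant is kept only if its r2 with every variant already kept is below a threshold. The function names and the default threshold are made up for illustration.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Monomorphic site: correlation is undefined, so treat it as unlinked.
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def allele_frequency(dosages):
    """Frequency of the counted allele, assuming diploid 0/1/2 dosages."""
    return sum(dosages) / (2 * len(dosages))


def prune_by_ld(genotypes, r2_threshold=0.8):
    """Greedy LD pruning (hypothetical helper).

    genotypes: dict mapping variant ID -> list of dosages, one per sample.
    Returns the IDs of the variants kept, highest allele frequency first.
    """
    order = sorted(
        genotypes, key=lambda v: allele_frequency(genotypes[v]), reverse=True
    )
    kept = []
    for variant in order:
        # Keep the variant only if it is not in high LD with anything kept so far.
        if all(
            r_squared(genotypes[variant], genotypes[k]) < r2_threshold for k in kept
        ):
            kept.append(variant)
    return kept


cohort = {
    "rs1": [0, 1, 2, 1, 0, 2],
    "rs2": [0, 1, 2, 1, 1, 2],  # strongly correlated with rs1
    "rs3": [1, 1, 0, 0, 0, 0],
}
print(prune_by_ld(cohort, r2_threshold=0.6))
```

This is quadratic in the number of variants and ignores genomic distance; a real implementation would probably restrict comparisons to a window around each locus.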
Ah yep that would work for LD pruning, though not for locus substitution. I've been thinking about adding support for something like a precomputed r2 file (https://www.cog-genomics.org/plink/1.9/ld#r) which is a good fit for the substitution problem, but would that address the linkage pruning issue for you?
I think I'm not really understanding the use case for linkage pruning, so let me know if I'm barking up the wrong tree here.
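For the substitution case, a precomputed table could be used roughly as in the Python sketch below. It assumes a whitespace-delimited table with a header containing `SNP_A`, `SNP_B` and `R2` columns, as in PLINK 1.9 `--r2` output; the function names and the 0.8 cutoff are hypothetical. Note that r2 alone does not say which proxy allele tracks the original effect allele, so a real implementation would need phase information as well.

```python
def load_r2_table(lines):
    """Parse rows with SNP_A, SNP_B and R2 columns into a symmetric lookup."""
    rows = iter(lines)
    header = next(rows).split()
    ia, ib, ir = header.index("SNP_A"), header.index("SNP_B"), header.index("R2")
    r2 = {}
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        a, b, value = fields[ia], fields[ib], float(fields[ir])
        r2.setdefault(a, {})[b] = value
        r2.setdefault(b, {})[a] = value
    return r2


def best_proxy(target, genotyped, r2, min_r2=0.8):
    """Return (proxy ID, r2) for the best-linked genotyped variant, or None."""
    candidates = [
        (value, snp)
        for snp, value in r2.get(target, {}).items()
        if snp in genotyped and value >= min_r2
    ]
    if not candidates:
        return None
    value, snp = max(candidates)
    return snp, value


# Made-up example rows, for illustration only.
table = [
    "CHR_A BP_A SNP_A CHR_B BP_B SNP_B R2",
    "1 100 rs1 1 150 rs2 0.95",
    "1 100 rs1 1 300 rs3 0.85",
    "1 150 rs2 1 300 rs3 0.40",
]
r2 = load_r2_table(table)
print(best_proxy("rs1", {"rs3"}, r2))
```

The same lookup could serve pruning too: instead of searching for a proxy, drop any locus whose r2 with an already selected locus exceeds the threshold.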
I don't have a lot of experience in this area, but I guess I'm thinking of something like this: https://www.prsice.info/step_by_step/#clumping
Ah I see. Yes, for a GPRS (i.e. all loci without LD pruning in the discovery phase, coefficients from simple per-SNP tests) that would be useful to ensure that densely sampled LD blocks don't dominate the score. Definitely something to add for genome-wide scores, thanks.