nimpress

culling by linkage

Open brentp opened this issue 6 years ago • 5 comments

Select only one site from each linkage region for the PRS summation.

brentp avatar Oct 05 '19 13:10 brentp

Any ideas on implementation?

I haven't thought about it much, as linkage pruning was a lower-priority feature for me. But locus substitution (i.e. finding good substitutes for PRS loci that are not well genotyped, rather than giving up entirely and imputing) would be a killer feature, and I think it could use the same backend.

mpinese avatar Oct 09 '19 11:10 mpinese

I was thinking that we could calculate R2 from the observed genotypes, given a large enough cohort.

Then choose, for example, the single variant from each block with the highest allele frequency.
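To make the idea concrete, here is a minimal sketch, not nimpress code. It assumes genotypes are available as 0/1/2 dosages of the counted allele per sample, computes r2 as the squared Pearson correlation between two sites, and greedily keeps the highest-frequency site from each group of linked sites. The function names, the 0.5 threshold, and the all-pairs comparison (no distance window) are illustrative assumptions.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic site: correlation is undefined, treat as unlinked
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def allele_frequency(x):
    """Frequency of the counted allele in a diploid cohort."""
    return sum(x) / (2 * len(x))


def prune_by_linkage(genotypes, r2_threshold=0.5):
    """genotypes: dict of site id -> list of dosages. Returns the kept site ids.

    Sites are visited from highest to lowest allele frequency; a site is kept
    only if it is not in LD (r2 >= threshold) with any site already kept.
    """
    ranked = sorted(genotypes, key=lambda s: allele_frequency(genotypes[s]), reverse=True)
    kept = []
    for site in ranked:
        if all(r_squared(genotypes[site], genotypes[k]) < r2_threshold for k in kept):
            kept.append(site)
    return kept


# Toy cohort of six samples (hypothetical data): rs1 and rs2 are tightly linked.
example_genotypes = {
    "rs1": [0, 1, 2, 1, 0, 2],
    "rs2": [0, 1, 2, 1, 0, 1],
    "rs3": [1, 0, 1, 0, 1, 0],
}
kept_sites = prune_by_linkage(example_genotypes)
```

With real data the r2 estimate is only as good as the cohort size, and a practical version would restrict comparisons to sites within some distance of each other rather than testing all pairs.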

brentp avatar Oct 09 '19 16:10 brentp

Ah yep that would work for LD pruning, though not for locus substitution. I've been thinking about adding support for something like a precomputed r2 file (https://www.cog-genomics.org/plink/1.9/ld#r) which is a good fit for the substitution problem, but would that address the linkage pruning issue for you?
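As a rough sketch of how a precomputed r2 file could drive substitution: the snippet below assumes a whitespace-delimited pairwise table with `SNP_A`, `SNP_B` and `R2` columns, as in PLINK 1.9's `--r2` output, and picks the best-linked genotyped proxy for each missing locus. The function name, the 0.8 cutoff and the example rows are hypothetical.

```python
def best_proxies(ld_lines, missing, genotyped, min_r2=0.8):
    """Map each missing locus to its best genotyped proxy.

    ld_lines: lines of a pairwise LD table whose header includes SNP_A, SNP_B, R2.
    Returns a dict of missing id -> (proxy id, r2); loci without a proxy
    at or above min_r2 are left out.
    """
    header = ld_lines[0].split()
    a, b, r = header.index("SNP_A"), header.index("SNP_B"), header.index("R2")
    best = {}
    for line in ld_lines[1:]:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        r2 = float(fields[r])
        if r2 < min_r2:
            continue
        # Each pair is listed once, so consider it in both directions.
        for target, proxy in ((fields[a], fields[b]), (fields[b], fields[a])):
            if target in missing and proxy in genotyped:
                if target not in best or r2 > best[target][1]:
                    best[target] = (proxy, r2)
    return best


# Hypothetical table contents.
example_ld = [
    " CHR_A BP_A SNP_A CHR_B BP_B SNP_B R2",
    " 1 1000 rs1 1 1500 rs2 0.95",
    " 1 1000 rs1 1 2000 rs3 0.85",
    " 1 5000 rs4 1 5200 rs5 0.40",
]
proxies = best_proxies(example_ld, missing={"rs1", "rs4"}, genotyped={"rs2", "rs3", "rs5"})
```

Note that r2 discards the sign of the correlation, so a real substitution would also need to know which proxy allele travels with the effect allele before reusing the original weight.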

I think I'm not really understanding the use case for linkage pruning, so let me know if I'm barking up the wrong tree here.

mpinese avatar Oct 10 '19 00:10 mpinese

I don't have a lot of experience in this area, but I guess I'm thinking of something like this: https://www.prsice.info/step_by_step/#clumping

brentp avatar Oct 10 '19 00:10 brentp

Ah I see. Yes, for a GPRS (i.e. all loci without LD pruning in the discovery phase, coefficients from simple per-SNP tests) that would be useful to ensure that densely sampled LD blocks don't dominate the score. Something to add for genome-wide scores for sure, thanks.
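For reference, the general clumping idea can be sketched as a greedy pass over SNPs ordered by association p-value. This is not PRSice's implementation; the function name, tuple layout, window size and r2 threshold are illustrative assumptions, and pairwise r2 values are assumed to be supplied up front.

```python
def clump(snps, r2, window_bp=250_000, r2_threshold=0.1):
    """Greedy p-value clumping.

    snps: list of (id, chrom, pos, p_value) tuples.
    r2:   dict of frozenset({id_a, id_b}) -> r2; missing pairs count as unlinked.
    Returns index SNP ids in the order they were chosen.
    """
    ranked = sorted(snps, key=lambda s: s[3])  # most significant first
    assigned = set()
    index_snps = []
    for sid, chrom, pos, _ in ranked:
        if sid in assigned:
            continue  # already absorbed into an earlier clump
        assigned.add(sid)
        index_snps.append(sid)
        for oid, ochrom, opos, _ in ranked:
            if oid in assigned or ochrom != chrom or abs(opos - pos) > window_bp:
                continue
            if r2.get(frozenset((sid, oid)), 0.0) >= r2_threshold:
                assigned.add(oid)  # linked to the index SNP: drop from the score
    return index_snps


# Hypothetical summary statistics and LD values.
example_snps = [
    ("rs1", "1", 1000, 1e-8),
    ("rs2", "1", 1500, 1e-6),
    ("rs3", "1", 2000, 1e-3),
    ("rs4", "2", 1000, 1e-4),
]
example_r2 = {
    frozenset(("rs1", "rs2")): 0.9,
    frozenset(("rs1", "rs3")): 0.05,
    frozenset(("rs2", "rs3")): 0.6,
}
index_ids = clump(example_snps, example_r2)
```

Only the index SNPs would then contribute to the score, so a block sampled by many correlated SNPs counts once rather than many times.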

mpinese avatar Oct 10 '19 02:10 mpinese