pgsc_calc

Improvements to PCA steps & default reference panel

Open: smlmbrt opened this issue 1 year ago • 1 comment

Ideas for making PCA projections more robust

  • [ ] Subset to a smaller set of PCA-eligible variants?
    • HapMap3 (same as bigsnpr)
    • Ancestry-informative markers (similar to Hao et al.), such as those in doi:10.3389/fgene.2012.00322
  • [x] Avoid variants with high missingness in target dataset (use .vmiss files)
    • Implemented in beta release
  • [ ] Add OCE samples back to the reference panel, but exclude them from the empirical calculation of Z-scores because there are too few individuals for comparison.
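
As a rough illustration of the missingness item above, here is a minimal sketch that selects low-missingness variants from a plink2 `.vmiss` file. It is not the pgsc_calc implementation. It assumes the standard tab-delimited `--missing` output with `ID` and `F_MISS` columns. The function name `low_missingness_ids` and the 0.1 threshold are hypothetical and chosen only for this example.

```python
import csv
import io


def low_missingness_ids(vmiss_handle, max_f_miss=0.1):
    """Return IDs of variants whose missing-call rate is at most max_f_miss.

    Assumes a tab-delimited plink2 .vmiss file with a header containing
    the ID and F_MISS columns. The 0.1 default is an illustrative
    assumption, not a value taken from pgsc_calc.
    """
    reader = csv.DictReader(vmiss_handle, delimiter="\t")
    # A non-numeric "nan" F_MISS compares False and is therefore dropped.
    return [row["ID"] for row in reader if float(row["F_MISS"]) <= max_f_miss]


# Tiny in-memory example in place of a real target-dataset .vmiss file.
example = (
    "#CHROM\tID\tMISSING_CT\tOBS_CT\tF_MISS\n"
    "1\trs1\t0\t100\t0\n"
    "1\trs2\t30\t100\t0.3\n"
    "2\trs3\t5\t100\t0.05\n"
)
keep = low_missingness_ids(io.StringIO(example))
print(keep)
```

The resulting ID list could then be written to a file and used to restrict the variants that enter the PCA, for example through a variant-extraction step.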

smlmbrt, Jul 31 '23 10:07

Previously implemented:

  • Merged 1000 Genomes and Human Genome Diversity Project (HGDP) callset:
    • ref: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9900804/
    • data: https://gnomad.broadinstitute.org/news/2020-10-gnomad-v3-1-new-content-methods-annotations-and-data-availability/#the-gnomad-hgdp-and-1000-genomes-callset

smlmbrt, Feb 26 '24 10:02