
Compute polygenic risk scores with sgkit

Open hammer opened this issue 3 years ago • 7 comments

We've got two nice use cases of population and statistical genetics; a third use case that could attract users and contributors could be polygenic risk score computations.

I'm only glancingly familiar with this literature but have certainly noticed its rapid growth in the past few years.

I did a brief search for Python libraries in this domain and turned up https://github.com/bvilhjal/ldpred and https://github.com/getian107/PRScs.

@alimanfoo @jeromekelleher any chance y'all have colleagues working in this space?

hammer avatar Sep 24 '20 14:09 hammer

@astheeggeggs suggested LDpred-funct as well (thanks for that Duncan!). He also noted that ldpred and PRScs include automatic estimation of heritability and percent causal variants. A quick read of LDpred shows that this is done through LD-score regression and via a mixture prior on the SNP effects, respectively.
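For anyone unfamiliar with the heritability step, here is a minimal sketch of the idea behind LD-score regression, assuming the simplest form of the model, E[chi2_j] = 1 + N * h2 * l_j / M, with no confounding term. The function name `ldsc_h2` and its arguments are hypothetical and not taken from ldpred or ldsc. Real implementations use a weighted regression with block-jackknife standard errors, which this unweighted version omits.

```python
import numpy as np


def ldsc_h2(chi2, ld_scores, n_samples):
    """Estimate SNP heritability from an unweighted regression of chi2 on LD score.

    chi2:       (M,) association chi-square statistics, one per variant
    ld_scores:  (M,) LD score of each variant from a reference panel
    n_samples:  GWAS sample size N
    """
    chi2 = np.asarray(chi2, dtype=float)
    ld_scores = np.asarray(ld_scores, dtype=float)
    n_variants = chi2.shape[0]

    # Ordinary least squares with an intercept column.
    design = np.column_stack([np.ones(n_variants), ld_scores])
    coef, *_ = np.linalg.lstsq(design, chi2, rcond=None)
    intercept, slope = coef

    # Under the model the slope equals N * h2 / M, so solve for h2.
    h2 = slope * n_variants / n_samples
    return h2, intercept
```

An intercept well above 1 would usually be read as a sign of confounding rather than polygenicity, which is one reason to fit it instead of fixing it.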

LDpred-funct is interesting since it includes functional annotations (coding, regulatory, conserved, etc.) in the priors for the SNP effects. The paper also has a nice review of these methods and several others, since it is the most recent.

All of the above require an external LD panel, which is something we might want to put some thought into a representation for.

eric-czech avatar Mar 16 '21 13:03 eric-czech

> All of the above require an external LD panel, which is something we might want to put some thought into a representation for.

For sure going to be necessary when we want to tackle #440.

hammer avatar Mar 16 '21 14:03 hammer

I wonder if we could wrap https://dask-glm.readthedocs.io/en/latest/api.html#dask_glm.regularizers.L1 in some cross-validation logic as an implementation of "A Fast and Scalable Framework for Large-scale and Ultrahigh-dimensional Sparse Regression with Application to the UK Biobank" (2020)? I've never used that method in dask-glm, but if it works well, this would be a simple addition.
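To make "L1 plus cross-validation" concrete, here is a rough in-memory sketch. It does not use dask-glm and is not the batch-screening algorithm from the paper. It is a plain NumPy coordinate-descent lasso wrapped in K-fold cross-validation over a grid of penalties. The names `lasso_cd` and `cv_lasso` are hypothetical, and the sketch assumes the genotype matrix and phenotype are already centered so no intercept is needed.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.asarray(y, dtype=float).copy()  # residual for beta = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # monomorphic column carries no signal
            # Correlation of column j with the residual that excludes its own fit.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != beta[j]:
                resid -= X[:, j] * (new - beta[j])
                beta[j] = new
    return beta


def cv_lasso(X, y, lambdas, n_folds=5, seed=0):
    """Pick the penalty with the lowest mean held-out MSE, then refit on all data."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    mse = np.zeros(len(lambdas))
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        for i, lam in enumerate(lambdas):
            beta = lasso_cd(X[train], y[train], lam)
            mse[i] += np.mean((y[held_out] - X[held_out] @ beta) ** 2)
    mse /= n_folds
    best = int(np.argmin(mse))
    return lambdas[best], lasso_cd(X, y, lambdas[best]), mse
```

A Dask-backed version would presumably swap `lasso_cd` for a distributed solver and keep the outer cross-validation loop largely unchanged.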

eric-czech avatar May 04 '21 17:05 eric-czech

I do appreciate the potential for scalability of this library, but I'm a bit concerned about this line from https://github.com/dask/dask-glm/blob/main/README.rst:

> This library is not ready for use.

The contributors graph is also discouraging. It looks like there was a spike in development in 2017 and very little activity since then.

If we want a GLM in Dask it looks like the action has moved to dask-ml? Cf. https://ml.dask.org/glm.html.

hammer avatar May 04 '21 17:05 hammer

Ah, I see L1 penalties supported in the dask-ml logistic and linear regressions but it looks like the core solvers for those methods are still in dask-glm (https://github.com/dask/dask-ml/blame/main/dask_ml/linear_model/glm.py#L5)? Looks like the only substantial updates there in the last few years were for cupy support: https://github.com/dask/dask-glm/blame/main/dask_glm/algorithms.py. I choose to believe the CPU implementation was just perfected 3-4 years ago lol.

eric-czech avatar May 04 '21 17:05 eric-czech

Looks like https://github.com/bulik/ldsc from Ben Neale's lab is a good reference implementation too (LDpred-funct uses it, e.g.).

eric-czech avatar May 07 '21 10:05 eric-czech

Some thoughts from today's developer call:

  • We'll need to read in and represent sumstats, so #102 is a precursor for this work.
  • We'll need to read in and represent LD reference panels, so we should look for OpenGWAS-like projects that have done the hard work of preparing and hosting that data. PRScs has 1000G- and UKB-derived LD reference panels linked in their README, for example, and I think LDstore (click link at top) and LD Hub may be worth exploring in this area?
  • Once we have the inputs, we should implement a simple C + T method similar to PRSice.
  • The next simplest method to implement, I think, is tshmak/lassosum. The code is C++ and R, so we'll need to work from the paper.
  • After that, we can consider a Bayesian method like bvilhjal/ldpred or getian107/PRScs.
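As a starting point for the C + T item above, here is a minimal sketch of clumping and thresholding on a single chromosome. It is not PRSice's implementation, and the function names and default thresholds are placeholders chosen for illustration. It assumes summary statistics and the reference-panel dosage matrix are already aligned to the same variants and alleles, and that reference variants are polymorphic.

```python
import numpy as np


def clump_and_threshold(pvals, positions, ref_genotypes,
                        p_max=5e-8, r2_max=0.1, window_bp=250_000):
    """Return sorted indices of variants kept after thresholding and greedy clumping.

    pvals:          (M,) GWAS p-values
    positions:      (M,) base-pair positions on one chromosome
    ref_genotypes:  (n_ref_samples, M) dosages from an LD reference panel
    """
    pvals = np.asarray(pvals, dtype=float)
    positions = np.asarray(positions)

    # Thresholding: only variants passing p_max are considered, most significant first.
    order = np.argsort(pvals, kind="stable")
    candidates = [int(j) for j in order if pvals[j] <= p_max]

    removed = set()
    kept = []
    for j in candidates:
        if j in removed:
            continue
        kept.append(j)  # j becomes the index variant of a new clump
        for k in candidates:
            if k == j or k in removed:
                continue
            if abs(positions[k] - positions[j]) > window_bp:
                continue
            r = np.corrcoef(ref_genotypes[:, j], ref_genotypes[:, k])[0, 1]
            if r * r > r2_max:
                removed.add(k)  # k is in LD with a more significant variant
    return sorted(kept)


def polygenic_score(dosages, betas, kept):
    """Weighted sum of effect-allele dosages over the retained variants."""
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    return dosages[:, kept] @ betas[kept]
```

The pairwise loop is quadratic in the number of candidates, so a real version would want windowed LD computed on chunks and probably a sweep over several p-value thresholds.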

hammer avatar May 13 '21 16:05 hammer