
[query] Hail should provide `hl.saige` in both QoS (Query-on-Spark) and QoB (Query-on-Batch)

Open: danking opened this issue 10 months ago • 3 comments

What happened?

SAIGE and its competitor REGENIE are the standard-bearers for modern GWAS. Hail should expose SAIGE within the Hail Query language, with an interface that roughly matches `hl.linear_regression_rows`.

A Hail Batch pipeline would serve the needs of Broadies (Broad Institute users), and indeed such a pipeline already exists, but it has two downsides:

  1. Exporting the data from Hail-native formats to SAIGE-compatible formats requires substantial I/O.
  2. Non-Broadies cannot use this pipeline.

Query-language support would transform SAIGE's accessibility: anyone with access to Hail could run it at scale, and that covers nearly everyone with a large dataset (e.g. DNAnexus, AoU RWB, MVP, FinnGen).

There are two options:

  1. Determine and implement the linear-algebraic primitives that SAIGE needs.
  2. Compile and link directly against SAIGE, and expose its functions to the Hail Query language via JNI.
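As a hedged illustration of what option 1 involves: the SAIGE paper describes fitting its null mixed model with a preconditioned conjugate gradient (PCG) solver, so that systems involving the N×N genetic relationship matrix (GRM) can be solved without ever materializing that matrix. The sketch below is not SAIGE's code and not a Hail API. The Jacobi (diagonal) preconditioner, the covariance form `tau * G G^T / M + I`, the value of `tau`, and the random toy genotypes are all assumptions chosen for illustration.

```python
import numpy as np


def pcg_solve(matvec, b, diag, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A using only
    matrix-vector products with A and A's diagonal (Jacobi preconditioner)."""
    x = np.zeros_like(b)
    r = b - matvec(x)          # initial residual
    z = r / diag               # apply the diagonal preconditioner
    p = z.copy()               # first search direction
    rz = r @ z
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rz / (p @ Ap)  # step length along p
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break              # relative residual small enough
        z = r / diag
        rz_new = r @ z
        p = z + (rz_new / rz) * p  # next A-conjugate direction
        rz = rz_new
    return x


# Hypothetical toy data: N samples, M standardized genotype columns.
rng = np.random.default_rng(0)
N, M, tau = 200, 500, 0.5
G = rng.standard_normal((N, M))
G = (G - G.mean(axis=0)) / G.std(axis=0)


def sigma_matvec(v):
    # (tau * G G^T / M + I) v, computed as G (G^T v) so the N x N GRM is never formed.
    return tau * (G @ (G.T @ v)) / M + v


# Diagonal of tau * G G^T / M + I: row sums of G**2, scaled, plus one.
sigma_diag = tau * (G ** 2).sum(axis=1) / M + 1.0
b = rng.standard_normal(N)
x = pcg_solve(sigma_matvec, b, sigma_diag)
```

The design point is that each iteration costs O(N·M) through products with `G`, whereas a dense solve needs O(N²) memory for the GRM and O(N³) time, which is what becomes prohibitive at biobank scale.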

Version

0.2.120

Relevant log output

No response

danking, Aug 15 '23 22:08