[query] Hail should provide `hl.saige` in both QoS and QoB
What happened?
SAIGE and its competitor REGENIE are the standard bearers for modern GWAS. Hail should expose SAIGE within the Hail Query language. The interface should roughly match `hl.linear_regression_rows`.
A Batch pipeline would serve the needs of Broadies (and, indeed, such a pipeline already exists) but has two downsides:
- There is substantial I/O involved in exporting the data from Hail-native formats to SAIGE-compatible formats.
- Non-Broadies cannot use this pipeline.
Query language support for SAIGE would make SAIGE usable at scale by anyone with access to Hail, which is basically anyone with a large dataset (e.g. DNAnexus, AoU RWB, MVP, FinnGen).
There are two options:
- Determine and implement the linear algebraic primitives necessary for SAIGE.
- Compile and link directly against SAIGE. Expose these functions, via JNI, to the Hail Query language.
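To give a feel for the first option: SAIGE's null-model fitting is usually described as solving linear systems against the genetic relationship matrix (GRM) with a preconditioned conjugate gradient method, without ever materializing the GRM. Below is a minimal plain-Python sketch of that kind of primitive. It uses unpreconditioned conjugate gradient for brevity, and every name in it (`make_grm_operator`, `conjugate_gradient`, `tau`) is hypothetical; none of this is Hail or SAIGE API.

```python
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def make_grm_operator(genotypes: List[Vector], tau: float) -> Callable[[Vector], Vector]:
    """Return v -> (G G^T / M + tau * I) v without forming the n x n GRM.

    genotypes: n rows (samples), each holding M standardized variant values.
    tau: hypothetical positive variance parameter; keeps the operator positive definite.
    """
    n, m = len(genotypes), len(genotypes[0])

    def apply(v: Vector) -> Vector:
        # First G^T v (length M), then G (G^T v) / M + tau * v (length n).
        gt_v = [sum(genotypes[i][j] * v[i] for i in range(n)) for j in range(m)]
        return [
            sum(genotypes[i][j] * gt_v[j] for j in range(m)) / m + tau * v[i]
            for i in range(n)
        ]

    return apply


def conjugate_gradient(
    apply_a: Callable[[Vector], Vector],
    b: Vector,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Vector:
    """Solve A x = b for symmetric positive definite A, given only v -> A v."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x, with x = 0 initially
    p = list(r)  # search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = apply_a(p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

The solver only ever needs matrix-vector products against the genotype matrix, and in Hail those products would have to be expressed over the distributed dataset rather than over in-memory lists as here.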
Version
0.2.120
Relevant log output
No response