[hailtop.saige] Implement SAIGE in QoB
@danking I have a mostly complete draft of SAIGE in QoB. Can you take a look? I'm mainly looking for enough feedback to get a green light to start testing this end to end, fill in the remaining unimplemented components, add documentation, add verbosity options and possibly a dry-run feature, and support VEP annotations natively.
There are a few core concepts:
- Phenotypes - Set of phenotypes to test. I support the ability to group phenotypes together. This is in anticipation of a new version of SAIGE that Wei is going to release soon.
- VariantChunks - The set of variant intervals of data to test per job. If it's SAIGE-GENE, then there's also the "groups" to actually test within that interval.
- io - Wrappers that handle input and output files, so that file handling and checkpointing logic are abstracted away from the logic of the steps themselves.
- steps - These are the SAIGE modules to run. They are all dataclasses with configuration options.
- saige - A class that builds the DAG end to end. It can be instantiated in Python, and I have started writing the framework for a CLI around it.
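To make the "steps are dataclasses" and "VariantChunks" ideas concrete, here is a minimal sketch. It is not the code in this PR: the class names `Step1NullGlmmConfig` and `VariantChunk`, the `output_dir` field, and the interval string format are all hypothetical. Only the step name `step1_null_glmm` and the `use_checkpoint` option come from the description.

```python
from dataclasses import asdict, dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Step1NullGlmmConfig:
    """Hypothetical configuration for the step1_null_glmm step."""

    use_checkpoint: bool = False  # option named in the description
    output_dir: str = "step1"     # hypothetical field for where results go


@dataclass(frozen=True)
class VariantChunk:
    """Hypothetical variant interval tested by a single job."""

    contig: str
    start: int
    end: int
    groups: Tuple[str, ...] = ()  # only populated for SAIGE-GENE style tests

    def to_interval(self) -> str:
        # Assumed "contig:start-end" format, purely for illustration.
        return f"{self.contig}:{self.start}-{self.end}"


# Defaults live on the dataclass; a run swaps in overridden copies.
default_step1 = Step1NullGlmmConfig()
checkpointed_step1 = replace(default_step1, use_checkpoint=True)
chunk = VariantChunk("chr1", 1, 1_000_000, groups=("GENE_A",))
```

Because the dataclasses are frozen, `replace` returns a new object and the defaults stay untouched, and `asdict` gives a plain dict that is easy to serialize alongside the run.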
All configuration happens through a YAML file that can override the default parameters for each step, such as whether to checkpoint or where the results should be written. For the CLI, I envision that you can give a config file, specify overrides such as `--overrides step1_null_glmm.use_checkpoint=true`, or both. For every SAIGE run, I write the configuration used to a file in the output directory, along with information about the input data, the variant chunks, and the batch.
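As a rough illustration of the override mechanism, dotted `step.option=value` strings could be applied on top of nested defaults as sketched below. This is an assumption about how it might work, not the implementation in this PR: `apply_overrides`, `parse_value`, `dump_run_config`, `DEFAULTS`, and the `output_dir` key are hypothetical, and JSON stands in for YAML only to keep the sketch dependency-free.

```python
import copy
import json
from typing import Any, Dict, Iterable

# Hypothetical defaults; only step1_null_glmm.use_checkpoint is from the description.
DEFAULTS: Dict[str, Dict[str, Any]] = {
    "step1_null_glmm": {"use_checkpoint": False, "output_dir": "step1"},
}


def parse_value(raw: str) -> Any:
    """Turn a CLI string into a bool, int, or float when possible, else keep it."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def apply_overrides(config: Dict[str, Any], overrides: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of config with each 'a.b=value' override applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"override must look like key=value: {item!r}")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            if not isinstance(node.get(name), dict):
                raise KeyError(f"unknown step or section: {name!r}")
            node = node[name]
        if leaf not in node:
            # Reject unknown options so typos do not pass silently.
            raise KeyError(f"unknown option: {dotted_key!r}")
        node[leaf] = parse_value(raw)
    return result


def dump_run_config(config: Dict[str, Any]) -> str:
    """Serialize the resolved configuration so it can be saved with the run."""
    return json.dumps(config, indent=2, sort_keys=True)


resolved = apply_overrides(DEFAULTS, ["step1_null_glmm.use_checkpoint=true"])
```

Rejecting unknown keys is a design choice in this sketch: it means a mistyped option fails before any jobs are submitted rather than being silently ignored.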