[hailtop.saige] Implement SAIGE in QoB

Open · jigold opened this issue 8 months ago · 8 comments

@danking I have a mostly completed draft of SAIGE in QoB. Can you take a look? I'm mainly looking for enough feedback to get a green light to start testing this end to end. After that I plan to fill in the remaining unimplemented components, add documentation, add verbose output and possibly a dry-run feature, and support VEP annotations natively.

There are a few core concepts:

  1. Phenotypes - The set of phenotypes to test. Phenotypes can be grouped together, in anticipation of a new version of SAIGE that Wei is going to release soon.
  2. VariantChunks - The variant intervals to test, one interval per job. For SAIGE-GENE, each chunk also carries the "groups" to test within that interval.
  3. io - A set of wrappers that handle input and output files, so that file handling and checkpointing logic are abstracted away from the analysis itself.
  4. steps - The SAIGE modules to run. Each one is a dataclass with configuration options.
  5. saige - A class that can be instantiated from Python; I have also started writing the framework for a CLI. This contains the code that builds the DAG end to end.
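As a rough illustration of how the phenotype, variant-chunk, and step concepts might be modeled as dataclasses, here is a minimal sketch. Every class, field, and function name below (PhenotypeGroup, VariantChunk, chunk_contig, Step1NullGlmm and its fields) is hypothetical and not taken from the draft; coordinates are treated as 0-based half-open intervals purely for simplicity.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical data model; names and fields are illustrative, not hailtop.saige's API.


@dataclass
class PhenotypeGroup:
    # A named group of phenotypes that are tested together.
    name: str
    phenotypes: List[str]


@dataclass
class VariantChunk:
    # One half-open interval [start, end) on a contig, tested by a single job.
    contig: str
    start: int
    end: int
    # SAIGE-GENE only: the group (e.g. gene) names to test within this interval.
    groups: Optional[List[str]] = None


def chunk_contig(contig: str, length: int, chunk_size: int) -> List[VariantChunk]:
    """Split a contig of `length` bases into consecutive chunks of at most `chunk_size` bases."""
    return [
        VariantChunk(contig, start, min(start + chunk_size, length))
        for start in range(0, length, chunk_size)
    ]


@dataclass
class Step1NullGlmm:
    # Hypothetical per-step configuration; these defaults could be overridden by a config file.
    use_checkpoint: bool = False
    output_dir: str = "step1"
    cpu: int = 4
```

With a 250-base contig and a chunk size of 100, `chunk_contig("chr1", 250, 100)` yields three chunks covering [0, 100), [100, 200), and [200, 250), each of which would map to one job.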

All configuration is done through a YAML file that can override the default parameters for each step, such as whether to checkpoint or where the results should be written. For the CLI, I envision that you can give a config file, specify overrides such as `--overrides step1_null_glmm.use_checkpoint=true`, or both. For every SAIGE run, I write the configuration used to a file in the output directory, along with information about the input data, the variant chunks, and the batch.
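The layering of defaults, a YAML config file, and dotted command-line overrides could work roughly as follows. This is a minimal sketch under assumptions: the YAML file is assumed to have already been parsed into a nested dict (for example with PyYAML's `yaml.safe_load`), later layers are assumed to take precedence (defaults, then the file, then command-line overrides), and the helper names `parse_value`, `deep_merge`, and `apply_override` are hypothetical.

```python
import copy
from typing import Any, Dict

# Hypothetical helpers for layering configuration; not part of hailtop.saige.


def parse_value(raw: str) -> Any:
    """Interpret an override value: 'true'/'false' as booleans, then int, then float, else a string."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def deep_merge(base: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `update` merged in, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def apply_override(config: Dict[str, Any], override: str) -> Dict[str, Any]:
    """Apply one dotted override such as 'step1_null_glmm.use_checkpoint=true'."""
    path, sep, raw = override.partition("=")
    if not sep:
        raise ValueError(f"override must look like key.path=value: {override!r}")
    # Build a nested dict like {"step1_null_glmm": {"use_checkpoint": True}} and merge it in.
    nested: Any = parse_value(raw)
    for key in reversed(path.split(".")):
        nested = {key: nested}
    return deep_merge(config, nested)
```

Because `deep_merge` copies rather than mutates, the defaults stay untouched, and the fully resolved dict could then be serialized (for example with `json.dump`) into the output directory as the record of the configuration used for the run.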

jigold · Oct 12 '23 16:10