Refactor Looper's integration of Pipestat
Currently, Looper checks if Pipestat has been configured for each sample before adding the sample to the submission conductor.
If pipestat can be configured successfully, looper generates a configuration file for pipestat called looper_pipestat_config.yaml, which looks something like this:
results_file_path: /home/drc/GITHUB/hello_looper/hello_looper/pipestat/./results.yaml
flag_file_dir: /home/drc/GITHUB/hello_looper/hello_looper/pipestat/./results/flags
record_identifier: frog_2
output_dir: /home/drc/GITHUB/hello_looper/hello_looper/pipestat/./results
schema_path: /home/drc/GITHUB/hello_looper/hello_looper/pipestat/./pipeline_pipestat/pipestat_output_schema.yaml
pipeline_name: test_pipe
pipeline_type: sample
Currently, the user adds a pipestat section to the .looper.yaml file with the relevant information:
pep_config: ./project/project_config.yaml # pephub registry path or local path
output_dir: ./results
pipeline_interfaces:
  sample: ./pipeline_pipestat/pipeline_interface.yaml
  project: ./pipeline_pipestat/pipeline_interface_project.yaml
pipestat:
  results_file_path: results.yaml
  flag_file_dir: results/flags
After setting everything up, looper creates a pipestat config file. The pipeline author can use this file to configure pipestat by passing it to a pipestat instance within the pipeline:
results_file_path: /home/drc/GITHUB/hello_looper/hello_looper/pipestat/./results.yaml
flag_file_dir: /home/drc/GITHUB/hello_looper/hello_looper/pipestat/./results/flags
output_dir: /home/drc/GITHUB/hello_looper/hello_looper/pipestat/./results
schema_path: /home/drc/GITHUB/hello_looper/hello_looper/pipestat/./pipeline_pipestat/pipestat_output_schema.yaml
pipeline_name: example_pipestat_pipeline
pipeline_type: sample
record_identifier: frog_2
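As a rough illustration of where each field in the generated file appears to come from, the following sketch assembles the same keys from the looper config, the pipeline interface, and the sample name. The helper name `build_pipestat_config` and the mapping of fields to sources are assumptions inferred from the listings above, not looper's actual code. The sketch also normalizes paths, so the `./` component visible in the real output does not appear.

```python
import os


def build_pipestat_config(looper_cfg_dir, looper_cfg, piface_path, piface, sample_name):
    """Hypothetical helper: assemble a per-sample pipestat config as a dict."""

    def absolute(base, path):
        # Resolve a possibly relative path against a base directory.
        if os.path.isabs(path):
            return path
        return os.path.normpath(os.path.join(base, path))

    user = looper_cfg["pipestat"]  # section the user wrote in .looper.yaml
    return {
        # Assumed to come from the user's pipestat section.
        "results_file_path": absolute(looper_cfg_dir, user["results_file_path"]),
        "flag_file_dir": absolute(looper_cfg_dir, user["flag_file_dir"]),
        # Assumed to come from the top level of the looper config.
        "output_dir": absolute(looper_cfg_dir, looper_cfg["output_dir"]),
        # Assumed to come from the pipeline interface.
        "schema_path": absolute(os.path.dirname(piface_path), piface["output_schema"]),
        "pipeline_name": piface["pipeline_name"],
        "pipeline_type": piface["pipeline_type"],
        # Assumed to be the sample name for sample-level pipelines.
        "record_identifier": sample_name,
    }


looper_cfg = {
    "output_dir": "./results",
    "pipestat": {"results_file_path": "results.yaml", "flag_file_dir": "results/flags"},
}
piface = {
    "pipeline_name": "example_pipestat_pipeline",
    "pipeline_type": "sample",
    "output_schema": "pipestat_output_schema.yaml",
}
cfg = build_pipestat_config(
    "/home/user/project",  # hypothetical directory containing .looper.yaml
    looper_cfg,
    "/home/user/project/pipeline_pipestat/pipeline_interface.yaml",
    piface,
    "frog_2",
)
```

Written this way, every value in the file is traceable to one of three inputs, which may help when deciding what the refactored integration should own.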
For example, the pipeline interface author (the pipeline author) can pass these values to the pipeline through the command template:
pipeline_interface (for a pipeline.py):
pipeline_name: example_pipestat_pipeline
pipeline_type: sample
output_schema: pipestat_output_schema.yaml
command_template: >
  python {looper.piface_dir}/count_lines.py {sample.file} {sample.sample_name} {pipestat.results_file}
pipeline_interface (for a shell pipeline):
pipeline_name: example_pipestat_pipeline
pipeline_type: sample
output_schema: pipestat_output_schema.yaml
command_template: >
  {looper.piface_dir}/count_lines_pipestat.sh {sample.file} {sample.sample_name} {pipestat.config_file}
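To make the placeholder substitution concrete, here is a minimal sketch that fills the two command templates above from three namespaces. It uses Python's built-in `str.format` as a stand-in for looper's own template rendering, and every path and value in the namespaces is a hypothetical example.

```python
from types import SimpleNamespace

# Hypothetical namespace contents; real values are supplied by looper at run time.
namespaces = {
    "looper": SimpleNamespace(piface_dir="/pipelines"),
    "sample": SimpleNamespace(file="data/frog_2.txt", sample_name="frog_2"),
    "pipestat": SimpleNamespace(
        results_file="/out/results.yaml",
        config_file="/out/looper_pipestat_config.yaml",
    ),
}

py_template = (
    "python {looper.piface_dir}/count_lines.py "
    "{sample.file} {sample.sample_name} {pipestat.results_file}"
)
sh_template = (
    "{looper.piface_dir}/count_lines_pipestat.sh "
    "{sample.file} {sample.sample_name} {pipestat.config_file}"
)

# str.format resolves dotted names via attribute lookup on each namespace.
py_command = py_template.format(**namespaces)
sh_command = sh_template.format(**namespaces)

print(py_command)  # python /pipelines/count_lines.py data/frog_2.txt frog_2 /out/results.yaml
print(sh_command)  # /pipelines/count_lines_pipestat.sh data/frog_2.txt frog_2 /out/looper_pipestat_config.yaml
```

The two templates differ only in which pipestat attribute they pass: the Python pipeline receives the results file path, while the shell pipeline receives the path to the generated config file.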
How looper checks for pipestat configuration: https://github.com/pepkit/looper/blob/389967231963ee00020baf93b5cc66288fc32745/looper/project.py#L336-L352
The main functions for this are _check_if_pipestat_configured and _get_pipestat_configuration.
The code runs _check_if_pipestat_configured first, which returns True or False. If any exception is raised during the next step (_get_pipestat_configuration), for either a single sample or a project, it returns False.
https://github.com/pepkit/looper/blob/389967231963ee00020baf93b5cc66288fc32745/looper/project.py#L471-L503
If this function returns False, looper continues, assuming the user does not wish to use pipestat.
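The control flow described above can be sketched as follows. The two functions are simplified stand-ins named after the originals (without the leading underscore), operating on a plain dict rather than a looper Project. The validation inside `get_pipestat_configuration` is an illustrative assumption, not the real check.

```python
def get_pipestat_configuration(project, sample_name=None):
    """Hypothetical stand-in: return pipestat settings or raise if unusable."""
    section = project["pipestat"]  # raises KeyError when the section is absent
    if "results_file_path" not in section:
        # Illustrative validation only; the real requirements may differ.
        raise ValueError("pipestat section lacks results_file_path")
    return {**section, "record_identifier": sample_name}


def check_if_pipestat_configured(project, sample_names):
    """Return True only if configuration succeeds for every sample."""
    try:
        for name in sample_names:
            get_pipestat_configuration(project, name)
    except Exception:
        # Any failure, whatever its cause, is reported as "not configured".
        return False
    return True


samples = ["frog_1", "frog_2"]
with_pipestat = {"pipestat": {"results_file_path": "results.yaml"}}
without_pipestat = {}
misconfigured = {"pipestat": {"flag_file_dir": "results/flags"}}

print(check_if_pipestat_configured(with_pipestat, samples))     # True
print(check_if_pipestat_configured(without_pipestat, samples))  # False
print(check_if_pipestat_configured(misconfigured, samples))     # False
```

In this sketch, a project with no pipestat section and a project with a broken one both yield False, so the caller cannot tell an intentional opt-out from a configuration error.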
Related Issues: https://github.com/pepkit/looper/issues/411 https://github.com/pepkit/looper/issues/413 https://github.com/pepkit/looper/issues/425 https://github.com/pepkit/looper/issues/459 https://github.com/pepkit/looper/issues/471