Refactor Looper's integration of Pipestat

Open donaldcampbelljr opened this issue 9 months ago • 8 comments

Currently, Looper checks if Pipestat has been configured for each sample before adding the sample to the submission conductor.

If pipestat can be configured successfully, looper generates a configuration file for pipestat, looper_pipestat_config.yaml, which looks something like this:

results_file_path: /home/drc/GITHUB/hello_looper/hello_looper/pipestat/./results.yaml
flag_file_dir: /home/drc/GITHUB/hello_looper/hello_looper/pipestat/./results/flags
record_identifier: frog_2
output_dir: /home/drc/GITHUB/hello_looper/hello_looper/pipestat/./results
schema_path: /home/drc/GITHUB/hello_looper/hello_looper/pipestat/./pipeline_pipestat/pipestat_output_schema.yaml
pipeline_name: test_pipe
pipeline_type: sample

Currently, the user adds a pipestat section to the .looper.yaml file with the relevant info:

pep_config: ./project/project_config.yaml # pephub registry path or local path  
output_dir: ./results  
pipeline_interfaces:  
  sample:  ./pipeline_pipestat/pipeline_interface.yaml  
  project: ./pipeline_pipestat/pipeline_interface_project.yaml  
pipestat:  
  results_file_path: results.yaml  
  flag_file_dir: results/flags

After setting everything up, looper creates a pipestat config file. The pipeline author can use it to configure pipestat by passing it to a pipestat instance within the pipeline:

results_file_path: /home/drc/GITHUB/hello_looper/hello_looper/pipestat/./results.yaml  
flag_file_dir: /home/drc/GITHUB/hello_looper/hello_looper/pipestat/./results/flags  
output_dir: /home/drc/GITHUB/hello_looper/hello_looper/pipestat/./results  
schema_path: /home/drc/GITHUB/hello_looper/hello_looper/pipestat/./pipeline_pipestat/pipestat_output_schema.yaml  
pipeline_name: example_pipestat_pipeline  
pipeline_type: sample  
record_identifier: frog_2

For example, the pipeline interface author (pipeline author) can pass these values to the pipeline through the command template:

pipeline_interface (for a pipeline.py):

pipeline_name: example_pipestat_pipeline  
pipeline_type: sample  
output_schema: pipestat_output_schema.yaml  
command_template: >  
  python {looper.piface_dir}/count_lines.py {sample.file} {sample.sample_name} {pipestat.results_file}

pipeline_interface (for a shell pipeline):

pipeline_name: example_pipestat_pipeline  
pipeline_type: sample  
output_schema: pipestat_output_schema.yaml  
command_template: >  
  {looper.piface_dir}/count_lines_pipestat.sh {sample.file} {sample.sample_name} {pipestat.config_file}
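To make the template variables concrete, here is a minimal sketch of how namespaced placeholders such as {sample.file} and {pipestat.config_file} expand into a command line. It uses Python's built-in str.format purely for illustration and is not looper's own rendering code. The render_command helper, the piface_dir value, and the sample file path are hypothetical; the sample name and config file name are the ones quoted in this issue.

```python
from types import SimpleNamespace


def render_command(template: str, **namespaces: SimpleNamespace) -> str:
    """Expand {namespace.attribute} placeholders in a command template."""
    # str.format resolves dotted attribute access on its keyword arguments.
    rendered = template.format(**namespaces)
    # Collapse any extra whitespace left over from a folded YAML string.
    return " ".join(rendered.split())


template = (
    "{looper.piface_dir}/count_lines_pipestat.sh "
    "{sample.file} {sample.sample_name} {pipestat.config_file}"
)

# All values below are illustrative assumptions, not looper output.
command = render_command(
    template,
    looper=SimpleNamespace(piface_dir="./pipeline_pipestat"),
    sample=SimpleNamespace(file="data/frog_2.txt", sample_name="frog_2"),
    pipestat=SimpleNamespace(config_file="looper_pipestat_config.yaml"),
)
print(command)
```

If the pipestat namespace is not passed at all, str.format raises KeyError, which is one way a pipestat-enabled template can fail when pipestat was never configured.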

How looper checks for pipestat configuration: https://github.com/pepkit/looper/blob/389967231963ee00020baf93b5cc66288fc32745/looper/project.py#L336-L352

The main functions for this are _check_if_pipestat_configured and _get_pipestat_configuration.

Execution goes through _check_if_pipestat_configured first, which returns True or False. If any exception is raised during the next step, for either a single sample or a project, it returns False.

https://github.com/pepkit/looper/blob/389967231963ee00020baf93b5cc66288fc32745/looper/project.py#L471-L503

If this function returns False, looper continues, assuming the user does not wish to use pipestat.
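The control flow described above can be sketched roughly as follows. This is a simplified stand-in, not the code at the linked lines: the function names drop the leading underscore to mark them as hypothetical, and the validation rule (requiring results_file_path) is an assumption made only so the example has something to fail on.

```python
def get_pipestat_configuration(pipestat_section):
    """Hypothetical stand-in for looper's _get_pipestat_configuration."""
    if not pipestat_section:
        raise KeyError("no pipestat section in the looper config")
    # Assumed rule for illustration only; the real checks differ.
    if "results_file_path" not in pipestat_section:
        raise ValueError("pipestat section lacks results_file_path")
    return dict(pipestat_section)


def check_if_pipestat_configured(pipestat_section):
    """Return True if configuration succeeds, False on any exception."""
    try:
        get_pipestat_configuration(pipestat_section)
    except Exception:
        # Every failure looks the same to the caller: "pipestat not in use".
        return False
    return True


print(check_if_pipestat_configured({"results_file_path": "results.yaml"}))
print(check_if_pipestat_configured(None))
print(check_if_pipestat_configured({"flag_file_dir": "results/flags"}))
```

In this sketch a missing pipestat section and a malformed one both come back as False, so the caller cannot tell "the user does not want pipestat" apart from "the user's pipestat setup is broken".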

Related Issues: https://github.com/pepkit/looper/issues/411 https://github.com/pepkit/looper/issues/413 https://github.com/pepkit/looper/issues/425 https://github.com/pepkit/looper/issues/459 https://github.com/pepkit/looper/issues/471

May 20 '24 19:05 donaldcampbelljr
