pypiper
unifying pipestat config with pipestat constructor
Right now, the docs suggest configuring pipestat via pypiper like this:
pm = pypiper.PipelineManager(
    ...,
    pipestat_schema="custom_results_schema.yaml",
    pipestat_results_file="custom_results_file.yaml",
    pipestat_sample_name="my_record",
    pipestat_project_name="my_namespace",
    pipestat_config="custom_pipestat_config.yaml",
)
Meanwhile, pipestat is configured like this:
psm = pipestat.PipestatManager(
    record_identifier="sample1",
    results_file_path=temp_file,
    schema_path="../tests/data/sample_output_schema.yaml",
)
I would like these to be uniform. So, I want to do:
pipestat_config = {
    "record_identifier": sample["sample_name"],
    "schema_path": "pipeline/output_schema.yaml",
    "results_file_path": "results.yaml",
    "pipeline_type": "sample",
}
And use this for either, like:
psm = pipestat.PipestatManager(**pipestat_config)
or:
pm = pypiper.PipelineManager(
    ...,  # pypiper options
    pipestat_config=pipestat_config,
)
This way, PipelineManager takes a single pipestat_config argument: a dict of pipestat options that it passes through to pipestat as **kwargs. This seems cleaner than a separate PipelineManager argument for each pipestat option. It also keeps the options in sync. Right now they are out of sync: pypiper wants pipestat_sample_name, which it then passes to pipestat as record_identifier. Passing the dict through would eliminate the need to maintain a parallel set of pypiper argument names.
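To make the proposal concrete, here is a minimal sketch of the pass-through. It does not use the real pypiper or pipestat code: FakePipestatManager and SketchPipelineManager are hypothetical stand-ins invented for illustration, and only the option names are taken from the examples above.

```python
class FakePipestatManager:
    """Hypothetical stand-in for pipestat.PipestatManager; it only records its kwargs."""

    def __init__(self, **kwargs):
        self.options = dict(kwargs)


class SketchPipelineManager:
    """Hypothetical stand-in for pypiper.PipelineManager."""

    def __init__(self, name, pipestat_config=None):
        self.name = name
        # Forward the dict unchanged, so no argument names need translating.
        self.pipestat = FakePipestatManager(**(pipestat_config or {}))


pipestat_config = {
    "record_identifier": "sample1",
    "schema_path": "pipeline/output_schema.yaml",
    "results_file_path": "results.yaml",
    "pipeline_type": "sample",
}

# The same dict configures either object.
psm = FakePipestatManager(**pipestat_config)
pm = SketchPipelineManager("my_pipeline", pipestat_config=pipestat_config)
print(pm.pipestat.options == psm.options)
```

Because the wrapper never inspects the keys, any option pipestat accepts would be reachable through it without a matching pypiper argument.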
Another issue is that I can't figure out how the pypiper arguments map to the pipestat options I want to set. I don't know what pipestat_project_name maps to, and I don't see how to set pipeline_type through pypiper.
Here is another example where this bit me.
I wanted to pass multi_pipelines=True to pipestat when constructing my pypiper.PipelineManager, but this is not documented. The way to do it is to pass multi=True to pypiper, which changes it to multi_pipelines=True and passes that to pipestat. I had to read the code itself to figure this out.
This would be easier and not require additional documentation if instead we used pipestat_config and passed through kwargs.
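As a rough illustration of the difference, the sketch below contrasts a rename table with a plain pass-through. It is not pypiper's actual implementation: translate, passthrough, and PYPIPER_TO_PIPESTAT are hypothetical names, and the table holds only the two renames mentioned in this issue.

```python
# The two renames described in this issue; each entry is one more name
# that has to be documented and kept in sync with pipestat.
PYPIPER_TO_PIPESTAT = {
    "pipestat_sample_name": "record_identifier",
    "multi": "multi_pipelines",
}


def translate(pypiper_kwargs):
    """Hypothetical helper: rename pypiper-style arguments to pipestat-style ones."""
    return {PYPIPER_TO_PIPESTAT.get(k, k): v for k, v in pypiper_kwargs.items()}


def passthrough(pipestat_config):
    """Hypothetical helper: with a pipestat_config dict there is nothing to rename."""
    return dict(pipestat_config)


old_style = translate({"multi": True, "pipestat_sample_name": "sample1"})
new_style = passthrough({"multi_pipelines": True, "record_identifier": "sample1"})
print(old_style == new_style)
```

Both routes end with the same options, but only the first requires the user to know a second set of names.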