pypiper
KeyError: 'Time' when using pipestat via pypiper
When I try to switch from a normal pypiper pipeline to one that configures pipestat, I get this error:
Traceback (most recent call last):
  File "/home/nsheff/code/seqcolapi/analysis/pipeline/add_to_seqcol_server.py", line 92, in <module>
    pm.stop_pipeline()
  File "/home/nsheff/.local/lib/python3.11/site-packages/pypiper/manager.py", line 2106, in stop_pipeline
    self.report_result("Time", elapsed_time_this_run, nolog=True)
  File "/home/nsheff/.local/lib/python3.11/site-packages/pypiper/manager.py", line 1616, in report_result
    reported_result = self.pipestat.report(
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/home/nsheff/.local/lib/python3.11/site-packages/pipestat/pipestat.py", line 99, in inner
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nsheff/.local/lib/python3.11/site-packages/pipestat/pipestat.py", line 571, in report
    schema=self.result_schemas[r],
           ~~~~~~~~~~~~~~~~~~~^^^
KeyError: 'Time'
I can't track this down because I'm not doing anything related to Time, so it must be coming from pypiper or pipestat somehow.
One hint is this message:
These results exist for 'DEFAULT_SAMPLE_NAME': Time
These results exist for 'DEFAULT_SAMPLE_NAME': Success
It looks like there might be a bug somewhere with a constant that is getting stored as a string instead.
I think pipestat_sample_name is not being passed through to pipestat
Actually, I think it's pipestat_results_file that's not working correctly...
I figured it out.
Pypiper automatically adds results for Time and Success. If those aren't in your output schema, it fails. So you have to add this to the output schema:
Time:
  type: "string"
  description: "Elapsed time for the pipeline run as reported by pypiper"
Success:
  type: "string"
  description: "Timestamp for when the pipeline completed"
I think this is suboptimal, since I am not putting those in; they're automatic. Maybe pypiper should be the one adding them to the output schema, since it's the one reporting them automatically.
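To make the failure mode concrete, here is a minimal sketch using plain dictionaries rather than pipestat itself. `AUTOMATIC_RESULTS`, `missing_automatic_results`, and the `number_of_things` result are hypothetical names for illustration, not part of pypiper or pipestat. It only mirrors the `self.result_schemas[r]` lookup seen in the traceback.

```python
# Result names that pypiper reports on its own, per this thread.
AUTOMATIC_RESULTS = ("Time", "Success")


def missing_automatic_results(result_schemas):
    """Return the automatically reported result names absent from the schema."""
    return [name for name in AUTOMATIC_RESULTS if name not in result_schemas]


# Hypothetical user schema that declares only the pipeline's own result.
user_schemas = {
    "number_of_things": {"type": "integer", "description": "Example result"},
}

print(missing_automatic_results(user_schemas))  # ['Time', 'Success']

# The same kind of dictionary lookup that fails inside pipestat's report().
try:
    user_schemas["Time"]
except KeyError as err:
    print(f"KeyError: {err}")  # KeyError: 'Time'
```

Running a check like this before starting the pipeline would flag the missing entries up front instead of at `stop_pipeline()`.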
I made a more informative error message in pipestat to address this here: https://github.com/pepkit/pipestat/commit/0d511b5960d460b4dda701379f6a982e3f407a0c
This at least solves the immediate issue, but going forward:
- [ ] pypiper should add anything it uses into the schema on its own
- [ ] pipestat probably needs to make it easier to merge, update, or combine schemas. Right now you can only give it a file path; there's no way to set or update the schema programmatically. So, first, the pipestat schema loading system needs to be more flexible, to allow pypiper to update the schema and add its parameters.
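One possible shape for the schema merging described in these to-do items, sketched with plain dictionaries. This is not an existing pipestat or pypiper API: `PYPIPER_BUILTIN_RESULTS` and `merge_builtin_results` are hypothetical names, and the definitions are copied from the YAML workaround earlier in this thread. The sketch assumes user-supplied definitions should win when a name appears in both.

```python
import copy

# Hypothetical constant: the result definitions pypiper would contribute,
# copied from the YAML workaround in this thread.
PYPIPER_BUILTIN_RESULTS = {
    "Time": {
        "type": "string",
        "description": "Elapsed time for the pipeline run as reported by pypiper",
    },
    "Success": {
        "type": "string",
        "description": "Timestamp for when the pipeline completed",
    },
}


def merge_builtin_results(user_schema, builtins=PYPIPER_BUILTIN_RESULTS):
    """Return a new schema dict containing the built-ins plus the user's results.

    User definitions override built-ins with the same name (an assumption).
    Neither input is modified.
    """
    merged = copy.deepcopy(builtins)
    merged.update(copy.deepcopy(user_schema))
    return merged


# Hypothetical user schema with a single pipeline-specific result.
user_schema = {"number_of_things": {"type": "integer", "description": "Example result"}}
merged = merge_builtin_results(user_schema)
print(sorted(merged))  # ['Success', 'Time', 'number_of_things']
```

Returning a new dictionary rather than mutating the user's schema keeps the original file contents intact if they are reused elsewhere.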
Also confirmed this by adding the output_schema to the PipelineManager during the test_pipeline_manager.py test (I was initially surprised our tests didn't catch this):
self.pp = pypiper.PipelineManager(
    "sample_pipeline",
    outfolder=self.OUTFOLDER,
    multi=True,
    pipestat_schema="/home/drc/GITHUB/pypiper/pypiper/tests/Data/sample_output_schema.yaml",
)
It will indeed fail with a KeyError:
tests/pipeline_manager/test_pipeline_manager.py::PipelineManagerTests::test_me - KeyError: 'Time'