Inject metadata about workflow run into workflow step?
This seems similar/related to #1575
Is it possible for Cromwell to expose metadata about the current workflow to the steps that are running, either by injecting environment variables into containers or as inputs (as discussed in that issue)?
Our use case is adding a final step to our workflows that POSTs information about the run, its output paths, etc. to an external tracking service.
While exploring the idea of using a monitoring_image for this, I noticed it injects more or less the metadata I'd want into the monitoring container via environment variables already:
https://github.com/broadinstitute/cromwell/blob/adb8d2ad87cba307e5b1eccd1a3e21857cc9b81c/supportedBackends/google/pipelines/v2beta/src/main/scala/cromwell/backend/google/pipelines/v2beta/api/MonitoringAction.scala#L36
https://github.com/broadinstitute/cromwell/blob/adb8d2ad87cba307e5b1eccd1a3e21857cc9b81c/supportedBackends/google/pipelines/common/src/main/scala/cromwell/backend/google/pipelines/common/monitoring/Env.scala#L18
Is there a reason this could not also be injected into UserActions, and would you accept a PR that does so? (As a side note, the monitoring image could likely accomplish what we want as well, but as far as I know, neither using one nor setting any custom workflow options is allowed on Terra.)
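To make the use case concrete, here is a minimal sketch of what such a final step might run. It assumes the workflow ID, workflow name, and output paths reach the step somehow (as injected environment variables or as ordinary task inputs). The tracker URL and the JSON field names are hypothetical and not part of Cromwell.

```python
import json
import urllib.request


def build_run_report(workflow_id, workflow_name, output_paths):
    """Assemble the JSON body for the tracking service.

    The field names here are hypothetical; use whatever the tracker expects.
    """
    return {
        "workflow_id": workflow_id,
        "workflow_name": workflow_name,
        "outputs": sorted(output_paths),
    }


def build_request(tracker_url, report):
    """Create (but do not send) a POST request carrying the report as JSON."""
    body = json.dumps(report).encode("utf-8")
    return urllib.request.Request(
        tracker_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_report(tracker_url, report, timeout=30):
    """Send the report and return the HTTP status code."""
    request = build_request(tracker_url, report)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

The final task would then call something like `post_report("https://tracker.example.org/runs", build_run_report(...))`, where the URL is a placeholder.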
We have historically promoted the idea that task outputs should be pure functions of their inputs, so there is no support for data injection. Such injection would not be captured e.g. for purposes of comparing task identity for call caching.
That makes sense, and I understand the concerns around call caching discussed in the linked issue. If this ENV injection will never be supported, is there another recommended way for a workflow to pass information about itself outside Cromwell? This seems to be something many people have requested (dating back at least 6 years based on that issue).
Right now, as far as I'm aware, the only option is to poll the REST API, which is suboptimal if you're running many workflows at once. It also means the external service must be authenticated to either Terra or wherever your standalone Cromwell server lives.
It would be very useful for those of us who already have systems for tracking metadata, sample information, etc. if Cromwell could notify those systems when results are available, either through a step in the workflow itself as requested above, or perhaps via webhooks or similar. If not the injection solution above, is anything like that on the roadmap, or is this just not something the team is planning on addressing? Everyone has limited resources and I get that certain things just aren't a priority.
Cromwell Really Should Support pub-sub for workflow status notifications. There is definitely some kind of code in there already today, but it is not used in production and I don't know how complete it is. That said, it should be possible now to use the /query endpoint to get information about multiple workflows at once, such as all currently-running workflows, or all workflows started after X time.
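As a rough illustration of the /query approach, here is a minimal sketch that asks for several workflows in one request. It assumes the endpoint lives at `/api/workflows/v1/query`, accepts repeated `status` parameters and a `start` timestamp, and returns JSON with a `results` list; check these against your server's Swagger page before relying on them. The base URL is a placeholder, and any authentication your deployment needs is omitted.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, statuses=(), started_after=None):
    """Build a URL for the workflow query endpoint.

    Assumed parameters: `status` (repeatable) and `start` (a timestamp
    with an offset). Verify both against your Cromwell version.
    """
    params = [("status", status) for status in statuses]
    if started_after is not None:
        params.append(("start", started_after))
    url = base_url.rstrip("/") + "/api/workflows/v1/query"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def fetch_workflows(base_url, statuses=(), started_after=None, timeout=30):
    """Return the list of workflow summaries matching the filters."""
    url = build_query_url(base_url, statuses, started_after)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response).get("results", [])
```

For example, `fetch_workflows("http://localhost:8000", statuses=["Running"])` would list all currently running workflows in one call instead of one request per workflow.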
Cool, that's good to know. Appreciate the response, thanks!