workflow-execution-service-schemas

Suggest that implementors use the HTTP Link header field to indicate provenance when retrieving results

Open mr-c opened this issue 8 years ago • 12 comments

https://www.w3.org/TR/prov-aq/#resource-accessed-by-http

Idea is from @stain

mr-c avatar Nov 14 '17 09:11 mr-c
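
For readers unfamiliar with the linked PROV-AQ section: it describes a response header of the form `Link: <provenance-URI>; rel="http://www.w3.org/ns/prov#has_provenance"; anchor="target-URI"`. A minimal client-side sketch of picking that link out of a header value might look like the following. The `wes.example.org` URLs and the function name are assumptions for illustration, and the parser is deliberately simplified rather than a full implementation of HTTP Link header parsing.

```python
import re

# Link relation that PROV-AQ defines for pointing at a provenance record.
HAS_PROVENANCE = "http://www.w3.org/ns/prov#has_provenance"


def provenance_links(link_header):
    """Return target URIs in a Link header value whose rel includes prov:has_provenance.

    Simplified sketch: assumes parameter values never contain '<'.
    """
    uris = []
    # Each link is "<target>" followed by its parameters, up to the next "<".
    for match in re.finditer(r"<([^>]*)>([^<]*)", link_header):
        target, params = match.groups()
        rel = re.search(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,]+))', params, re.IGNORECASE)
        if rel is None:
            continue
        # A rel value may hold several space-separated relation types.
        rel_types = (rel.group(1) or rel.group(2) or "").split()
        if HAS_PROVENANCE in rel_types:
            uris.append(target)
    return uris


# Hypothetical header as a WES server might send it.
example_header = (
    '<https://wes.example.org/runs/abc123/provenance>; '
    'rel="http://www.w3.org/ns/prov#has_provenance"; '
    'anchor="https://wes.example.org/runs/abc123", '
    '<https://wes.example.org/runs?page=2>; rel="next"'
)
print(provenance_links(example_header))
```

Only the first link is returned here, since the second one carries `rel="next"`.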

This is a clever use of the feature that helps with the last leg of a workflow execution. I think it will require modifying the OpenAPI description directly.

david4096 avatar Feb 09 '18 17:02 david4096

I'm pro-provenance in general, but would need some more details on what this might look like in the spec. Tagging as a v2.0 candidate for now.

jaeddy avatar May 06 '19 03:05 jaeddy

To be more specific, the returned value would be the IRI/URI to a v0.6.0 or newer CWLProv ResearchObject https://w3id.org/cwl/prov/0.6.0

mr-c avatar May 06 '19 09:05 mr-c

@mr-c to clarify for my sake -- your suggestion enables something like:

  1. I'm doing a status check on a workflow
  2. I see the status is a failure
  3. Linked to the status through "provenance-uri" is the actual http link to the raw workflow definition associated with the failure -- which I can go to investigate?
  4. ....

ruchim avatar Nov 19 '20 04:11 ruchim

@ruchim Almost: for step 3, the URI points to a CWLProv document that would give detailed information, not the workflow definition (which one assumes the caller of the API already has a copy of).

mr-c avatar Nov 19 '20 18:11 mr-c

ahhh! is the provenance-uri the same as stdout/stderr logs -- or something else? also, thanks so much for the quick response, really appreciate it.

ruchim avatar Nov 19 '20 20:11 ruchim

The provenance-uri would point to a CWLProv format document which would contain structured data including raw logs, server information, etc. 🙂👍

mr-c avatar Nov 19 '20 21:11 mr-c

I think from https://www.w3.org/TR/prov-aq/#resource-accessed-by-http in PROV we kind of allowed any kind of provenance document, although one containing PROV in one of the several formats would be preferable.

I would not require everything in CWLProv. That describes the inside of the workflow, and it could be exposed as well if present, either as a Research Object BagIt archive (as it would be multiple files) or as a directory of files exposed through the WES. There is no single "CWLProv document" as such: we have both primary.cwlprov.* in multiple serializations and metadata/manifest.json, which types and links to all the other files.

Perhaps WES would have its own "outer" provenance that just says when the workflow job started/stopped and ideally links to its outputs?

stain avatar Nov 20 '20 16:11 stain
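
As a rough illustration of the "outer" provenance idea in the comment above, here is a sketch of a small PROV-JSON-style record that states only when a run started and stopped and which outputs it generated. The `wes:` prefix, its namespace URL, the identifier scheme, and the function name are all assumptions for illustration. Nothing here is defined by WES or CWLProv, and the output has not been validated against a PROV-JSON schema.

```python
import json


def outer_provenance(run_id, started, ended, output_urls):
    """Build a small PROV-JSON-style dict describing one workflow run.

    Records the run as an activity with start/end times and each output
    as an entity generated by that activity.
    """
    activity = f"wes:run-{run_id}"  # hypothetical identifier scheme
    doc = {
        "prefix": {"wes": "https://wes.example.org/ns#"},  # hypothetical namespace
        "activity": {
            activity: {"prov:startTime": started, "prov:endTime": ended}
        },
        "entity": {},
        "wasGeneratedBy": {},
    }
    for index, url in enumerate(output_urls):
        entity = f"wes:output{index}"
        # prov:location carries the URL where the output can be fetched.
        doc["entity"][entity] = {
            "prov:location": {"$": url, "type": "xsd:anyURI"}
        }
        doc["wasGeneratedBy"][f"_:gen{index}"] = {
            "prov:entity": entity,
            "prov:activity": activity,
        }
    return doc


example_doc = outer_provenance(
    "abc123",
    "2020-11-20T16:00:00Z",
    "2020-11-20T16:05:00Z",
    ["https://wes.example.org/files/out.txt"],
)
print(json.dumps(example_doc, indent=2))
```

A record like this could sit alongside, or link to, a fuller CWLProv Research Object when the engine produces one.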

Cool, did some reading on the links to catch up -- and thanks to Jeff Gentry for explaining CWLProv a little more deeply. My own thoughts are that these are really good best practices from the perspective of leveraging features of the http spec. If I put myself in the mindset of someone who runs workflows a lot, I'd absolutely need logs of my workflow run and a link to that log (whether it looks like a provenance object or not), and I'd expect a link to those logs directly in the API response (not just the header, which I may never even know to check as a computational biologist rather than a software engineer). So I see the logs as a necessity in the spec and the provenance_uri as a bonus/competitive feature for providing structured details for debugging/tracking.

So it sounds like this is something to add to the WES documentation as a recommendation rather than a spec change. Let me know if I misunderstood!

ruchim avatar Nov 30 '20 22:11 ruchim

just poking @stain @mr-c @david4096 for any opinions on my comment above!

ruchim avatar Dec 08 '20 03:12 ruchim

@ruchim Yep, it can go in the header and in the body of the response, agreed!

mr-c avatar Dec 31 '20 15:12 mr-c
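
To make the "header and body" agreement concrete, here is a server-side sketch that advertises the same provenance URI in both places. The `provenance_uri` body field is the name used informally in this thread, not a field in the WES schema, and the function name and `wes.example.org` URLs are assumptions for illustration.

```python
import json

# Link relation that PROV-AQ defines for pointing at a provenance record.
HAS_PROVENANCE = "http://www.w3.org/ns/prov#has_provenance"


def run_status_response(run_url, provenance_url, state):
    """Return (headers, body) for a run status reply that advertises provenance twice.

    The Link header follows the PROV-AQ form; the JSON body repeats the URI
    for callers who never inspect response headers.
    """
    headers = {
        "Content-Type": "application/json",
        "Link": f'<{provenance_url}>; rel="{HAS_PROVENANCE}"; anchor="{run_url}"',
    }
    body = json.dumps(
        {
            "state": state,
            "provenance_uri": provenance_url,  # hypothetical field name
        }
    )
    return headers, body


example_headers, example_body = run_status_response(
    "https://wes.example.org/runs/abc123",
    "https://wes.example.org/runs/abc123/provenance",
    "EXECUTOR_ERROR",
)
print(example_headers["Link"])
print(example_body)
```

Repeating the URI in the body addresses the concern raised earlier that users who are not software engineers may never look at the headers.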

excellent, I'll mark this as a documentation change.

ruchim avatar Jan 04 '21 19:01 ruchim