cwlprov cwlprov:relationship sketch

Together with #1 this attempts to find a way to pre-define domain-specific provenance that would be generated at workflow run time. The idea is to define a set of relationships that will be added to the outputs a step produces, relating them to other data values or concepts at creation time.

These can use domain-specific ontologies like the EDAM ontology or BioSchemas, or more generic ones like PROV or schema.org.

#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: Workflow

inputs:
  first_input: File
  second_input: long

steps: []

outputs:
  first_output:
    type: File
    outputSource: first_input
    cwlprov:relationships:
       prov:wasDerivedFrom: [ '#inputs.second_input' ]
       prov:wasInfluencedBy: [ '#inputs.second_output' ]

$namespaces:
  prov: http://www.w3.org/ns/prov#
  cwlprov: https://w3id.org/cwl/prov#

$schemas:
  - http://www.w3.org/ns/prov.owl

Oct 02 '18 14:10 mr-c

As this is a relationship to be generated between the values of first_output and second_output, I think it needs some kind of template or expression?

JSON-LD with $expansions

cwlprov:relationship:
  { "@id": "$second_output",
    "prov:wasDerivedFrom": "$first_output" }

Or if we assume the current port is the subject and you can't do arbitrary structures you can just have property-object references (no literals in this case):

cwlprov:relationship: {
    "prov:wasDerivedFrom": "$first_output",
    "example:foo": "edam:topic_0091",
  }

Namespaces like prov and edam here must be defined in CWL $namespaces. The template is expanded based on identifiers for the produced values (e.g. urn:uuid:8c97eb7a-94d8-40bf-a932-7e888445f2ec).

If we have:

{ "first_output": { 
    "@id": "urn:uuid:a1626deb-a5a8-4b84-803e-8dd51f80bf2d"
  },
  "second_output": {
    "@id": "urn:uuid:6e076c8b-d3fe-47f0-844b-b0e1561d3181"
  }
}

Then with expansion of namespaces and $variables we get:

{ "first_output": { 
    "@id": "urn:uuid:a1626deb-a5a8-4b84-803e-8dd51f80bf2d"
  },
  "second_output": {
    "@id": "urn:uuid:6e076c8b-d3fe-47f0-844b-b0e1561d3181",
    "http://www.w3.org/ns/prov#wasDerivedFrom":  {
      "@id": "urn:uuid:a1626deb-a5a8-4b84-803e-8dd51f80bf2d"
     },
    "http://example.com/foo":  {
      "@id": "http://edamontology.org/topic_0091"
    }
  }
}

[ updated by @mr-c to add missing commas, make the UUIDs unique ]
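
The expansion step above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the current port is the subject and every template value is either a `$variable` or a prefixed name. The function names (`expand_curie`, `expand_relationship`) are hypothetical and not part of cwlprov or cwltool, and the `edam` and `example` namespace IRIs are inferred from the expanded output shown above rather than taken from a real `$namespaces` block.

```python
import json

# Assumed prefix map; in CWL this would come from $namespaces.
NAMESPACES = {
    "prov": "http://www.w3.org/ns/prov#",
    "edam": "http://edamontology.org/",
    "example": "http://example.com/",
}


def expand_curie(term, namespaces):
    """Expand 'prefix:local' to a full IRI when the prefix is declared."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix] + local
    return term  # e.g. urn:uuid:... is left untouched


def expand_relationship(template, subject, values, namespaces):
    """Return a copy of `values` with the template statements added to `subject`.

    `values` maps port names to {"@id": ...} nodes for the produced values.
    """
    result = {name: dict(node) for name, node in values.items()}
    for prop, obj in template.items():
        if obj.startswith("$"):
            # $variable: look up the identifier of the produced value
            target = values[obj[1:]]["@id"]
        else:
            target = expand_curie(obj, namespaces)
        result[subject][expand_curie(prop, namespaces)] = {"@id": target}
    return result


template = {
    "prov:wasDerivedFrom": "$first_output",
    "example:foo": "edam:topic_0091",
}
values = {
    "first_output": {"@id": "urn:uuid:a1626deb-a5a8-4b84-803e-8dd51f80bf2d"},
    "second_output": {"@id": "urn:uuid:6e076c8b-d3fe-47f0-844b-b0e1561d3181"},
}

expanded = expand_relationship(template, "second_output", values, NAMESPACES)
print(json.dumps(expanded, indent=2))
```

Literals and nested structures are deliberately left out, matching the "property-object references only" variant of the template.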

Oct 02 '18 14:10 stain

@stain Thank you for the json-ld example.

I've updated my sketch to show that we might want to set relationships between an output and another output, and also an input.

Oct 03 '18 08:10 mr-c

OK, in 036af7c78a3e1c5125009ae05dbdb853afca6790 I try to sketch out how this can be recorded as templates in the CWL and then added to the PROV. There are open issues in what to call these (here cwlprov:relationships) and how to reference the variables to fill in at execution time (here using a direct reference, #inputs.first_input).

But this leads to fairly misleading information in cwlprov --print-rdf, in that it would claim the output parameter definition has a "relationship" to an anonymous object, which then "is derived from" (or whatever property is used) an input parameter definition. This is acceptable if we think of the input/output parameter as a "superobject" of every object that passes through it, as in every file object being a prov:specializationOf the parameter it was input or output at.

(this is like saying Stian is a specialisation of CustomerOfTesco because I went shopping at Tesco once)

See also PROV-Template, which would use a special var namespace for pre-existing variables, which we could bind directly to the input/output objects using existing CWL Expressions (e.g. $(inputs.message) -> var:inputs.message).
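
As a rough sketch of that binding, the mapping from a CWL parameter reference to a template variable name could look like the following. The function name `to_template_var` is hypothetical, and the sketch assumes only simple dotted references such as `$(inputs.message)` need to be handled; full JavaScript expressions are rejected.

```python
import re

# Matches a simple dotted parameter reference, e.g. $(inputs.message).
# Anything more complex (indexing, JavaScript) is out of scope for this sketch.
PARAM_REF = re.compile(r"^\$\(([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)\)$")


def to_template_var(expression):
    """Map '$(inputs.message)' to the prefixed name 'var:inputs.message'."""
    match = PARAM_REF.match(expression.strip())
    if match is None:
        raise ValueError("not a simple parameter reference: %r" % expression)
    return "var:" + match.group(1)


print(to_template_var("$(inputs.message)"))
```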

May 23 '19 12:05 stain

Here are some of the mappings we should be able to do: https://gist.github.com/stain/f0b0d966a103b1533d684aa6d7197364

The data concepts are often more complex expressions than pure typing from the EDAM ontology or BioSchemas, so we may need to support expressions that span more than one triple, as explored here and in #1.

May 23 '19 12:05 stain
