Use Case: how to reference data entities within a referenced RO-Crate

Open elichad opened this issue 1 year ago • 2 comments

As an RO-Crate creator, I want to reference a file within another RO-Crate (which may be a contextual or data entity), so that I do not need to duplicate it into my own crate and so that consumers of my crate can find the file I reference.

elichad, Jul 19 '24 10:07

More specifically: I am building a Workflow Run RO-Crate where the mainEntity workflow is a separate Workflow RO-Crate which is already published on WorkflowHub.

That workflow crate contains an example dataset which I want to reference in the workflow run crate (because in this case we can't include the actual inputs in the crate, so the example dataset acts as a reference for the format used). However, it's not clear how to identify this dataset, as it's a data entity with a local path in the other crate.

elichad, Jul 19 '24 11:07

Data entities proposal

Using arcp is probably the way to go here. The open question is where in the metadata it is most sensible to provide the arcp URI. Here's an example:

...
{
  "@id": "https://workflowhub.eu/workflows/000/",
  "@type": [
    "Dataset",
    "ComputationalWorkflow"
  ],
  "conformsTo": [
    {
      "@id": "https://w3id.org/ro/crate"
    },
    {
      "@id": "https://w3id.org/ro/wfrun/process"
    }
  ],
  "url": "https://workflowhub.eu/workflows/000/",
  "distribution": {
    "@id": "https://workflowhub.eu/workflows/000/ro_crate"
  }
},
{
  "@id": "https://workflowhub.eu/workflows/000/ro_crate",
  "@type": "DataDownload",
  "encodingFormat": [
    "application/zip"
  ],
  "conformsTo": {
    "@id": "https://w3id.org/ro/crate"
  },
  "identifier": [
    "https://workflowhub.eu/workflows/000/ro_crate",
    "arcp://uuid,b89b5d50-3146-4600-b8b8-6dafc332e56e/"
  ]
},
{
  "@id": "arcp://uuid,b89b5d50-3146-4600-b8b8-6dafc332e56e/data.csv",
  "@type": "File",
  "name": "Data file from external crate.",
  "encodingFormat": "text/csv"
},
...

In this case, the base arcp URI is listed as an identifier on a DataDownload that provides a zip of the external crate. That DataDownload is in turn referenced as the distribution of a more general contextual entity representing the external crate. I can then use arcp URIs built on that base to reference data entities within the external crate.
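To illustrate how a consumer might follow this pattern, here is a minimal Python sketch. It assumes the convention proposed above (a DataDownload carries the base arcp URI in `identifier`), which is not part of the RO-Crate specification. The helper names `as_list` and `find_arcp_base` are hypothetical, and matching by string prefix is a simplification rather than full URI resolution.

```python
def as_list(value):
    """Normalise a JSON-LD value that may be absent, a single item, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_arcp_base(graph, arcp_uri):
    """Find the DataDownload whose identifier is the arcp base of arcp_uri.

    Returns (download_entity, path_within_crate), or None if no base matches.
    Assumes the convention proposed in this issue, not a standard lookup.
    """
    for entity in graph:
        if "DataDownload" not in as_list(entity.get("@type")):
            continue
        for ident in as_list(entity.get("identifier")):
            is_base = (
                isinstance(ident, str)
                and ident.startswith("arcp://")
                and ident.endswith("/")
            )
            if is_base and arcp_uri.startswith(ident):
                # Strip the base to get the path relative to the crate root.
                return entity, arcp_uri[len(ident):]
    return None


# Entities trimmed from the example above.
graph = [
    {
        "@id": "https://workflowhub.eu/workflows/000/ro_crate",
        "@type": "DataDownload",
        "identifier": [
            "https://workflowhub.eu/workflows/000/ro_crate",
            "arcp://uuid,b89b5d50-3146-4600-b8b8-6dafc332e56e/",
        ],
    },
    {
        "@id": "arcp://uuid,b89b5d50-3146-4600-b8b8-6dafc332e56e/data.csv",
        "@type": "File",
    },
]

match = find_arcp_base(graph, graph[1]["@id"])
if match is not None:
    download, path = match
    print(f"Fetch {download['@id']} and open {path!r} inside it")
```

A consumer would still need to download the zip and read the file at the returned path; the sketch only shows that the base arcp URI on the DataDownload is enough to link a data entity back to the crate that contains it.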

I welcome suggestions on where else the base arcp URI could be defined; this is just one option for standardization.

elichad, Jul 24 '24 16:07