omero-cli-zarr

Standardize _creator field

Open joshmoore opened this issue 5 years ago • 3 comments

Currently,

(z) /opt/omero-ms-zarr $ cat 101.zarr/.zattrs
{
    "_creator": {
        "name": "omero-zarr",
        "version": "0.0.2.dev79+gb361c09"
    },

is added on export. We may want to update this slightly to match a vocabulary like Dublin Core or W3C PROV.

joshmoore, Nov 13 '20 09:11

Another candidate would be the schema.org SoftwareApplication type. This is also the vocabulary suggested in https://www.researchobject.org/ro-crate/1.0/#provenance-software-used-to-create-files.

The example above could be translated into:

          "@context": "https://schema.org",
          "@type": "SoftwareApplication",
          "name": "omero-cli-zarr",
          "version": "0.0.2.dev79+gb361c09"

To also cover the discussion around additional software information in https://github.com/ome/omero-cli-zarr/pull/76#discussion_r691978664, softwareAddOn would be an option:

          "@context": "https://schema.org",
          "@type": "SoftwareApplication",
          "name": "omero-cli-zarr",
          "version": "0.0.2.dev79+gb361c09",
          "softwareAddOn": {
               "@type": "SoftwareApplication",
               "name": "bioformats2raw",
               "version": "0.3.0"
          }
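
As a rough Python sketch of that translation, the existing `_creator` entry could be mapped to the nested form as below. The helper name `software_application` and its `add_on` and `top_level` parameters are made up for illustration and are not part of omero-cli-zarr.

```python
import json


def software_application(name, version, add_on=None, top_level=True):
    """Build a schema.org SoftwareApplication description as a plain dict.

    Hypothetical helper for illustration; not an omero-cli-zarr API.
    """
    app = {}
    if top_level:
        # Only the outermost object carries the JSON-LD context.
        app["@context"] = "https://schema.org"
    app["@type"] = "SoftwareApplication"
    app["name"] = name
    app["version"] = version
    if add_on is not None:
        # Nest the additional software under softwareAddOn.
        app["softwareAddOn"] = add_on
    return app


# The _creator entry as currently written on export.
creator = {"name": "omero-zarr", "version": "0.0.2.dev79+gb361c09"}

converter = software_application("bioformats2raw", "0.3.0", top_level=False)
exporter = software_application(
    creator["name"], creator["version"], add_on=converter
)
print(json.dumps(exporter, indent=2))
```

The name and version are copied straight from `_creator`; whether the name should be rewritten on the way (for example to "omero-cli-zarr") is left open here.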

sbesson, Aug 19 '21 15:08

Generally looks interesting, but we'll need to figure out where it's attached. Only at the top level? (Do we have a standard structure there?) Or for each multiscale, in case they are generated by different software? And so on.

joshmoore, Aug 20 '21 13:08

https://github.com/ome/omero-cli-zarr/issues/48#issuecomment-902003430 is a use case where there is a one-to-one mapping between the software and the specification, i.e.

multiscales -> bioformats2raw
omero -> omero-cli-zarr

So although it could be at the top level, there is a case for defining it (or including a reference via @id) at the level of each specification. This is what the multiscales specification currently attempts to do via its metadata field. Maybe we want to generalize this so that every specification can inject provenance metadata in a metadata field?
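
To make the @id option concrete, here is a minimal Python sketch. The layout is entirely hypothetical: a top-level `software` list acting as a registry, and a `metadata` field on `omero`, are assumptions made for illustration and are not defined by any specification.

```python
# Hypothetical attributes: each specification points at a shared entry via @id.
attrs = {
    "software": [  # assumed top-level registry, not part of any spec
        {
            "@id": "#bioformats2raw",
            "@type": "SoftwareApplication",
            "name": "bioformats2raw",
            "version": "0.3.0",
        },
        {
            "@id": "#omero-cli-zarr",
            "@type": "SoftwareApplication",
            "name": "omero-cli-zarr",
            "version": "0.0.2.dev79+gb361c09",
        },
    ],
    "multiscales": [{"version": "0.2", "metadata": {"@id": "#bioformats2raw"}}],
    "omero": {"metadata": {"@id": "#omero-cli-zarr"}},
}


def resolve(metadata, registry):
    """Return the registry entry for an @id-only reference, else the metadata itself."""
    if set(metadata) == {"@id"}:
        return registry[metadata["@id"]]
    return metadata


registry = {entry["@id"]: entry for entry in attrs["software"]}
multiscales_software = resolve(attrs["multiscales"][0]["metadata"], registry)
omero_software = resolve(attrs["omero"]["metadata"], registry)
print(multiscales_software["name"], omero_software["name"])
```

Inline definitions still work with the same `resolve` call, since any metadata that carries more than an `@id` is returned unchanged.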

For more granular provenance, i.e. each dataset being generated by different software, maybe we want to allow metadata fields to be defined further down the path, e.g.

{
   "multiscales":[
      {
         "version":"0.2",
         "name":"example",
         "datasets":[
            {
               "path":"0",
               "metadata":{
                  "@context":"https://schema.org",
                  "@type":"SoftwareApplication",
                  "name":"bioformats2raw",
                  "version":"0.3.0"
               }
            },
            {
               "path":"1",
               "metadata":{
                  "@context":"https://schema.org",
                  "@type":"SoftwareApplication",
                  "name":"mydownsampler",
                  "version":"0.1.0"
               }
            }
         ]
      }
   ]
}
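
A reader could then look up the software for each dataset along these lines. This is only a sketch: the function name `provenance_by_dataset` is hypothetical, and falling back to a multiscale-level `metadata` entry when a dataset has none is an assumption, not something the specification defines.

```python
def provenance_by_dataset(attrs):
    """Map each dataset path to its provenance metadata.

    Assumption: a dataset without its own "metadata" inherits the
    "metadata" of the enclosing multiscale, if there is one.
    """
    result = {}
    for multiscale in attrs.get("multiscales", []):
        default = multiscale.get("metadata")
        for dataset in multiscale.get("datasets", []):
            result[dataset["path"]] = dataset.get("metadata", default)
    return result


attrs = {
    "multiscales": [
        {
            "version": "0.2",
            "name": "example",
            "datasets": [
                {
                    "path": "0",
                    "metadata": {
                        "@context": "https://schema.org",
                        "@type": "SoftwareApplication",
                        "name": "bioformats2raw",
                        "version": "0.3.0",
                    },
                },
                {
                    "path": "1",
                    "metadata": {
                        "@context": "https://schema.org",
                        "@type": "SoftwareApplication",
                        "name": "mydownsampler",
                        "version": "0.1.0",
                    },
                },
            ],
        }
    ]
}

provenance = provenance_by_dataset(attrs)
for path, software in provenance.items():
    print(path, software["name"], software["version"])
```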

sbesson, Aug 20 '21 13:08