Use Case: Describe a collection of highly related files
As a researcher, I want to be able to describe a set of related files so that the metadata file does not contain redundant descriptions.
Use Case
Here is a simple example dataset from a MERSCOPE microscope:
$ ls -1 region_R1/images
manifest.json
micron_to_mosaic_pixel_transform.csv
mosaic_DAPI_z0.tif
mosaic_DAPI_z1.tif
mosaic_DAPI_z2.tif
mosaic_DAPI_z3.tif
mosaic_DAPI_z4.tif
mosaic_DAPI_z5.tif
mosaic_DAPI_z6.tif
mosaic_PolyT_z0.tif
mosaic_PolyT_z1.tif
mosaic_PolyT_z2.tif
mosaic_PolyT_z3.tif
mosaic_PolyT_z4.tif
mosaic_PolyT_z5.tif
mosaic_PolyT_z6.tif
According to the user guide:
The images are single channel, single plane, 16-bit grayscale tiff files, with the naming convention
mosaic_{stain name}_z{ZIndex}.tif
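To make the naming convention concrete, here is a minimal Python sketch that splits a file name into its stain name and Z index. The regular expression and the function name `parse_mosaic_name` are my own illustration, not something taken from the user guide, and the sketch assumes the Z index is always a plain non-negative integer.

```python
import re

# Regex derived from the documented convention mosaic_{stain name}_z{ZIndex}.tif.
# Assumption: the Z index is a run of digits; the stain name is everything
# between "mosaic_" and the final "_z<digits>.tif".
MOSAIC_NAME = re.compile(r"^mosaic_(?P<stain>.+)_z(?P<z_index>\d+)\.tif$")


def parse_mosaic_name(filename):
    """Return (stain, z_index) for a mosaic image, or None if the name does not match."""
    match = MOSAIC_NAME.match(filename)
    if match is None:
        return None
    return match.group("stain"), int(match.group("z_index"))


if __name__ == "__main__":
    for name in ["mosaic_DAPI_z0.tif", "mosaic_PolyT_z6.tif", "manifest.json"]:
        print(name, "->", parse_mosaic_name(name))
```

Files such as `manifest.json` do not follow the convention, so the function returns `None` for them.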
I could describe every file individually, but that would produce 14 (in real life, many more) almost identical entities:
[
{
"@id": "mosaic_DAPI_z0.tif",
"@type": "File",
"encodingFormat": "image/tiff",
"description": "Mosaic tiff capturing the 0th Z-slice for the DAPI stain."
},
{
"@id": "mosaic_DAPI_z1.tif",
"@type": "File",
"encodingFormat": "image/tiff",
"description": "Mosaic tiff capturing the 1st Z-slice for the DAPI stain."
},
...
]
I also don't like the idea of describing these files only in the description of the parent Dataset: I would lose all of the image-specific properties and the ability to run queries like "find all TIFF files", and the Dataset description would become exceedingly long.
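The "find all TIFF files" query is easy when every file is its own entity, which is the ability I don't want to lose. A rough Python sketch of such a query follows. The `manifest.json` entity and its `application/json` format are assumptions added for contrast, and the helper names are hypothetical. The sketch also assumes `encodingFormat` is a single string.

```python
# A hand-written list of entities; the first two come from the example above,
# the manifest.json entity is an assumption added for contrast.
graph = [
    {"@id": "mosaic_DAPI_z0.tif", "@type": "File", "encodingFormat": "image/tiff"},
    {"@id": "mosaic_DAPI_z1.tif", "@type": "File", "encodingFormat": "image/tiff"},
    {"@id": "manifest.json", "@type": "File", "encodingFormat": "application/json"},
]


def has_type(entity, type_name):
    """JSON-LD allows @type to be a string or a list; handle both."""
    types = entity.get("@type", [])
    if isinstance(types, str):
        types = [types]
    return type_name in types


def find_files_by_format(entities, media_type):
    """Return the @id of every File entity with the given encodingFormat."""
    return [
        entity["@id"]
        for entity in entities
        if has_type(entity, "File") and entity.get("encodingFormat") == media_type
    ]


if __name__ == "__main__":
    print(find_files_by_format(graph, "image/tiff"))
```

If the images were only mentioned in the parent Dataset's free-text description, this kind of property-based lookup would have nothing to match against.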
Suggestion
I suggest allowing glob-style patterns to describe sets of files.
The simplest way to do this might be to allow an @id that is a glob. For example:
{
"@id": "mosaic_DAPI_z*.tif",
"@type": "File",
"encodingFormat": "image/tiff",
"description": "Mosaic tiff capturing a single Z-slice for the DAPI stain."
}
The only downside of this is that * is an unusual character in an ID, but it is technically legal in an IRI according to RFC 3987.
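To illustrate how a consumer might interpret a glob @id, here is a small Python sketch that expands such an entity into one entity per matching file. None of this is existing RO-Crate behaviour. The function `expand_glob_entity` is hypothetical, and it uses Python's `fnmatch` rules, which also treat `?` and `[...]` as special characters.

```python
from fnmatch import fnmatchcase

# Directory listing copied from the example dataset above.
FILENAMES = (
    ["manifest.json", "micron_to_mosaic_pixel_transform.csv"]
    + [f"mosaic_DAPI_z{z}.tif" for z in range(7)]
    + [f"mosaic_PolyT_z{z}.tif" for z in range(7)]
)


def expand_glob_entity(entity, filenames):
    """Return one concrete entity per file name matching the glob in entity['@id'].

    Every other property of the glob entity is copied onto each match.
    """
    pattern = entity["@id"]
    return [
        {**entity, "@id": name}
        for name in sorted(filenames)
        if fnmatchcase(name, pattern)  # case-sensitive glob match
    ]


if __name__ == "__main__":
    glob_entity = {
        "@id": "mosaic_DAPI_z*.tif",
        "@type": "File",
        "encodingFormat": "image/tiff",
    }
    for expanded in expand_glob_entity(glob_entity, FILENAMES):
        print(expanded["@id"])
```

With this interpretation, one glob entity stands in for the seven DAPI entities, and the PolyT files would need a second glob entity of their own.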
Alternatively, we could create a new property called pattern (I'm sure we could find an IRI for it that corresponds to practical usage) whose value is a glob pattern that selects a set of files. Attaching it to a Dataset would capture a subset of files, and we would then assume that any other property on that Dataset describes every file the pattern matches. For example:
{
"@id": "#mosaic-dapi",
"@type": "Dataset",
"encodingFormat": "image/tiff",
"pattern": "mosaic_DAPI_z*.tif",
"description": "Mosaic tiff capturing a single Z-slice for the DAPI stain."
}
I like this less, because it's a bit odd and ugly to attach File properties to a Dataset.
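For comparison, this is roughly how a consumer could resolve the pattern variant: look up every Dataset whose pattern matches a file name and copy its properties onto that file. This is again only a sketch under assumptions. The `pattern` property does not exist in RO-Crate today, `describe_file` is a hypothetical helper, and matching uses Python's `fnmatch` rules.

```python
from fnmatch import fnmatchcase

# Keys that identify the Dataset itself and should not be inherited by files.
NOT_INHERITED = ("@id", "@type", "pattern")


def describe_file(filename, datasets):
    """Build a File entity from every Dataset whose 'pattern' matches filename.

    If several Datasets match, later ones overwrite earlier ones (an arbitrary
    choice made for this sketch).
    """
    described = {"@id": filename, "@type": "File"}
    for dataset in datasets:
        pattern = dataset.get("pattern")
        if pattern and fnmatchcase(filename, pattern):
            for key, value in dataset.items():
                if key not in NOT_INHERITED:
                    described[key] = value
    return described


if __name__ == "__main__":
    datasets = [
        {
            "@id": "#mosaic-dapi",
            "@type": "Dataset",
            "encodingFormat": "image/tiff",
            "pattern": "mosaic_DAPI_z*.tif",
        }
    ]
    print(describe_file("mosaic_DAPI_z3.tif", datasets))
    print(describe_file("manifest.json", datasets))
```

The extra lookup step, and the need to decide which Dataset properties are really meant for the files, are part of why this variant feels more awkward than a glob @id.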