
Identifiers linking ro-crates inside a repository

Open marcolarosa opened this issue 4 years ago • 1 comments

From a conversation with the UTS folks including @ptsefton, Mike Lynch and others

Say I create two ro-crates on my desktop: A and B. A is a collection with identifier "A" and B is an item with identifier "A/B" (this naming scheme is from PARADISEC). B is an item within the collection A.

When I create these crates I can stamp the identifiers as "/paradisec.org.au/A" and "/paradisec.org.au/A/B" (the domain name prefix will be explained later). I can then link the collection (A) to the item (B) using "hasMember" and the item (B) back to the collection (A) via "memberOf". Here are the two crates (cut down for clarity).

{
  "@graph": [{
    "@type": "Dataset",
    "@id": "/paradisec.org.au/A",
    "hasMember": [{ "@id": "/paradisec.org.au/A/B" }]
  }]
}

{
  "@graph": [{
    "@type": "Dataset",
    "@id": "/paradisec.org.au/A/B",
    "memberOf": [{ "@id": "/paradisec.org.au/A" }]
  }]
}
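As a quick sanity check on the two crates above, the reciprocal hasMember / memberOf links can be verified in a few lines. This is only a sketch: the helper names (`root`, `linked_ids`, `is_reciprocal`) are made up for illustration, and it assumes the cut-down shape shown here, where each crate's @graph holds a single entity.

```python
# The two cut-down crates from above, written as Python dicts.
crate_a = {
    "@graph": [{
        "@type": "Dataset",
        "@id": "/paradisec.org.au/A",
        "hasMember": [{"@id": "/paradisec.org.au/A/B"}],
    }]
}
crate_b = {
    "@graph": [{
        "@type": "Dataset",
        "@id": "/paradisec.org.au/A/B",
        "memberOf": [{"@id": "/paradisec.org.au/A"}],
    }]
}


def root(crate):
    """Return the only entity in @graph (an assumption of this cut-down example)."""
    return crate["@graph"][0]


def linked_ids(entity, prop):
    """Collect the @id values referenced by a property such as hasMember."""
    return {ref["@id"] for ref in entity.get(prop, [])}


def is_reciprocal(collection, item):
    """True if the collection lists the item and the item points back to the collection."""
    c, i = root(collection), root(item)
    return i["@id"] in linked_ids(c, "hasMember") and c["@id"] in linked_ids(i, "memberOf")


print(is_reciprocal(crate_a, crate_b))  # True
```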

At the time of creation I know that the crates need to be linked, but I don't know where they will be hosted, so I can't link them with an absolute URL such as https://{some.place.com}/paradisec.org.au/A: some.place.com is unknown at the moment of creation.

Although not relevant in the ro-crate layer, stamping the IDs with a domain prefix is akin to how DOIs are minted. Provided the user creating a crate ensures their IDs are unique within their namespace, their crates can live in repositories alongside crates from other communities without ID clashes. This is essential when hosting on an OCFL filesystem, as the ID is used to create the path for the OCFL object.

This issue is about that problem: what is the best way to define these references between crates?

Option 1: continue as defined above

The current PARADISEC RO-crate / OCFL demonstrator works with links in this way. When the portal encounters one of these hasMember / memberOf IDs, it knows how to transform it into a local OCFL path and load the object. In other words, it assumes the ID is relative to the current host, even though such IDs are not valid and imply a path within the crate rather than a link to some other crate in the same hosting service.

However, by doing this I can move those two crates to another system and the interlinking will continue to work provided that the target system makes the same assumption.
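To make the assumption in Option 1 concrete, here is a minimal sketch of the kind of transform a portal might apply. The storage root `/srv/ocfl`, the function name `id_to_ocfl_path` and the direct ID-to-directory mapping are all assumptions for illustration; the actual demonstrator may well lay objects out differently.

```python
from pathlib import PurePosixPath

# Hypothetical storage root; not taken from the PARADISEC demonstrator.
OCFL_ROOT = PurePosixPath("/srv/ocfl")


def id_to_ocfl_path(identifier: str, ocfl_root: PurePosixPath = OCFL_ROOT) -> PurePosixPath:
    """Map a host-relative crate ID onto a path under the local OCFL root."""
    if not identifier.startswith("/"):
        raise ValueError(f"not a host-relative identifier: {identifier!r}")
    segments = [s for s in identifier.split("/") if s]
    # Refuse empty IDs and anything that could climb out of the storage root.
    if not segments or ".." in segments:
        raise ValueError(f"unsafe identifier: {identifier!r}")
    return ocfl_root.joinpath(*segments)


print(id_to_ocfl_path("/paradisec.org.au/A/B"))  # /srv/ocfl/paradisec.org.au/A/B
```

Any target system that applies the same rule would resolve the same links, which is the portability property described above.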

Option 2: rewrite IDs on the way into / out of the repository

A solution could be to rewrite these IDs as the crates get ingested into a repo. They could be rewritten to the name of the repo so that /paradisec.org.au/A becomes https://{some.host.com}/paradisec.org.au/A. But rewriting can fail, and it has to be repeated every time the crates move between repositories during their lifetime.
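A rough sketch of what Option 2 involves, to show how much machinery the rewriting needs. The host `repo.example.org` and the names `rewrite_ids`, `on_ingest` and `on_export` are hypothetical. It also assumes that every @id starting with "/" is a cross-crate reference, which a real crate would not guarantee.

```python
def rewrite_ids(node, transform):
    """Return a copy of a JSON-LD structure with transform applied to every @id string."""
    if isinstance(node, dict):
        return {
            key: transform(value) if key == "@id" and isinstance(value, str)
            else rewrite_ids(value, transform)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rewrite_ids(item, transform) for item in node]
    return node


def on_ingest(host):
    """Build a transform that turns host-relative IDs into URLs on the given host."""
    prefix = f"https://{host}"
    return lambda i: prefix + i if i.startswith("/") else i


def on_export(host):
    """Build the inverse transform: strip the host so the crate is portable again."""
    prefix = f"https://{host}"
    return lambda i: i[len(prefix):] if i.startswith(prefix + "/") else i


crate = {
    "@graph": [{
        "@type": "Dataset",
        "@id": "/paradisec.org.au/A",
        "hasMember": [{"@id": "/paradisec.org.au/A/B"}],
    }]
}

ingested = rewrite_ids(crate, on_ingest("repo.example.org"))
print(ingested["@graph"][0]["@id"])  # https://repo.example.org/paradisec.org.au/A

exported = rewrite_ids(ingested, on_export("repo.example.org"))
print(exported == crate)  # True
```

The round trip only works if the exporting repository knows exactly which host prefix was added on ingest, which is one of the ways the rewriting can go wrong.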

Option 3: invent a scheme - use ARCP - but it needs work, maybe...

The UTS folks went down the path of something like _:local-id:ocfl-object:/paradisec.org.au/A, which is valid JSON-LD and identifies that it's a local identifier in an OCFL object. However, it's not a standard, so we talked about using ARCP identifiers. The one that stood out was arcp://name,paradisec.org.au/A but this has some problems:

  • it still looks like a file ref
  • it suggests a local repository url but doesn't make it explicit

So now we come to inventing things...

Ideally, I would encode the fact that this reference is a URL to be resolved on the current host, in a way that is unambiguous. So, if we mash the UTS approach with ARCP we might birth something like:

arcp://host:local,url:/paradisec.org.au/A
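To illustrate how a hosting system might consume such a reference, here is a small sketch. The `arcp://host:local,url:` form is only the proposal floated in this issue, not part of the ARCP specification, and the function names and the host `repo.example.org` are invented for the example.

```python
# Prefix of the scheme proposed above; not part of the ARCP specification.
LOCAL_PREFIX = "arcp://host:local,url:"


def to_local_ref(identifier: str) -> str:
    """Wrap a host-relative ID such as /paradisec.org.au/A in the proposed scheme."""
    if not identifier.startswith("/"):
        raise ValueError(f"not a host-relative identifier: {identifier!r}")
    return LOCAL_PREFIX + identifier


def is_local_ref(ref: str) -> bool:
    """True if the reference says 'resolve me on whatever host I live on'."""
    return ref.startswith(LOCAL_PREFIX + "/")


def resolve_local_ref(ref: str, current_host: str) -> str:
    """Turn a local reference into a URL on the host that is serving the crate."""
    if not is_local_ref(ref):
        raise ValueError(f"not a local reference: {ref!r}")
    return f"https://{current_host}{ref[len(LOCAL_PREFIX):]}"


ref = to_local_ref("/paradisec.org.au/A")
print(ref)                                         # arcp://host:local,url:/paradisec.org.au/A
print(resolve_local_ref(ref, "repo.example.org"))  # https://repo.example.org/paradisec.org.au/A
```

The stored reference never changes as the crate moves; only the `current_host` argument differs between repositories, and the original ID stays readable inside the reference.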

Requirements of any solution

Guide rails are always good to have.... :-)

  • Encodes in link that it should be resolved within the hosting system
  • Doesn't require prior knowledge of the hosting system when creating
  • Doesn't involve rewriting into / out of the hosting system as the object moves around during its lifetime
  • Doesn't hash the bits so they're incomprehensible - otherwise we'll need some other property in the crate that allows the user to identify the id as 'A', '/A/B'

marcolarosa avatar Jun 03 '21 23:06 marcolarosa

https://s11.no/2018/arcp.html#uuid-based

stain avatar Jun 10 '21 09:06 stain