JSON-LD PropertyValue should represent ROR value as a string not a URL

Open rdmpage opened this issue 3 years ago • 8 comments

Following on from https://github.com/ORCID/ORCID-Source/issues/6519, @dshorthouse gave the example of @TomDemeranville's ORCID record. The values for ROR are given as URLs (e.g., https://ror.org/04fa4r544), whereas surely they would be better as strings without the URL (i.e., 04fa4r544)? This would make them consistent with the RINGGOLD identifier below, and with how ORCID handles DOIs.

AFAIK the whole point of using PropertyValue for an identifier is that we can present just the identifier part (the "slug", which tends to be persistent) and ignore the resolution component (which tends to be changeable).

        {
            "@type": "Organization",
            "name": "ORCID",
            "alternateName": "Product",
            "identifier": {
                "@type": "PropertyValue",
                "propertyID": "ROR",
                "value": "https://ror.org/04fa4r544"
            }
        },
        {
            "@type": "Organization",
            "name": "ORCID",
            "alternateName": "Technology",
            "identifier": {
                "@type": "PropertyValue",
                "propertyID": "ROR",
                "value": "https://ror.org/04fa4r544"
            }
        },
        {
            "@type": "Organization",
            "name": "Eduserv Athens",
            "alternateName": "Athens",
            "identifier": {
                "@type": "PropertyValue",
                "propertyID": "RINGGOLD",
                "value": "226102"
            }
        },
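A minimal sketch of the change being asked for, assuming the affiliations arrive as a list of dicts shaped like the example above. The names `ror_slug` and `strip_ror_urls` are hypothetical and are not part of ORCID-Source:

```python
from urllib.parse import urlparse


def ror_slug(value: str) -> str:
    """Return the bare ROR slug from either a full ROR URL or a bare slug."""
    parsed = urlparse(value)
    if parsed.netloc == "ror.org":
        # "/04fa4r544" -> "04fa4r544"
        return parsed.path.strip("/")
    # Already a bare slug (or something we do not recognise): leave untouched.
    return value


def strip_ror_urls(affiliations: list[dict]) -> list[dict]:
    """Copy the records, rewriting only ROR PropertyValue values to slugs."""
    out = []
    for org in affiliations:
        ident = org.get("identifier", {})
        if ident.get("propertyID") == "ROR":
            ident = {**ident, "value": ror_slug(ident["value"])}
            org = {**org, "identifier": ident}
        out.append(org)
    return out
```

Records with other `propertyID` values, such as RINGGOLD, pass through unchanged.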

rdmpage avatar Apr 28 '22 12:04 rdmpage

Hi @rdmpage,

Thanks for your question. According to the ROR display guidelines (https://ror.org/display-guidelines/), a ROR ID should be represented by its full URL, which is why we display it like that.

Thanks

amontenegro avatar Apr 29 '22 20:04 amontenegro

Hi @amontenegro, I think this conflates two separate issues: how to store an identifier as a PropertyValue, and how to display it online.

The DOI Foundation makes a similar recommendation to ROR's w.r.t. DOIs (i.e., display a DOI with the https://doi.org/ prefix), but that doesn't mean the value in the PropertyValue should be a URL. Indeed, you currently give DOIs as just the DOI in a PropertyValue, which I think is correct. This also makes it easier to query across multiple sources (we don't have to assume that everyone uses the same prefix to resolve DOIs, and we are robust to changes in resolution, such as the switch from http to https).

How you choose to display an identifier on the ORCID web page is a separate question, and in that case yes, it's good to follow recommendations. Likewise, if you were using ROR as the @id for an organisation then the https URL makes sense. But for the value field I think you just want the identifier.

So, can I ask that you reconsider this?

rdmpage avatar Apr 29 '22 21:04 rdmpage

Hi @rdmpage, I'm the technical lead at ROR. ROR IDs are full URLs, whether used in display or metadata. In contrast to some other organization IDs, the identifier value in each ROR record is a full URL, ex:

{
   "id":"https://ror.org/013cjyk83",
   "name":"PSL Research University",
   "email_address":null,
   ...

Examples showing ROR IDs used in DOI metadata (alongside other IDs whose canonical form is not necessarily a URL) are here https://ror.readme.io/docs/include-ror-ids-in-doi-metadata

lizkrznarich avatar May 11 '22 11:05 lizkrznarich

Hi @lizkrznarich,

I understand, in the same way that DOIs are currently treated as URLs whereas in the past just the 10.nnn/xxxxx was regarded as the DOI.

However it's often handy to separate out the resolution aspect from the identifier, and the schema.org PropertyValue gives us a way to do that, as well as say what the identifier is (e.g., ROR, ORCID, DOI) without having to parse a URL.

I've been around long enough to see DOIs go from info:doi to DOI: to http://dx.doi.org to http://doi.org to https://doi.org. Any database that stored the complete URL is going to have fun trying to match DOIs given these changes. Resolvers have a habit of changing (even if it's just http to https). It may well be that ROR thinks it is always going to have https://ror.org/ as part of its identifier, but what if https goes the way of http? What if ROR loses the ror.org domain?
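To make the matching problem concrete, here is a rough sketch of the kind of normaliser a consumer ends up writing when DOIs have been stored with their resolver prefix. It assumes only the prefixes listed above; `bare_doi` is a hypothetical helper and the DOI in the usage note is a made-up placeholder:

```python
import re

# Prefixes DOIs have carried over the years: info:doi/, a "doi:" label,
# and the http/https resolvers with and without the "dx." host.
_DOI_PREFIX = re.compile(
    r"^(?:info:doi/|doi:\s*|https?://(?:dx\.)?doi\.org/)",
    re.IGNORECASE,
)


def bare_doi(value: str) -> str:
    """Strip a known prefix and lowercase the result for matching.

    DOI names are case-insensitive, so lowercasing is safe for comparison.
    """
    return _DOI_PREFIX.sub("", value.strip()).lower()
```

With this, `bare_doi("http://dx.doi.org/10.1000/XYZ123")` and `bare_doi("https://doi.org/10.1000/xyz123")` compare equal. A store that kept only the bare DOI would not need the prefix table at all.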

In the context of RDF, I agree that if we have a record where the primary identifier is the ROR ID then it makes sense to use the URL. If ROR ever serve JSON-LD or embed it in their web pages (they should) then

{
"@id": "https://ror.org/013cjyk83",
"name" : "PSL Research University"
}

makes perfect sense. I'm not arguing against that. I'm simply suggesting that there is a convention, when using PropertyValue, of storing the identifier stripped of any resolution-specific prefix. We see this with DOIs in ORCID, and I think it keeps things simple for consumers if ROR identifiers are treated in the same way. If a ROR ID is used in an "@id" or "sameAs" it should be the full URL; if it is a PropertyValue.value then it should be a string without the resolver.

rdmpage avatar May 11 '22 12:05 rdmpage

While I agree that we have certainly seen cases where the "canonical" protocol, domain or path portions of a PID URL have changed over time, on the flip side, since there is no authoritative list of identifier prefixes, an ID stripped of those URL components can become essentially useless (or at least annoying) for a consumer. For example, ORCID invested a significant amount of effort in sorting out resolution for the list of work IDs it supports. While URL changes can easily be made backward compatible (as is the case for both DOIs and ORCID iDs) via 301 redirects, which also serve the purpose of communicating the change to the requestor, it's more difficult for a user (particularly a machine "user") to determine how to resolve a given ID that does not include URL components.

In my years at ORCID as well as DataCite and ROR, a lesson learned over time is that various systems and metadata schemas will convey PIDs differently (as we see with ORCID in https://orcid.org/0000-0001-6622-4910 vs orcid.org/0000-0001-6622-4910 vs 0000-0001-6622-4910 vs 0000000166224910). This hinders interoperability, making life difficult for those consuming metadata from multiple sources. ROR's decision to make its ID values full URLs and to recommend that others convey ROR IDs in this format is a deliberate effort to overcome some of the interoperability challenges we've seen in the past.
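As an illustration of what consumers have to do today, a sketch that folds the four ORCID iD variants above into one form. It assumes the full https URL is the target form and only checks the shape of the iD, not its checksum; `canonical_orcid` is a hypothetical name:

```python
import re


def canonical_orcid(value: str) -> str:
    """Return the https://orcid.org/ URL form for any of the four variants."""
    # Drop an optional scheme and an optional orcid.org host.
    bare = re.sub(
        r"^(?:https?://)?(?:orcid\.org/)?", "", value.strip(), flags=re.IGNORECASE
    )
    digits = bare.replace("-", "").upper()
    # 16 characters: 15 digits plus a final digit or "X".
    if not re.fullmatch(r"\d{15}[\dX]", digits):
        raise ValueError(f"not an ORCID iD: {value!r}")
    grouped = "-".join(digits[i:i + 4] for i in range(0, 16, 4))
    return f"https://orcid.org/{grouped}"
```

Every source that conveys the iD differently forces each consumer to carry a function like this, which is the interoperability cost in question.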

lizkrznarich avatar May 11 '22 13:05 lizkrznarich

I think we're talking past each other a little. I'm not arguing against the full URL as an identifier, just how it's used in one particular context. I think of PropertyValue as a bit like the list of identifiers in Wikidata. For example, here is PSL University:

[Screenshot, 2022-05-11: identifiers listed for PSL University in Wikidata]

The ROR is given as the "slug" after https://ror.org/, and if you mouse over you'll see the full URL. I imagine PropertyValue to be analogous to this, a list of identifiers where each is "typed" and shown as a string if people want to match on that.

Anyway, it's a small hill I've chosen to die on (there are so many hills), and I suspect we've both got better things to do. Thanks for your patience with my pedantry.

rdmpage avatar May 11 '22 13:05 rdmpage

I get the sense that we're talking at cross-purposes here. I see two ways in which identifiers are being expressed in JSON-LD, and these serve two different purposes. Software that consumes these two expressions may have expectations about their structure:

Scenario 1:

{
  "@id": "https://ror.org/013cjyk83",
  "name" : "PSL Research University"
}

URL as the @id. Good. We're all on-board here.

Scenario 2:

As presently produced:

{
  "@type": "Organization",
  "name": "ORCID",
  "alternateName": "Technology",
  "identifier": {
    "@type": "PropertyValue",
    "propertyID": "ROR",
    "value": "https://ror.org/04fa4r544"
  }
}

Alternative to the above (comparable to what's already done with DOIs):

{
  "@type": "Organization",
  "name": "ORCID",
  "alternateName": "Technology",
  "identifier": {
    "@type": "PropertyValue",
    "propertyID": "ROR",
    "value": "04fa4r544"
  }
}

The question is, what was the point of having "propertyID": "ROR" (or indeed the whole identifier block) if the "value" is already expressed as a resolvable URL?

Would this be permissible/useful/valuable to accommodate the two scenarios:

{
  "@type": "Organization",
  "@id": "https://ror.org/04fa4r544",
  "name": "ORCID",
  "alternateName": "Technology",
  "identifier": {
    "@type": "PropertyValue",
    "propertyID": "ROR",
    "value": "04fa4r544"
  }
}

dshorthouse avatar May 11 '22 13:05 dshorthouse

Argh @dshorthouse I was going to let this go :wink:

Yes, this is pretty much exactly what I'm arguing for:

{
  "@type": "Organization",
  "@id": "https://ror.org/04fa4r544",
  "name": "ORCID",
  "alternateName": "Technology",
  "identifier": {
    "@type": "PropertyValue",
    "propertyID": "ROR",
    "value": "04fa4r544"
  }
}

Note that @id doesn't apply for RINGGOLD as (I gather) they don't have resolvable identifiers. In RDF-speak, a record for RINGGOLD is a bnode (blank node).

You could even have

{
  "@type": "Organization",
  "@id": "https://ror.org/04fa4r544",
  "name": "ORCID",
  "alternateName": "Technology"
}

if you want to avoid PropertyValue altogether. It's just that, to me, value should be a string, not a URL. If you only ever want your identifier as a URL, then don't use PropertyValue. If you're OK with having a string representation as well, then use PropertyValue as well as @id. If you don't have a URL identifier, then you're basically stuck with PropertyValue.
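Those three cases can be sketched as a single builder. This is only an illustration of the convention being argued for, not how ORCID generates its JSON-LD; the function name `organization` and its parameters are assumptions:

```python
from typing import Optional


def organization(name: str, property_id: str,
                 slug: Optional[str] = None,
                 url: Optional[str] = None) -> dict:
    """Build a schema.org Organization for the three cases above."""
    if slug is None and url is None:
        raise ValueError("need a slug, a URL, or both")
    org = {"@type": "Organization", "name": name}
    if url is not None:
        # Resolvable identifier: goes in @id as the full URL.
        org["@id"] = url
    if slug is not None:
        # String identifier: goes in PropertyValue.value without a resolver.
        org["identifier"] = {
            "@type": "PropertyValue",
            "propertyID": property_id,
            "value": slug,
        }
    return org
```

For example, `organization("ORCID", "ROR", slug="04fa4r544", url="https://ror.org/04fa4r544")` gives the combined form, while `organization("Eduserv Athens", "RINGGOLD", slug="226102")` has no @id and so remains a blank node.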

rdmpage avatar May 11 '22 14:05 rdmpage

We're going to defer to @lizkrznarich at ROR here. But will revisit this if/when we update the way we deal with schema.org data.

TomDemeranville avatar Nov 28 '22 17:11 TomDemeranville