
Proposal: New `identifier` subproperty: DOI (Digital Object Identifier)

Open TobiasNx opened this issue 10 months ago • 14 comments

I saw that schema.org does not have DOI as property.

As you surely know, DOIs are important persistent identifiers in the context of academic, professional, and government publications and research; they enable DOI resolvers to link to the identified object. DOIs are also important for libraries and publishers.

Therefore I suggest DOI as a new subproperty of identifier.

TobiasNx avatar Feb 18 '25 13:02 TobiasNx

Hi, thanks for contributing. I think this is a duplicate of https://github.com/schemaorg/schemaorg/issues/1286

rlzijdeman avatar Feb 20 '25 21:02 rlzijdeman

@TobiasNx I agree with you, we must have DOI as a subproperty of identifier.

Beyond the reasons @TobiasNx mentioned, DOI is standardized as ISO 26324 and registered with IANA (https://www.iana.org/assignments/urn-formal/doi). So, as we have ISBN for a book, I don't see any reason why we should not have DOI (or doi) for a Thing.

A DOI can identify virtually any digital object and provides a persistent, actionable link to it, whereas an ISBN specifically identifies book publications and their various editions for the book industry. Therefore, DOI has a much broader scope than ISBN.

@rlzijdeman while DOI is mentioned in #1286, this issue addresses only one specific problem, whereas #1286 discusses a broader problem that mixes in other identifiers such as PDB or ORCID.

andreadavanzo avatar Apr 01 '25 22:04 andreadavanzo

> So, as we have ISBN for a book, I don't see any reason why we should not have DOI (or doi) for a Thing.

> A DOI can identify virtually any digital object

It seems like overkill to put doi on Thing as most things are not digital objects. Most (all?) digital objects are CreativeWorks so I would put it there.

philbarker avatar Apr 02 '25 10:04 philbarker

> > So, as we have ISBN for a book, I don't see any reason why we should not have DOI (or doi) for a Thing.

> > A DOI can identify virtually any digital object

> It seems like overkill to put doi on Thing as most things are not digital objects. Most (all?) digital objects are CreativeWorks so I would put it there.

Good point @philbarker. Just to clarify: we are suggesting adding DOI as a subproperty of identifier (https://schema.org/identifier), not as a property of CreativeWork. If you give me a chance, I can prepare https://schema.org/doi as the equivalent of https://schema.org/isbn.

andreadavanzo avatar Apr 02 '25 11:04 andreadavanzo

> Just to clarify that we are suggesting to add DOI as subproperty of identifier, (https://schema.org/identifier)

Understood, but

> not as property of CreativeWorks.

a subproperty of identifier can be a property of CreativeWork, i.e. have CreativeWork included in its domain, just as isbn is a subproperty of identifier and has Book as its domain.
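To make the point about domains concrete, here is a rough sketch of how such a term could be described, modelled on the existing isbn term. It is hypothetical: `schema:doi` does not exist in Schema.org, and the `ISBN_TERM` / `DOI_TERM` names and the dictionary layout are assumptions made only for illustration.

```python
# Hypothetical sketch: schema:doi is NOT an existing Schema.org term.
# ISBN_TERM paraphrases how schema:isbn is described: a subproperty of
# schema:identifier whose domain includes schema:Book.
ISBN_TERM = {
    "@id": "schema:isbn",
    "@type": "rdf:Property",
    "rdfs:subPropertyOf": {"@id": "schema:identifier"},
    "schema:domainIncludes": {"@id": "schema:Book"},
    "schema:rangeIncludes": {"@id": "schema:Text"},
}

# The proposed term would keep the same parent property but use a
# broader domain (CreativeWork instead of Book).
DOI_TERM = {
    **ISBN_TERM,
    "@id": "schema:doi",
    "schema:domainIncludes": {"@id": "schema:CreativeWork"},
}
```

Being a subproperty of identifier and having CreativeWork in the domain are independent statements, so the two do not conflict.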

philbarker avatar Apr 02 '25 11:04 philbarker

Ah ok, so yes, CreativeWorks is the best candidate.

andreadavanzo avatar Apr 02 '25 11:04 andreadavanzo

hi,

thanks for bringing the discussion further. I am not against https://schema.org/doi.

The reason I tried to refer to #1286 is that I think it is a better problem statement than this issue @andreadavanzo. DOI is only one way of identifying.

The first three types of creative works mentioned in the description of https://schema.org/CreativeWork relate to the field of media, heritage and humanities, where Handle has been in use since 1994, long before DOI existed. How ironic would it be if, in addition to ISBN, there were DOI but not Handle? Also, there are ARKs, perhaps less popularised, but there still exist more of them than DOIs.

So what I take from #1286 is that yes, we want a way to link persistent CURIEs to CreativeWorks, but not in a way that says the persistent ID for a creative work is the DOI. For books we have ISBN (and, as far as I know, nothing else). For CreativeWorks, we have at least DOI, Handle and ARK, but there might be others.

So my take is to either: (1) do nothing: https://schema.org/identifier is already listed as a property of https://schema.org/CreativeWork, allowing URLs as values, including DOI/Handle/ARK/whatever; or (2) follow up on the solution suggested here (https://schema.org/doi), but then also create https://schema.org/hdl and https://schema.org/ark, as these are more commonly used for things of type https://schema.org/CreativeWork.

Given the philosophy of Schema.org not to over-specify things, I suggest (1) and see schema.org/isbn as an understandable exception (there is only one ID for books, and the concept of a book is quite homogeneous, unlike the concept of a CreativeWork).

rlzijdeman avatar Apr 03 '25 06:04 rlzijdeman

As a contributor to the original #1286 issue I think it might be useful to comment here.

As can be seen from examples in the discussion it is already possible to define an identifier for a Thing being specifically a doi:

"identifier": {
    "@type": "PropertyValue",
    "propertyID": "DOI",
    "value": "10.1000/182"
},

This is in addition to the ability to just use a URL (e.g. https://doi.org/10.1000/182), as per @rlzijdeman's first option.

Therefore there is not a need for the introduction of a new identifier subtype specifically for doi, or other identifier type. That is not to diminish the argument for such expressed by @TobiasNx and others.

However, similar arguments for other domain-specific identifier subtypes have been put forward, which if adopted would have led to a proliferation of such property types. I note that @rlzijdeman's second suggestion implies that if we introduce https://schema.org/doi, we should also consider https://schema.org/hdl and https://schema.org/ark.

In an ideal world, a generic ability to define any identifier type and value would have been introduced in the earliest days of Schema.org. In such a scenario there would have been no need to create specific individual properties such as https://schema.org/isbn or https://schema.org/duns.

However we are where we are and backwards compatibility means that those (in retrospect historical anomalies) will remain; and no doubt from time to time stimulate proposals such as this.

My take is to concur with @rlzijdeman's first suggestion - do nothing.

Except perhaps enhancing documentation and examples to clarify the two options for defining identifiers of any type: 1) Just use the URL if there is one available. 2) Use a PropertyValue to describe the identifier type and value.
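As a possible starting point for such documentation, the two options can be sketched in Python as below. This is a minimal sketch, not an agreed recommendation: the helper names (`identifier_as_url`, `identifier_as_property_value`, `describe_work`) are made up for this example, and only the `"DOI"` propertyID and the `https://doi.org/` resolver prefix come from the snippets in this thread.

```python
import json

DOI_RESOLVER = "https://doi.org/"  # resolver prefix used elsewhere in this thread


def identifier_as_url(doi: str) -> str:
    """Option 1: express the identifier as a plain resolvable URL."""
    return DOI_RESOLVER + doi


def identifier_as_property_value(doi: str) -> dict:
    """Option 2: describe the identifier type and value with a PropertyValue."""
    return {"@type": "PropertyValue", "propertyID": "DOI", "value": doi}


def describe_work(name: str, doi: str, use_property_value: bool) -> dict:
    """Build a minimal CreativeWork node (hypothetical helper for this sketch)."""
    if use_property_value:
        identifier = identifier_as_property_value(doi)
    else:
        identifier = identifier_as_url(doi)
    return {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "name": name,
        "identifier": identifier,
    }


if __name__ == "__main__":
    # Print both variants for the DOI used in the example above.
    for flag in (False, True):
        print(json.dumps(describe_work("Example work", "10.1000/182", flag), indent=2))
```

Both variants use only the existing identifier property, so neither needs a new term.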

RichardWallis avatar Apr 03 '25 09:04 RichardWallis

OK. Would it not be better to add a note to the Sub-properties section of https://schema.org/identifier saying: "Sorry, we no longer take sub-properties on board. If you want to use any new identifier, use PropertyValue"? Moreover, should it be DOI or doi?

> We do not currently have a recommended identifier scheme for identifier schemes, but in most cases there is a conventional short name for most identifier schemes (which should be used in lowercase form).

andreadavanzo avatar Apr 06 '25 06:04 andreadavanzo

I support the PropertyValue pattern and we use schema:identifier with many - a couple of dozen - identifier types in lots of data in this way.

Rather than using an unconstrained string as the propertyID, we always use an identifier URI which we base on the codes from the Library of Congress' Standard Identifiers which contains DOI, ISBN, ROR but not ORCID or DUNS.

For example, a business with the URI http://example.com/business/x and an Australian Business Number of 123 456 789 would be represented as:

{
  "@context": {
    "schema": "https://schema.org/",
    "ids": "http://id.loc.gov/vocabulary/identifiers/"
  },
  "@id": "http://example.com/business/x",
  "@type": "schema:Organization",
  "schema:identifier": {
    "@type": "ids:ausbn",
    "@value": "123 456 789"
  },
  "schema:name": "Business X"
}

In the Turtle syntax:

PREFIX ids: <http://id.loc.gov/vocabulary/identifiers/>
PREFIX schema: <https://schema.org/>

<http://example.com/business/x>
    a schema:Organization ;
    schema:identifier "123 456 789"^^ids:ausbn ;
    schema:name "Business X" ;
.

We find this a pretty complete solution as we can just register any other identifier types we like in our own extension of the LoC vocabulary as needed, but the main ones are all there.
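For readers who generate such data programmatically, the typed-literal pattern above could be produced along these lines. This is only a sketch under assumptions: the function names are invented for illustration, full IRIs are used as keys instead of a `@context`, and nothing here checks that a scheme code actually exists in the LoC vocabulary.

```python
IDS = "http://id.loc.gov/vocabulary/identifiers/"
SCHEMA = "https://schema.org/"


def typed_identifier(scheme_code: str, value: str) -> dict:
    """JSON-LD value object whose datatype IRI names the identifier scheme."""
    return {"@type": IDS + scheme_code, "@value": value}


def organization(iri: str, name: str, scheme_code: str, value: str) -> dict:
    """Minimal Organization node carrying one typed identifier."""
    return {
        "@id": iri,
        "@type": SCHEMA + "Organization",
        SCHEMA + "identifier": typed_identifier(scheme_code, value),
        SCHEMA + "name": name,
    }


def turtle_literal(scheme_code: str, value: str) -> str:
    """The same typed literal in Turtle syntax, with the datatype IRI written out."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{IDS}{scheme_code}>'


if __name__ == "__main__":
    print(organization("http://example.com/business/x", "Business X", "ausbn", "123 456 789"))
    print(turtle_literal("ausbn", "123 456 789"))
```

The scheme travels as the literal's datatype, so consumers can filter identifiers by datatype IRI rather than by comparing propertyID strings.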

nicholascar avatar Jun 13 '25 01:06 nicholascar

The ESIP Science on Schema.org recommendations also use a PropertyValue pattern to specify different kinds of identifiers, see https://github.com/ESIPFed/science-on-schema.org/blob/main/guides/Dataset.md#identifier

e.g.

"identifier":
      {
        "@id": "https://doi.org/10.5066/F7VX0DMQ",
        "@type": "PropertyValue",
        "propertyID": "https://registry.identifiers.org/registry/doi",
        "value": "doi:10.5066/F7VX0DMQ",
        "url": "https://doi.org/10.5066/F7VX0DMQ"
      }
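In practice DOIs arrive in several spellings (bare, `doi:`-prefixed, or as a resolver URL), so a producer following this pattern probably wants to normalise them first. The sketch below shows one way to do that; `bare_doi` and `esip_style_identifier` are hypothetical helper names, the list of accepted prefixes is an assumption, and the check is deliberately loose rather than a full ISO 26324 syntax validation.

```python
# Prefixes this sketch strips; the list is an assumption, not a standard.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def bare_doi(raw: str) -> str:
    """Reduce a DOI in any of the common spellings to '10.<registrant>/<suffix>'."""
    text = raw.strip()
    lowered = text.lower()
    for prefix in KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            text = text[len(prefix):]
            break
    # Loose sanity check only: DOIs start with '10.' and contain a slash.
    if not text.startswith("10.") or "/" not in text:
        raise ValueError(f"does not look like a DOI: {raw!r}")
    return text


def esip_style_identifier(raw: str) -> dict:
    """Build a PropertyValue shaped like the example quoted above."""
    doi = bare_doi(raw)
    url = "https://doi.org/" + doi
    return {
        "@id": url,
        "@type": "PropertyValue",
        "propertyID": "https://registry.identifiers.org/registry/doi",
        "value": "doi:" + doi,
        "url": url,
    }
```

With this, `esip_style_identifier("doi:10.5066/F7VX0DMQ")` and `esip_style_identifier("https://doi.org/10.5066/F7VX0DMQ")` produce the same node.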

smrgeoinfo avatar Jun 13 '25 18:06 smrgeoinfo

The Ocean InfoHub is using the same pattern: https://book.odis.org/thematics/identifier/id.html#source-and-prov-approaches

smrgeoinfo avatar Jun 13 '25 18:06 smrgeoinfo

At the risk of oversimplifying things, it would also be possible to state the DOI with a sameAs attribute value pair.

Since the DOI is a persistent identifier expressed as an IRI, it's a proper candidate for this kind of simplified modelling.
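A rough sketch of what that looks like, together with the consumer-side cost of having several patterns in circulation, is given below. The helper names are hypothetical, and the rule that a propertyID of `DOI` or an IRI ending in `/doi` marks a DOI is an assumption based only on the examples in this thread.

```python
def with_same_as(work: dict, doi: str) -> dict:
    """Return a copy of the node with the DOI IRI added under sameAs."""
    updated = dict(work)
    existing = updated.get("sameAs", [])
    if isinstance(existing, str):
        existing = [existing]
    iri = "https://doi.org/" + doi
    updated["sameAs"] = list(existing) + ([] if iri in existing else [iri])
    return updated


def find_dois(work: dict) -> set:
    """Collect DOIs stated via sameAs, a URL identifier, or a PropertyValue."""
    candidates = []
    for key in ("sameAs", "identifier"):
        value = work.get(key, [])
        candidates.extend(value if isinstance(value, list) else [value])

    found = set()
    for item in candidates:
        if isinstance(item, dict):
            # PropertyValue pattern: accept "DOI" or an IRI ending in "/doi".
            property_id = str(item.get("propertyID", "")).lower()
            if property_id != "doi" and not property_id.endswith("/doi"):
                continue
            item = item.get("value", "")
        if not isinstance(item, str):
            continue
        text = item.strip()
        for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
            if text.lower().startswith(prefix):
                text = text[len(prefix):]
                break
        # Loose check only, not a full DOI syntax validation.
        if text.startswith("10.") and "/" in text:
            found.add(text)
    return found
```

The producer side is a one-liner, while the consumer has to recognise every pattern to find the same DOI.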

fjjulien avatar Jul 21 '25 21:07 fjjulien

This issue is being nudged due to inactivity.

github-actions[bot] avatar Oct 31 '25 02:10 github-actions[bot]