Use of "citation"
The field "citation" in the examples is not necessarily used in the way schema.org intends. In the examples it seems to hold a string describing how to cite the given dataset, whereas schema.org defines it as a reference to other CreativeWorks. A new field, e.g. "citeAs", could carry that information instead. The explanation for this field should suggest using an identifier such as a DOI. A citation string (as used in the examples here) is not very useful, since different journals cite datasets in different ways (is it e.g. "J. Smith" or "Smith, J." or "Joe Smith" or "Smith, Joe"?). I/we (Polar Data Forum III) suggest providing a DOI if available; otherwise referring to an object with author, title, etc. (possibly already given in another part of the metadata); or, as a last resort, giving a citation string.
Leyla Garcia from bioschemas.org is reaching out to contacts at schema-org about the definition change
see: https://github.com/schemaorg/schemaorg/issues/2325
https://developers.google.com/search/docs/data-types/dataset used to define this field as "Preferred citation for this dataset", but it has been updated to say, "Identifies academic articles that are recommended by the data provider be cited in addition to the dataset itself. Provide the citation for the dataset itself with other properties, such as name, identifier, creator, and publisher properties."
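A minimal sketch of what the updated Google wording seems to imply: the dataset's own citation details go in name, identifier, creator, and publisher, while citation points at a related article. All names and DOIs below are hypothetical placeholders (10.5555 is used as a dummy prefix), not taken from any real record.

```json
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "Example sea ice thickness dataset (hypothetical)",
  "identifier": "https://doi.org/10.5555/example-dataset",
  "creator": {
    "@type": "Person",
    "name": "Joe Smith"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Example Data Repository (hypothetical)"
  },
  "citation": {
    "@type": "ScholarlyArticle",
    "name": "Example article describing the dataset (hypothetical)",
    "identifier": "https://doi.org/10.5555/example-article"
  }
}
```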
I think it's safe to begin a pull request to update the guidance document on how to properly use this field.
Following https://github.com/schemaorg/schemaorg/issues/1031
https://github.com/schemaorg/schemaorg/issues/1031 doesn't seem to resolve the question of what schema:citation is supposed to mean. The so:citation scope note "A citation or reference to another creative work, such as another publication, web page, scholarly article, etc." is not very useful, essentially 'a citation is a citation'.
The comment by ljgarcia presents two options:
- the CreativeWork is a publication about that Dataset
- the CreativeWork references the Dataset because it makes a claim based on data contained in that Dataset
And this discussion notes a third potential interpretation:
- Preferred citation for this dataset
SOSO should pick one and recommend that. I suggest using "so:citation: a reference to a resource made because the CreativeWork makes a claim based on data contained in that resource" (modified from @ljgarcia's second option). Given that the 2016 issue about this in the schema.org issue tracker has gone nowhere, I don't think we should count on any updates to schema.org.
@smrgeoinfo thanks for the clarifications, I agree they are important. For some more context, in EML, we have fields for all three of those concepts. They are:
- usageCitation: used whenever a work uses or incorporates the dataset; this is the 'claim is based on' case, and is a traditional citation in that regard (we would map this to the DataCite citedBy property)
- referencePublication: used when there is a canonical citation that should be used to represent this dataset; I think this is the 'preferred citation' case you list above
- literatureCited: this is a list of related works that are cited in some way by the dataset (often as part of the background/context of the dataset metadata)
I think it would be good to differentiate at least these three roles of citation references in the so:citation clarifications.
Difficulty: Easy
positives
- will address a recent fix to current guidelines
- provide clarity on its meaning and usefulness as opposed to its previous definition (for describing the Dataset's own citation)
- used by Google Dataset Search tool to link to Google Scholar
negatives
- does not address how to specify the type of relationship between the Dataset and the cited CreativeWork
+1 to include in v1.3
As a reference, it looks like DataCite takes the more specific properties from its schema (References and Cites) and aggregates them into this so:citation property. I'm not sure whether other rules are applied, so maybe @mfenner could describe their algorithm?
@mbjones will reach out to Martin Fenner at DataCite about their algorithm
Discussed the possibility of the ESIP schema.org cluster managing a vocabulary of dataset relations (mirroring DataCite Schema relation types)
Re: Garza's mention of how DataCite uses schema:citation
https://github.com/ESIPFed/science-on-schema.org/issues/128#issuecomment-888458367
We had discussed possibly using LinkRole to specify the relationship of the object of the citation. Here's an example, using the DataCite relationship terms as the linkRelationship text value. The 'roleName' value is Text or URL, so if there is a URI for the relationship, it could go there.
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "citation": [
    {
      "@type": "CreativeWork",
      "url": {
        "@type": "LinkRole",
        "url": "https://www.example.com/articlethatUsesDataset",
        "description": "link to publication that bases scientific conclusions on analysis using the dataset",
        "roleName": "https://eml.ecoinformatics.org/whats-new-in-eml-2-2-0.html#usage-citations",
        "linkRelationship": "IsCitedBy"
      }
    },
    {
      "@type": "CreativeWork",
      "url": {
        "@type": "LinkRole",
        "url": "https://www.example.com/articlethatCommentsOndataset",
        "description": "link to a publication that comments on/discusses the dataset",
        "roleName": "https://eml.ecoinformatics.org/whats-new-in-eml-2-2-0.html#referencePublication",
        "linkRelationship": "IsReferencedBy"
      }
    },
    {
      "@type": "CreativeWork",
      "url": {
        "@type": "LinkRole",
        "url": "https://www.example.com/articlethatProvidesSupplementalInformation",
        "description": "link to a publication that provides additional information useful to understand the dataset, e.g. analytical procedures, scientific context.",
        "roleName": "https://...",
        "linkRelationship": "Supplements"
      }
    }
  ]
}
Still doesn't solve how to assert a 'recommended citation' text string to use when citing the dataset; perhaps a convention that if schema:citation has a text value (not a CreativeWork) then that is assumed to be the recommended citation string.
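The convention floated above might look like the following sketch, where a plain text value in citation is read as the recommended citation string and object values are read as related works. This only illustrates the proposed convention and is not agreed SOSO guidance; the citation string, names, and DOI are hypothetical placeholders.

```json
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "Example dataset (hypothetical)",
  "citation": [
    "Smith, J. (2021). Example dataset (hypothetical). Example Data Repository. https://doi.org/10.5555/example-dataset",
    {
      "@type": "CreativeWork",
      "url": "https://www.example.com/articlethatUsesDataset"
    }
  ]
}
```

A consumer following this convention would have to check the type of each citation value, which is one reason a dedicated property for the text string may be easier to work with.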
Should it be "IsReferencedBy" (w/ a "D" at the end of reference)? Source: Scholix (appendix 3.1 - https://zenodo.org/record/1120265)
(Sent by email in reply to Stephen Richard's comment of Mon, Sep 20, 2021, which proposed the LinkRole example above.)
-- Elisha M Wood-Charlson, PhD (she/her), KBase User Engagement Lead and NMDC, Lawrence Berkeley National Laboratory
Did this get resolved? I am trying to do the same thing in RO-Crate - that is, provide a textual citation for a dataset
I have suggested so:creditText for textual citations over at the RO-Crate repo: https://github.com/ResearchObject/ro-crate/issues/265
That's in their "new" area, and it looks like what we need. Good suggestion!
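A minimal sketch of the creditText suggestion, assuming the property is placed directly on the Dataset to hold the textual citation. Whether SOSO recommends this is not settled in this thread, and the name, DOI, and citation string are hypothetical placeholders.

```json
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "Example dataset (hypothetical)",
  "identifier": "https://doi.org/10.5555/example-dataset",
  "creditText": "Smith, J. (2021). Example dataset (hypothetical). Example Data Repository. https://doi.org/10.5555/example-dataset"
}
```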
So, is the SOSO recommendation to use the schema.org creditText field to give a citation string for the dataset? We are working to map SPASE, the metadata system in Heliophysics, following the SOSO guidance, and this is not clear. The example given is a 1966 paper for a dataset, and since we are moving away from citing publications and towards citing the dataset across the sciences, this example is not ... helpful. Perhaps a section could be added on this, specifically how to include a string citation for the dataset and how to include information for closely related publications (e.g. documentation or a publication about the dataset that has a DOI)?