variableMeasured: what Classes should be in rangeIncludes?
Currently the schema.org rangeIncludes for variableMeasured is PropertyValue (useful) or Text (lots of flexibility, but not machine-interoperable). The Issue #27 recommendation draft currently supports use of PropertyValue. An alternate approach is to expand the rangeIncludes for variableMeasured to include other Classes like schema:Person, schema:Observation, schema:Place, schema:Event, schema:Date.
Which do we want to recommend? Please add specific dataset examples and schema.org encoding examples to present alternate approaches in the GitHub branch for the variableMeasured discussion.
Putting aside the language difficulties with 'variableMeasured'-- it's used here to mean any value contained in a dataset, whether it's a simple numeric measured value, an asserted categorization, an identifier for the data item, a reference to an item somewhere else, an object with internal structure, a dimension for a coverage/grid, metadata about the data instance, recorded speech, an image, or metadata about one of the other reported values. It's just one of the items in a dataset.
What we're trying to do is communicate what the value represents and how it is encoded in the data. Given that: the range for variableMeasured should be a description of one of the kinds of values that appear in the dataset. A person, place, event, or observation is not in the dataset-- a specification of how one of these entities is described in the data is what a PropertyValue is about. The propertyID tells us what that value in the dataset represents, and the dataType tells us how it is represented; name is the label for that item as found in the data; alternateName can be used to provide a human-intelligible label; description can be used to provide details for a person to read... See issue # for consideration of how to document variables that are about the values in other variables, or about the dataset.
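A minimal sketch of such a description, written in Python so that it can be serialized to JSON-LD. The column name, labels, and propertyID URI are hypothetical placeholders; dataType is left out because this sketch does not assume which vocabulary would supply that property.

```python
import json

# Hypothetical description of one column in a dataset. Every value here
# is an illustrative assumption, not taken from a real dataset.
variable = {
    "@type": "PropertyValue",
    # Label for the item exactly as it is found in the data.
    "name": "air_temp",
    # Human-intelligible label.
    "alternateName": "Air temperature",
    # Details for a person to read.
    "description": "Air temperature at 2 m above ground, in degrees Celsius",
    # What the value represents (placeholder URI, not a real vocabulary term).
    "propertyID": "https://example.org/vocab/airTemperature",
}

dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example dataset",
    "variableMeasured": [variable],
}

print(json.dumps(dataset, indent=2))
```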
I've tried hard to understand Stephen Richard's detailed explanation of why we absolutely should never use potentially relevant schema.org Types such as schema:Observation, schema:Event, schema:Person, schema:Date etc. directly with 'variableMeasured' to describe what that 'variableMeasured' contains, represents, is about, etc. I think this is indeed due to the ambiguity of the 'variableMeasured' property itself, as expressed by Stephen's initial disclaimer "putting aside the language difficulties with 'variableMeasured'".
Stephen recommends that we benefit by requiring the extra level of abstraction obtained through the use of the schema:PropertyValue type with 'variableMeasured'-- because Types like "Person", "Observation", etc. are not IN the Dataset per se. Whereas my simpler interpretation of 'variableMeasured' is that it clarifies what Types are represented in some 'variableMeasured', and so it would be natural to say a 'variableMeasured' contains information about Persons, Events, Dates, or Observations (where schema.org Observation allows for further description of what is the 'measuredProperty' of some "observedNode" for an Observation-- which seems very close to aligning with W3C SOSA).
Thus:
Pattern 1-- Dataset "variableMeasured" Date (Stephen does NOT recommend, even if "Date" is a schema.org Type)
Pattern 2-- Dataset "variableMeasured" PropertyValue "propertyID" Date (Stephen recommends-- and BTW this is the use pattern currently described in schema.org documentation for 'variableMeasured')
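One possible reading of these two patterns as JSON-LD, sketched in Python. The variable name "sample_date" and the helper function are hypothetical, introduced only for illustration.

```python
import json

def dataset_with(variable):
    """Wrap one variableMeasured value in a minimal Dataset description."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "variableMeasured": variable,
    }

# Pattern 1: the schema.org Type is used directly as the value of variableMeasured.
pattern_1 = dataset_with({
    "@type": "Date",
    "name": "sample_date",
})

# Pattern 2: a PropertyValue describes the variable, and propertyID points at the Type.
pattern_2 = dataset_with({
    "@type": "PropertyValue",
    "name": "sample_date",
    "propertyID": "http://schema.org/Date",
})

for label, doc in (("Pattern 1", pattern_1), ("Pattern 2", pattern_2)):
    print(label)
    print(json.dumps(doc, indent=2))
```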
I completely agree with Stephen that the 2nd pattern is generally useful and necessary for referencing terms in external ontologies, which here could be some URI to a notion of "Date", e.g. http://schema.org/Date. I agree this is the pattern to recommend in general when relevant schema.org Types are not present or insufficiently well axiomatized, and so reference to some external ontology/vocabulary is needed.
I still cannot comprehend, however, why, when schema.org offers Types that are in fact more-or-less exactly what is represented in the contents of a 'variableMeasured', we cannot refer to these Types directly. This follows an interpretation of "Dataset > variableMeasured" to mean "Dataset has variableMeasured of Type {Person, Date, Observation, Event... etc}".
The major benefit of this alternate Pattern 1 would come from directly reusing terms from schema.org, and not needing to refer to external ontologies-- which may be less standardized or accessible than schema.org terms. An additional benefit might come from promoting some of these very generic Types, which are commonly represented in Dataset 'variableMeasured' values, to first-class schema.org Types for "direct reference" (i.e. Pattern 1). I do realize, however, that we can re-use/reference schema.org terms in Pattern 2 as well-- as depicted in the example with Type "Date".
So in summary I don't object to sole endorsement of the Pattern 2 usage with 'variableMeasured'. I was trying to adhere to our guiding principle of "Re-use schema.org terms where relevant, and reference external terms (e.g. from Ontologies) when appropriate schema.org Types and properties are missing", and also to keep the patterns simple where possible.
We still need to see some example dataset metadata encoded using this approach.
From SOSO Telecon:
The following examples explore 3 patterns for the following sample data:
sea_surface_temp, movie_link
43.2, https://example.com/movie/1.mpg
43.8, https://example.com/movie/2.mpg
- Pattern 1 - using PropertyValue
- Pattern 2 - using Observation
- Pattern 3 - PropertyValue w. a propertyID of schema:Observation
{
"@context": "https://schema.org/",
"@type": "Dataset",
"variableMeasured":
[
# Pattern 1 - using PropertyValue
{
"@type": "PropertyValue",
"name": "sea_surface_temp",
"description": "sea surface temperature measured in degrees Fahrenheit",
"propertyID": "http://purl.obolibrary.org/obo/ENVO_04000002"
},
{
"@type": "PropertyValue",
"name": "movie_link",
"description": "A link to a movie file",
"propertyID": "http://schema.org/Movie"
}
],
[
# Pattern 2 - using Observation
{
"@type": "Observation",
"name": "sea_surface_temp",
"description": "sea surface temperature measured in degrees Fahrenheit",
"observedNode": {
"@id": "http://purl.obolibrary.org/obo/ENVO_01001581",
"name": "sea surface layer"
},
"measuredProperty": {
"@id": "http://purl.obolibrary.org/obo/PATO_0000146",
"name": "temperature"
}
},
{
"@type": "Observation",
"name": "movie_link",
"description": "A link to a movie file",
"observedNode": {
"@id": "http://purl.obolibrary.org/obo/ENVO_01001581",
"name": "sea surface layer"
}
}
],
[
# Pattern 3 - PropertyValue w. a propertyID of schema:Observation
{
"@type": "PropertyValue",
"name": "sea_surface_temp",
"description": "sea surface temperature measured in degrees Fahrenheit",
"propertyID": {
"@type": "Observation",
"observedNode": {
"@id": "http://purl.obolibrary.org/obo/ENVO_01001581",
"name": "sea surface layer"
},
"measuredProperty": {
"@id": "http://purl.obolibrary.org/obo/PATO_0000146",
"name": "temperature"
}
}
},
{
"@type": "PropertyValue",
"name": "movie_link",
"description": "A link to a movie file",
"propertyID": {
"@type": "Observation",
"observedNode": {
"@id": "http://purl.obolibrary.org/obo/ENVO_01001581",
"name": "sea surface layer"
}
}
}
]
}
Note that when using the schema.org Type "Observation", there is both "observedNode" and "measuredProperty" (not "observedProperty", as it was written in the examples above). The "observedNode" and "measuredProperty" pair aligns closely with SOSA, which may be advantageous.
Thanks @mpsaloha, I've fixed the JSON-LD above.
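Since JSON has no comment syntax and a key takes a single value, the combined listing above works as a side-by-side comparison rather than as one parseable document. A sketch in Python of emitting each pattern as its own Dataset document, shown for the sea_surface_temp column only; the helper name as_dataset is hypothetical.

```python
import json

SEA_SURFACE = {
    "@id": "http://purl.obolibrary.org/obo/ENVO_01001581",
    "name": "sea surface layer",
}
TEMPERATURE = {
    "@id": "http://purl.obolibrary.org/obo/PATO_0000146",
    "name": "temperature",
}
COMMON = {
    "name": "sea_surface_temp",
    "description": "sea surface temperature measured in degrees Fahrenheit",
}
OBSERVATION = {
    "@type": "Observation",
    "observedNode": SEA_SURFACE,
    "measuredProperty": TEMPERATURE,
}

patterns = {
    # Pattern 1 - PropertyValue with a URI as propertyID
    "Pattern 1": {
        "@type": "PropertyValue",
        **COMMON,
        "propertyID": "http://purl.obolibrary.org/obo/ENVO_04000002",
    },
    # Pattern 2 - Observation used directly
    "Pattern 2": {**OBSERVATION, **COMMON},
    # Pattern 3 - PropertyValue whose propertyID is an Observation
    "Pattern 3": {"@type": "PropertyValue", **COMMON, "propertyID": OBSERVATION},
}

def as_dataset(variable):
    """Wrap a single variable description in a minimal Dataset document."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "variableMeasured": [variable],
    }

documents = {
    label: json.dumps(as_dataset(variable), indent=2)
    for label, variable in patterns.items()
}

for label, text in documents.items():
    json.loads(text)  # raises ValueError if the text is not valid JSON
    print(label)
    print(text)
```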
Pattern 2 is not consistent with the expected value type for schema:variableMeasured (Values expected to be one of these types: PropertyValue, Text).
Yes, agreed. Which is why I think we argued that Pattern 3 would be better if one wants to use the Observation class.
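The range argument can be sketched as a small check in Python. This is an illustrative assumption about how one might test a value against the expected types (PropertyValue or Text), not a schema.org validator; the function name and the trimmed-down entries are hypothetical.

```python
# Expected types for variableMeasured as quoted above: PropertyValue or Text.
# Text is modelled here as a plain Python string.
ALLOWED_NODE_TYPES = {"PropertyValue"}

def in_expected_range(value):
    """Return True if a variableMeasured value is Text or a PropertyValue node."""
    if isinstance(value, str):
        return True
    if isinstance(value, dict):
        types = value.get("@type")
        if isinstance(types, str):
            types = [types]
        return any(t in ALLOWED_NODE_TYPES for t in types or [])
    return False

# Trimmed-down entries following Pattern 2 and Pattern 3 from the example above.
pattern_2_entry = {"@type": "Observation", "name": "sea_surface_temp"}
pattern_3_entry = {
    "@type": "PropertyValue",
    "name": "sea_surface_temp",
    "propertyID": {"@type": "Observation"},
}

results = {
    "Pattern 2": in_expected_range(pattern_2_entry),
    "Pattern 3": in_expected_range(pattern_3_entry),
    "Text": in_expected_range("sea_surface_temp"),
}
print(results)
```

The check only looks at the top-level type of each entry, which is why Pattern 3 passes: the Observation sits under propertyID rather than directly under variableMeasured.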