geoblacklight-schema
Remove solr specific suffixes
Below is an example of a JSON-LD format for the GeoBlacklight schema that abstracts out the Solr-specific details and makes a couple of other changes. Namely, this example uses @id in lieu of layer_slug_s, uses dc:identifier for a set of alternate identifiers, and drops uuid (#53). Note that dct:references becomes a proper JSON hash, and all derivative fields are dropped.
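To illustrate what "a proper JSON hash" means for dct:references, here is a minimal Python sketch of converting between the hash form and a serialized string form. It assumes the Solr-side document keeps the references as a JSON-encoded string in a single field; the function names are hypothetical and only there for illustration.

```python
import json

# Abstracted JSON-LD form: dct:references is a plain JSON object (hash).
references = {
    "http://schema.org/url": "http://purl.stanford.edu/fr148tw1471",
    "http://www.opengis.net/def/serviceType/ogc/wms":
        "https://geowebservices.stanford.edu/geoserver/wms",
}


def references_to_solr_string(refs):
    """Serialize the references hash into one JSON-encoded string.

    Assumption: the Solr document stores references as a single string field.
    """
    return json.dumps(refs, sort_keys=True)


def references_from_solr_string(value):
    """Parse the JSON-encoded string back into a hash."""
    return json.loads(value)


solr_value = references_to_solr_string(references)
round_tripped = references_from_solr_string(solr_value)
```

The round trip is lossless, so a conversion utility could go in either direction without extra bookkeeping.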
Ingesting the abstracted JSON-LD format into a Solr index would require a shim of harvesting code that derives the fields needed for the Solr implementation (such as solr_geom's ENVELOPE syntax from the georss:box field). This harvesting code could also provide a conversion utility between the current version of the JSON schema and the 1.0 abstracted JSON-LD version.
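As a rough sketch of what such a shim might do for the geometry, the Python below derives an ENVELOPE string from a georss:box value. It assumes georss:box is ordered "south west north east" and that Solr's ENVELOPE takes (minX, maxX, maxY, minY); the function name is hypothetical.

```python
def georss_box_to_envelope(box):
    """Derive a Solr ENVELOPE string from a georss:box value.

    Assumes georss:box is "south west north east" (lat lon lat lon) and
    that ENVELOPE expects (minX, maxX, maxY, minY), i.e. (W, E, N, S).
    """
    parts = box.split()
    if len(parts) != 4:
        raise ValueError("georss:box needs four values, got %d" % len(parts))
    # Parse as floats only to validate; the original strings are reused in
    # the output so no precision is lost to float formatting.
    south_f, west_f, north_f, east_f = (float(p) for p in parts)
    if south_f > north_f:
        raise ValueError("south must not exceed north")
    south, west, north, east = parts
    return "ENVELOPE(%s, %s, %s, %s)" % (west, east, north, south)


envelope = georss_box_to_envelope(
    "37.939061 -123.091039 38.098269 -122.892843"
)
print(envelope)
# ENVELOPE(-123.091039, -122.892843, 38.098269, 37.939061)
```

The west/east comparison is deliberately not validated, since a box that crosses the antimeridian can legitimately have west greater than east.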
There are several other issues with various individual fields, such as moving layer_id_s into dct:references (#77), but the JSON-LD example below is meant to illustrate the file format and its implications as an interchange format.
The example shows that the JSON-LD'ness is pretty straightforward: namely, the use of @context for the prefixes and @id to identify the layer.
{
  "@context": {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
    "georss": "http://georss.org#",
    "layer": "http://geoblacklight.org/schema/1.0#",
    "stanford": "http://library.stanford.edu#"
  },
  "@id": "stanford-fr148tw1471",
  "dc:identifier": [
    "http://purl.stanford.edu/fr148tw1471"
  ],
  "dc:title": "Geology: Offshore of Point Reyes, California, 2010",
  "dc:description": "This polygon shapefile represents geologic features within the offshore region of Point Reyes, California...",
  "dc:rights": "Public",
  "dct:provenance": "Stanford",
  "dct:references": {
    "http://schema.org/url": "http://purl.stanford.edu/fr148tw1471",
    "http://schema.org/downloadUrl": "http://stacks.stanford.edu/file/druid:fr148tw1471/data.zip",
    "http://www.loc.gov/mods/v3": "http://purl.stanford.edu/fr148tw1471.mods",
    "http://www.isotc211.org/schemas/2005/gmd/": "http://opengeometadata.stanford.edu/metadata/edu.stanford.purl/druid:fr148tw1471/iso19139.xml",
    "http://www.w3.org/1999/xhtml": "http://opengeometadata.stanford.edu/metadata/edu.stanford.purl/druid:fr148tw1471/default.html",
    "http://www.opengis.net/def/serviceType/ogc/wfs": "https://geowebservices.stanford.edu/geoserver/wfs",
    "http://www.opengis.net/def/serviceType/ogc/wms": "https://geowebservices.stanford.edu/geoserver/wms"
  },
  "layer:id": "druid:fr148tw1471",
  "layer:geom_type": "Polygon",
  "layer:modified_dt": "2016-02-05T22:07:10Z",
  "dc:format": "Shapefile",
  "dc:language": "English",
  "dc:type": "Dataset",
  "dc:publisher": "Geological Survey (U.S.)",
  "dc:creator": [
    "Michael W. Manson",
    "Janet T. Watt",
    "H. Gary Greene",
    "Moss Landing Marine Laboratories",
    "Pacific Coastal and Marine Science Center",
    "Golden, Nadine E."
  ],
  "dc:subject": [
    "Geology",
    "Geomorphology",
    "Sediments (Geology)",
    "Marine sediments",
    "Ocean bottom",
    "Geoscientific Information",
    "Oceans"
  ],
  "dct:issued": "2014",
  "dct:temporal": [
    "2006-2010"
  ],
  "dct:spatial": [
    "California",
    "Marin County (Calif.)",
    "Drakes Bay (Calif.)",
    "Pacific Ocean"
  ],
  "dc:relation": [
    "http://sws.geonames.org/3687919/",
    "http://sws.geonames.org/5370468/",
    "http://sws.geonames.org/8411083/"
  ],
  "georss:box": "37.939061 -123.091039 38.098269 -122.892843",
  "stanford:rights_metadata": "<?xml version=\"1.0\"?>\n<rightsMetadata>\n <access type=\"discover\">\n <machine>\n <world/>\n </machine>\n </access>\n <access type=\"read\">\n <machine>\n <world/>\n </machine>\n </access>\n <use>\n <human type=\"useAndReproduction\">This item is in the public domain. There are no restrictions on use.</human>\n <human type=\"creativeCommons\"/>\n <machine type=\"creativeCommons\"/>\n </use>\n <copyright>\n <human>This work is in the Public Domain, meaning that it is not subject to copyright.</human>\n </copyright>\n</rightsMetadata>\n"
}
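To make the role of @context concrete, here is a small Python sketch that expands the prefixed keys from the example above into full IRIs. It is a deliberately simplified stand-in for a real JSON-LD processor, which handles many more cases; the function name expand_term is hypothetical.

```python
# Prefix map copied from the @context of the example above.
context = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
    "georss": "http://georss.org#",
    "layer": "http://geoblacklight.org/schema/1.0#",
    "stanford": "http://library.stanford.edu#",
}


def expand_term(term, context):
    """Expand a compact IRI such as "dc:title" using the prefix map.

    Keywords (starting with "@") and terms whose prefix is not in the
    context, including absolute URLs, are returned unchanged.
    """
    if term.startswith("@"):
        return term
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix
    return term


for key in ("dc:title", "georss:box", "layer:geom_type", "@id",
            "http://schema.org/url"):
    print(key, "->", expand_term(key, context))
```

The keys inside dct:references are already absolute URLs, so they pass through untouched while the prefixed keys resolve against the declared namespaces.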
I like seeing this as JSON-LD. Thanks for getting this up @drh-stanford!
Yes thanks, looks good! One quick concern I have is increasing the complexity of indexing documents from their native format. Maybe we can use something from here: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-TransformingandIndexingCustomJSON ?
It does seem like this might not fully meet our need, though; the XML approach seems more amenable, as you can provide custom XSLTs to transform your data. Sigh.
Also maybe the Data Import Handlers (DIH) are an option?
The layer:id probably should move into dct:references, since it's not really an "identifier" so much as a parameter to the WMS/WFS protocol.
Not to throw a wrench in things, but we should possibly talk about DCAT as an alternative too! https://project-open-data.cio.gov/v1.1/schema/