GeoSPARQL
Early support for GeoSPARQL as a data provider. For now only the get and query operations are supported, but there is more to come. The provider creates GeoSPARQL queries to retrieve RDF triples from the GeoSPARQL endpoint in JSON-LD format. These triples are then transformed into a GeoJSON document to be returned by the API. The URIs of related ontological classes and other RDF resources are kept in the properties section of the GeoJSON document.
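To illustrate the transformation described above, here is a minimal sketch of turning one JSON-LD node into a GeoJSON feature. It is not the code in this pull request: the function names are hypothetical, the node is assumed to be in expanded form with the `geo:asWKT` literal already attached to the feature node, and only `POINT` geometries without a CRS prefix are handled.

```python
import re

# GeoSPARQL ontology namespace.
GEO = "http://www.opengis.net/ont/geosparql#"

# Matches "POINT(x y)" only; other WKT types are out of scope for this sketch.
_POINT_RE = re.compile(
    r"\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*",
    flags=re.IGNORECASE,
)


def wkt_point_to_geojson(wkt):
    """Convert a WKT point string into a GeoJSON geometry dict."""
    match = _POINT_RE.fullmatch(wkt)
    if match is None:
        raise ValueError(f"unsupported WKT in this sketch: {wkt!r}")
    return {
        "type": "Point",
        "coordinates": [float(match.group(1)), float(match.group(2))],
    }


def jsonld_node_to_feature(node):
    """Build a GeoJSON feature from an expanded JSON-LD node (hypothetical helper).

    URIs of classes and related resources are kept in `properties`.
    """
    wkt = node[GEO + "asWKT"][0]["@value"]
    properties = {}
    for key, values in node.items():
        if key in ("@id", GEO + "asWKT"):
            continue
        if key == "@type":
            properties["type"] = values
            continue
        # Keep resource URIs (@id) or literal values (@value) under the predicate URI.
        properties[key] = [v.get("@id", v.get("@value")) for v in values]
    return {
        "type": "Feature",
        "id": node["@id"],
        "geometry": wkt_point_to_geojson(wkt),
        "properties": properties,
    }


# Made-up example node; the example.org URIs are placeholders.
node = {
    "@id": "http://example.org/site/1",
    "@type": ["http://example.org/ont#Site"],
    GEO + "asWKT": [{"@value": "POINT(5.66 51.98)", "@type": GEO + "wktLiteral"}],
    "http://www.w3.org/2000/01/rdf-schema#label": [{"@value": "Site 1"}],
}
feature = jsonld_node_to_feature(node)
```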
Issues to note:
- Modified the name of the GeoJSON provider to geojson_prov to avoid import conflicts. There might be a more elegant way around this.
- The Docker set up is not fully tested. That can only happen when a new image with this provider is available from DockerHub. But the Virtuoso bit is tested.
- So far only tested with Virtuoso. Will later test with Fuseki.
Thanks @ldesousa this is a great contribution. RE: "more to come", should we mark this PR as WIP, or is this ready to review/merge?
It is work in progress as I still want to introduce more features. But what is in this pull request is functional. I would suggest a merge into development so others can test it too.
Travis is trying to run the tests, but there is no Virtuoso instance set up in the environment. Should something be done about the tests? Or can the PR be merged like this?
In general, if there are tests, then they should have the appropriate setup.
This is closely related to #469 and #173. Ideally, you could specify a URI field populated with external/global persistent identifiers that resolve to the pygeoapi /items/{feature id} page and populate the '@id' of the JSON-LD version, which is populated by the results of querying the GeoSPARQL endpoint as demonstrated here. @dblodgett-usgs this is of interest, yes?
Hi @ldesousa, I'm curious about your use case here. From what I understand, SPARQL is very powerful in its capability to compose queries on complex data, but it is quite poor on performance. For that reason, app developers who build on triple stores usually query a structured subset of a graph and store it in a fast-responding cache, such as a database or search index, before exposing it via an API. Maybe this aspect of caching can be included in this implementation, but I think I would rather see that in a separate component.
I thought geojson-ld is not possible due to conflicts between the two specifications. Has this aspect been improved in recent versions of json-ld? But I guess the provider could provide both json-ld and geojson responses.
@pvgenuchten The GeoSPARQL provider is independent of the back-end. Whether it is a true triple store or something else, as long as it can reply to GeoSPARQL queries it will work. Currently I am working with Virtuoso, which internally stores triples in a relational database. So far I have had no performance-related issues.
GeoJSON, JSON-LD and GeoJSON-LD are all independent specifications that do not square easily with one another. Moreover, GeoJSON-LD has no institutional backing and can't be regarded exactly as a standard. JSON-LD is by itself enough to convey geo-spatial information using the GeoSPARQL ontology. The bit of innovation needed in this field is the extension of GeoSPARQL to raster data.
I just had an in-person meeting with @ldesousa to discuss the geosparql backend. My doubts about whether sparql will provide enough performance for this use case have not gone away. An alternative would be to insert the sparql response daily into a postgis table (or elastic) as id, time, spatial, json-ld-blob, and use that as a source for pygeoapi. @ldesousa raised that, for them, creating proper json-ld from a postgres backend was quite a challenge, whereas sparql provides that efficiently, and also that the triple store may contain objects in multiple evolutions of the (soil) standard. We also considered the case that at some point users may use pygeoapi to post new or updated items to the collection using ogc-api-features; these updates would then be included in the triple store.
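The caching alternative mentioned above (a table with id, time, spatial and json-ld-blob columns, refreshed periodically) could look roughly like the sketch below. It uses the standard-library sqlite3 module as a stand-in for PostGIS so that it runs without extra dependencies; the table name, function names and the WKT text column are assumptions for illustration only, and fetching the rows from the SPARQL endpoint is left out.

```python
import json
import sqlite3


def open_cache(path=":memory:"):
    """Create the cache table if needed (sqlite3 here; PostGIS in a real deployment)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS feature_cache (
            id TEXT PRIMARY KEY,
            time TEXT,
            spatial TEXT,   -- WKT string in this sketch; a geometry column in PostGIS
            jsonld TEXT     -- the JSON-LD document stored as a blob of text
        )
        """
    )
    return conn


def refresh_cache(conn, rows):
    """Insert or overwrite cached items.

    `rows` is an iterable of (id, time, wkt, jsonld_dict) tuples, assumed to come
    from a periodic query against the SPARQL endpoint.
    """
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO feature_cache VALUES (?, ?, ?, ?)",
            [(i, t, w, json.dumps(doc)) for i, t, w, doc in rows],
        )


def get_item(conn, item_id):
    """Return the cached JSON-LD document for one item, or None if absent."""
    row = conn.execute(
        "SELECT jsonld FROM feature_cache WHERE id = ?", (item_id,)
    ).fetchone()
    return None if row is None else json.loads(row[0])


conn = open_cache()
refresh_cache(
    conn,
    [("s1", "2021-01-01", "POINT(5 52)", {"@id": "http://example.org/s1"})],
)
cached = get_item(conn, "s1")
```

A provider reading from such a table would answer item requests without touching the triple store, at the cost of data being as stale as the last refresh.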
Enough potential to make this an interesting direction of development.
Hi folks, any news on this PR?
@ldesousa can you rebase, after which I'm guessing this is ready for a review?
JSON-LD is by itself enough to convey geo-spatial information using the GeoSPARQL ontology.
This suggests that the item-level JSON-LD response for feature collections might have an option for GeoJSON-LD (default) and some kind of GeoSPARQL representation. Since #676 the optional JSON-LD using the schema:geo vocabulary for point-type geometry is implemented, but now I'm wondering if a GeoSPARQL representation should be another option or just preferable overall, and support non-point geometry as well.
I would strongly advise against pursuing the GeoJSON-LD route. It is not a standard and is unlikely to become one soon.
GeoSPARQL is an ontology, not an encoding scheme. A resource making use of GeoSPARQL can be encoded as XML/RDF, Turtle, JSON-LD, etc. GeoSPARQL is not meant as a "representation".
Thanks @ldesousa
GeoJSONLD has been the default behavior for a long time so #676 left it in for now, but we agree that standard JSON-LD should be used to communicate geometry using existing ontologies. What I mean by alternative "representation" is that geometry in JSON-LD generated by pygeoapi could use the geosparql ontology or the schema.org geo types.
GeoJSON-LD was not the default behaviour when I submitted this pull request. Neither was it JSON-LD. At the time all providers were returning GeoJSON objects. Unless we are speaking of different things.
Appending ?f=jsonld to /collections/{collection}/items/{item id}, or requesting application/ld+json, has returned a json-ld document in GeoJSONLD format since at least May 2020. We added a configuration option in #676 for this to instead return a JSON-LD document with schema.org type geometry.
But I'm asking whether we should add an option to return that JSON-LD document with geosparql geometry.
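To make the two options concrete, here is a rough sketch of the same point item expressed with schema.org geo types and with the GeoSPARQL ontology. This is not pygeoapi's actual output: the function name, the `geometry_vocabulary` parameter and the `gsp` prefix are made up for illustration; the vocabulary terms themselves (`schema:GeoCoordinates`, `geo:hasGeometry`, `geo:asWKT`, `geo:wktLiteral`) are the published ones.

```python
CONTEXT = {
    "schema": "https://schema.org/",
    "gsp": "http://www.opengis.net/ont/geosparql#",
}


def item_jsonld(item_id, lon, lat, geometry_vocabulary="schema"):
    """Build a JSON-LD document for a point item (hypothetical helper).

    geometry_vocabulary: "schema" for schema.org geo types,
    "geosparql" for a GeoSPARQL geometry with a WKT literal.
    """
    doc = {"@context": CONTEXT, "@id": item_id}
    if geometry_vocabulary == "schema":
        doc["schema:geo"] = {
            "@type": "schema:GeoCoordinates",
            "schema:longitude": lon,
            "schema:latitude": lat,
        }
    elif geometry_vocabulary == "geosparql":
        doc["gsp:hasGeometry"] = {
            "@type": "gsp:Geometry",
            "gsp:asWKT": {
                "@type": "gsp:wktLiteral",
                # WKT can carry any geometry type, not only points.
                "@value": f"POINT({lon} {lat})",
            },
        }
    else:
        raise ValueError(f"unknown vocabulary: {geometry_vocabulary!r}")
    return doc


as_schema = item_jsonld("http://example.org/items/1", 5.66, 51.98)
as_geosparql = item_jsonld("http://example.org/items/1", 5.66, 51.98, "geosparql")
```

Because the GeoSPARQL variant carries the geometry as a WKT literal, the same structure would extend to lines and polygons, which is the non-point case raised earlier in the thread.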
If GeoSPARQL is ever accepted as a data source, that would indeed make sense.
I'm interested to keep this PR in focus. Especially in academia there is a lot of triple data which is hard to consume by spatial clients. If pygeoapi can provide a proxy into the triple stores, this is of high interest. I guess we should have a look at the current conflicts.
@webb-ben we should contribute as bandwidth allows. I'm wondering if the configuration should allow more complex sparql queries, with more than one "hop", to enrich the geometry with relevant properties as desired.
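One way to picture the multi-hop idea is a configuration that maps each output property to a chain of predicates, which is then turned into a SPARQL 1.1 property path. The sketch below is only an assumption about how that might look; the function name, the `hops` structure and the example.org URIs are hypothetical, and the IRIs and variable names are not validated or escaped.

```python
def build_enriched_query(feature_class, hops):
    """Build a SPARQL query that follows predicate chains from each feature.

    feature_class: IRI of the class whose instances are the features.
    hops: dict mapping an output variable name to a list of predicate IRIs,
          followed in order from the feature (one entry per "hop").
    """
    variables = " ".join(f"?{name}" for name in hops)
    lines = [
        "PREFIX geo: <http://www.opengis.net/ont/geosparql#>",
        f"SELECT ?feature ?wkt {variables}".rstrip(),
        "WHERE {",
        f"  ?feature a <{feature_class}> ;",
        "           geo:hasGeometry/geo:asWKT ?wkt .",
    ]
    for name, path in hops.items():
        # A sequence path p1/p2/... walks several hops in one triple pattern.
        path_expr = "/".join(f"<{iri}>" for iri in path)
        lines.append(f"  OPTIONAL {{ ?feature {path_expr} ?{name} . }}")
    lines.append("}")
    return "\n".join(lines)


query = build_enriched_query(
    "http://example.org/ont#Site",
    {
        "soilType": [
            "http://example.org/ont#hasProfile",
            "http://example.org/ont#soilType",
        ]
    },
)
```

Wrapping each path in OPTIONAL keeps features that lack the enriched property in the result instead of dropping them.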
After 18 months it is better to close this pull request. The functionality I proposed here is now implemented in Prez, and in a more sophisticated way. If you ever revisit the possibility of adding a GeoSPARQL triple store as a features provider, I would advise you to follow a similar approach, making use of the ogcldapi OWL profile.