GeoSPARQL

Open ldesousa opened this issue 4 years ago • 18 comments

Early support for GeoSPARQL as a data provider. For now only the get and query operations are supported, but there is more to come. The provider creates GeoSPARQL queries to retrieve RDF triples from the GeoSPARQL endpoint in JSON-LD format. These triples are then transformed into a GeoJSON document to be returned by the API. The URIs of related ontological classes and other RDF resources are kept in the properties section of the GeoJSON document.
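As an illustration of the transformation described above (not the code in this pull request), here is a minimal sketch that turns one JSON-LD node into a GeoJSON feature. The node layout, the `example.org` URIs, and the helper names are assumptions made for the example; only point geometries are handled, and axis order is taken as written in the literal.

```python
import re

GEO = "http://www.opengis.net/ont/geosparql#"
HAS_GEOMETRY = GEO + "hasGeometry"
AS_WKT = GEO + "asWKT"

# A GeoSPARQL WKT literal may start with a CRS URI in angle brackets.
_POINT = re.compile(
    r"^\s*(?:<[^>]+>\s*)?POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def wkt_point_to_geojson(wkt):
    """Parse a WKT POINT literal into a GeoJSON geometry (points only)."""
    match = _POINT.match(wkt)
    if match is None:
        raise ValueError(f"unsupported WKT literal: {wkt!r}")
    return {
        "type": "Point",
        "coordinates": [float(match.group(1)), float(match.group(2))],
    }


def jsonld_node_to_feature(node):
    """Build a GeoJSON Feature; non-geometry keys (URIs) go to properties."""
    wkt = node[HAS_GEOMETRY][AS_WKT]["@value"]
    properties = {
        key: value
        for key, value in node.items()
        if key not in ("@id", HAS_GEOMETRY)
    }
    return {
        "type": "Feature",
        "id": node["@id"],
        "geometry": wkt_point_to_geojson(wkt),
        "properties": properties,
    }


# Hypothetical node, shaped the way an endpoint might return it.
node = {
    "@id": "http://example.org/site/1",
    "@type": "http://example.org/ontology#Site",
    HAS_GEOMETRY: {
        AS_WKT: {"@type": GEO + "wktLiteral", "@value": "POINT(5.66 51.98)"}
    },
    "http://www.w3.org/2000/01/rdf-schema#label": "Site 1",
}
feature = jsonld_node_to_feature(node)
```

A real provider would need a full WKT parser and would have to cope with the several shapes (expanded, compacted, `@graph`) in which JSON-LD can arrive.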

Issues to note:

  1. Modified the name of the GeoJSON provider to geojson_prov to avoid import conflicts. There might be a more elegant way around this.

  2. The Docker setup is not fully tested. That can only happen when a new image with this provider is available from DockerHub. But the Virtuoso bit is tested.

  3. So far only tested with Virtuoso. Will later test with Fuseki.

ldesousa avatar Jan 27 '21 19:01 ldesousa

Thanks @ldesousa this is a great contribution. RE: "more to come", should we mark this PR as WIP, or is this ready to review/merge?

tomkralidis avatar Jan 28 '21 03:01 tomkralidis

It is work in progress as I still want to introduce more features. But what is in this pull request is functional. I would suggest a merge into development so others can test it too.

ldesousa avatar Jan 28 '21 07:01 ldesousa

Travis is trying to run the tests, but there is no Virtuoso instance set up in the environment. Should something be done about the tests? Or can the PR be merged like this?

ldesousa avatar Jan 28 '21 16:01 ldesousa

In general, if there are tests, then they should have the appropriate setup.

tomkralidis avatar Jan 28 '21 16:01 tomkralidis

This is closely related to #469 and #173. Ideally, you could specify a URI field populated with external/global persistent identifiers that resolve to the pygeoapi /items/{feature id} page and populate the '@id' of the JSON-LD version, which is populated by the results of querying the GeoSPARQL endpoint as demonstrated here. @dblodgett-usgs this is of interest, yes?

ksonda avatar Jan 29 '21 18:01 ksonda

Hi @ldesousa, I'm curious about your use case here. From what I understand, SPARQL is very powerful in its ability to compose queries on complex data, but it performs quite poorly. For that reason, app developers who build on triple stores usually query a structured subset of a graph and store it in a fast-responding cache, such as a database or search index, before exposing it via an API. Maybe this caching aspect can be included in this implementation, but I think I would rather see that in a separate component.

I thought GeoJSON-LD was not possible due to conflicts between the two specifications. Has this aspect been improved in recent versions of JSON-LD? But I guess the provider could provide both JSON-LD and GeoJSON responses.

pvgenuchten avatar Jan 29 '21 19:01 pvgenuchten

@pvgenuchten The GeoSPARQL provider is independent of the back-end. Whether it is a true triple store or something else, as long as it can reply to GeoSPARQL queries it will work. Currently I am working with Virtuoso, which internally stores triples in a relational database. So far I have had no performance-related issues.

GeoJSON, JSON-LD and GeoJSON-LD are all independent specifications that do not square easily with one another. Moreover, GeoJSON-LD has no institutional backing and can't be regarded exactly as a standard. JSON-LD is by itself enough to convey geo-spatial information using the GeoSPARQL ontology. The bit of innovation needed in this field is the extension of GeoSPARQL to raster data.

ldesousa avatar Feb 01 '21 07:02 ldesousa

I just had an in-person meeting with @ldesousa to discuss the GeoSPARQL backend. My doubts about whether SPARQL will provide enough performance for this use case have not gone away. (An alternative would be to insert the SPARQL response daily into a PostGIS table (or elastic) as id, time, spatial, json-ld-blob, and use that as a source for pygeoapi.) @ldesousa raised that, for them, creating proper JSON-LD from a Postgres backend was quite a challenge (SPARQL provides that efficiently), and also that the triple store may contain objects in multiple evolutions of the (soil) standard. We also considered the case that at some point users may use pygeoapi to post new or updated items to the collection using OGC API - Features; these updates would then be included in the triple store.

Enough potential to make this an interesting direction of development.
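The caching alternative mentioned above (id, time, spatial, json-ld-blob) could be sketched roughly as follows. This is only an illustration: SQLite from the Python standard library stands in for PostGIS, the geometry is stored as plain WKT text, and the table name, column names, and `refresh_cache` helper are all hypothetical. Fetching the records from the SPARQL endpoint is left out.

```python
import json
import sqlite3
from datetime import datetime, timezone


def create_cache(conn):
    """Create the cache table: id, fetch time, geometry (WKT), JSON-LD blob."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feature_cache ("
        "id TEXT PRIMARY KEY, fetched_at TEXT NOT NULL, "
        "wkt TEXT NOT NULL, jsonld TEXT NOT NULL)"
    )


def refresh_cache(conn, records, now=None):
    """Insert or overwrite cached rows; meant to be run, say, once a day."""
    fetched_at = (now or datetime.now(timezone.utc)).isoformat()
    rows = [
        (rec["id"], fetched_at, rec["wkt"], json.dumps(rec["jsonld"]))
        for rec in records
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO feature_cache VALUES (?, ?, ?, ?)", rows
        )
    return len(rows)


# Hypothetical record, as it might be assembled from a SPARQL response.
records = [
    {
        "id": "http://example.org/site/1",
        "wkt": "POINT(5.66 51.98)",
        "jsonld": {"@id": "http://example.org/site/1"},
    }
]

conn = sqlite3.connect(":memory:")
create_cache(conn)
refresh_cache(conn, records)
refresh_cache(conn, records)  # a second run replaces rows, it does not duplicate
row_count = conn.execute("SELECT COUNT(*) FROM feature_cache").fetchone()[0]
```

Keeping the JSON-LD as an opaque blob sidesteps the difficulty raised above of generating proper JSON-LD from a relational schema, at the cost of the cache being up to one refresh interval out of date.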

pvgenuchten avatar Feb 04 '21 14:02 pvgenuchten

Hi folks, any news on this PR?

ldesousa avatar Mar 11 '21 18:03 ldesousa

@ldesousa can you rebase, after which I'm guessing this is ready for a review?

tomkralidis avatar Mar 12 '21 01:03 tomkralidis

> JSON-LD is by itself enough to convey geo-spatial information using the GeoSPARQL ontology.

This suggests that the item-level JSON-LD response for feature collections might have an option for GeoJSON-LD (default) and some kind of GeoSPARQL representation. Since #676 the optional JSON-LD using the schema:geo vocabulary for point-type geometry is implemented, but now I'm wondering if a GeoSPARQL representation should be another option or just preferable overall, and support non-point geometry as well.

ksonda avatar May 07 '21 15:05 ksonda

> This suggests that the item-level JSON-LD response for feature collections might have an option for GeoJSON-LD (default) and some kind of GeoSPARQL representation. Since #676 the optional JSON-LD using the schema:geo vocabulary for point-type geometry is implemented, but now I'm wondering if a GeoSPARQL representation should be another option or just preferable overall, and support non-point geometry as well.

I would strongly advise against pursuing the GeoJSON-LD route. It is not a standard and is unlikely to become one soon.

GeoSPARQL is an ontology, not an encoding scheme. A resource making use of GeoSPARQL can be encoded as XML/RDF, Turtle, JSON-LD, etc. GeoSPARQL is not meant as a "representation".

ldesousa avatar May 10 '21 08:05 ldesousa

Thanks @ldesousa

GeoJSON-LD has been the default behavior for a long time, so #676 left it in for now, but we agree that standard JSON-LD should be used to communicate geometry using existing ontologies. What I mean by alternative "representation" is that geometry in JSON-LD generated by pygeoapi could use the GeoSPARQL ontology or the schema.org geo types.

ksonda avatar May 10 '21 11:05 ksonda

GeoJSON-LD was not the default behaviour when I submitted this pull request. Neither was it JSON-LD. At the time all providers were returning GeoJSON objects. Unless we are speaking of different things.

ldesousa avatar May 10 '21 13:05 ldesousa

Appending ?f=jsonld to /collections/{collection}/items/{item id}, or requesting application/ld+json, returns a JSON-LD document, and it has been in GeoJSON-LD format since at least May 2020. We added a configuration option in #676 for this to instead return a JSON-LD document with schema.org type geometry.

But I'm asking whether we should add an option to return that JSON-LD document with geosparql geometry.
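To make the two options concrete, here is a small sketch of how the same point could be written with schema.org geo types versus the GeoSPARQL ontology. It does not reproduce pygeoapi's actual output or configuration; the `jsonld_item` function and its `vocabulary` parameter are hypothetical names used only for illustration.

```python
CONTEXT = {
    "schema": "https://schema.org/",
    "geo": "http://www.opengis.net/ont/geosparql#",
}


def schema_org_geometry(lon, lat):
    """Point geometry using schema.org geo types."""
    return {
        "@type": "schema:GeoCoordinates",
        "schema:longitude": lon,
        "schema:latitude": lat,
    }


def geosparql_geometry(lon, lat):
    """Point geometry using the GeoSPARQL ontology and a WKT literal."""
    return {
        "@type": "geo:Geometry",
        "geo:asWKT": {
            "@type": "geo:wktLiteral",
            "@value": f"POINT ({lon} {lat})",
        },
    }


def jsonld_item(item_uri, lon, lat, vocabulary="schema"):
    """Build a JSON-LD item; `vocabulary` is a hypothetical switch."""
    doc = {"@context": CONTEXT, "@id": item_uri}
    if vocabulary == "geosparql":
        doc["geo:hasGeometry"] = geosparql_geometry(lon, lat)
    else:
        doc["schema:geo"] = schema_org_geometry(lon, lat)
    return doc


schema_doc = jsonld_item("http://example.org/items/1", 5.66, 51.98)
geosparql_doc = jsonld_item(
    "http://example.org/items/1", 5.66, 51.98, vocabulary="geosparql"
)
```

Because the GeoSPARQL variant carries a WKT literal, it extends to lines and polygons without changing the document structure.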

ksonda avatar May 10 '21 13:05 ksonda

If GeoSPARQL is ever accepted as a data source, that would indeed make sense.

ldesousa avatar May 11 '21 06:05 ldesousa

I'm interested in keeping this PR in focus. Especially in academia there is a lot of triple data that is hard for spatial clients to consume. If pygeoapi can provide a proxy into the triple stores, this is of high interest. I guess we should have a look at the current conflicts.

pvgenuchten avatar Jun 24 '22 21:06 pvgenuchten

@webb-ben we should contribute as bandwidth allows. I'm wondering if the configuration should allow more complex SPARQL queries, with more than one "hop", to enrich the geometry with relevant properties as desired.
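One way to picture such a multi-hop query is with SPARQL 1.1 property paths. The sketch below only builds the query string; the `build_multi_hop_query` helper, the shape of its `paths` argument, and the `example.org` predicates are assumptions, not an existing pygeoapi configuration format.

```python
GEO = "http://www.opengis.net/ont/geosparql#"


def build_multi_hop_query(feature_uri, paths):
    """Build a SELECT query that follows predicate chains from a feature.

    `paths` maps an output variable name to a list of predicate IRIs,
    which are joined into a SPARQL 1.1 property path (p1/p2/...).
    Inputs are trusted here; real code must validate names and IRIs.
    """
    select_vars = ["?wkt"]
    patterns = [f"<{feature_uri}> geo:hasGeometry/geo:asWKT ?wkt ."]
    for name, predicates in paths.items():
        path = "/".join(f"<{predicate}>" for predicate in predicates)
        select_vars.append(f"?{name}")
        # OPTIONAL keeps the feature even when a hop has no match.
        patterns.append(f"OPTIONAL {{ <{feature_uri}> {path} ?{name} . }}")
    projection = " ".join(select_vars)
    body = "\n  ".join(patterns)
    return (
        f"PREFIX geo: <{GEO}>\n"
        f"SELECT {projection}\n"
        f"WHERE {{\n  {body}\n}}"
    )


# Two hops: feature -> parent unit -> label of that unit (hypothetical IRIs).
query = build_multi_hop_query(
    "http://example.org/site/1",
    {
        "parentLabel": [
            "http://example.org/ont#partOf",
            "http://www.w3.org/2000/01/rdf-schema#label",
        ]
    },
)
```

Each entry in `paths` would become one extra property on the returned feature, so the number of hops stays a configuration concern rather than a code change.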

ksonda avatar Jun 24 '22 23:06 ksonda

After 18 months it is better to close this pull request. The functionality I proposed here is now implemented in Prez, and in a more sophisticated way. If you ever revisit the possibility of adding a GeoSPARQL triple store as a feature provider, I would advise you to follow a similar approach, making use of the ogcldapi OWL profile.

ldesousa avatar Aug 28 '22 15:08 ldesousa