pygeoapi
pygeoapi copied to clipboard
PostgreSQL provider bbox, bbox-crs, geom-wkt, geom-crs and APIRequest helpers
Overview
This pull request is presented as a work in progress. Please read these as some insights on how we, at National Resources Canada, have integrated pygeoapi to meet our business needs and requirements. This pull request is included in (or leads to) the pull request PR #1024.
Summary
- Adds query capabilities for the PostgreSQL provider:
-> Be able to query using a
geometry(instead of a bounding box) as the spatial filter -> Be able to query using differentcrsthan the one in which the data is stored in -> Note: we are aware (and interested) in the PR #964. We think this current PR is complementary as it provides another entry point (simple and quick way) to spatially-query the records.
Details
-
Adds 3 new helper functions in the
APIRequestclass to offer standardization when reading input parameters. This standardization: -> Helps various endpoints that accept those common parameters (e.g.bbox,bbox-crs,geomandgeom-crs) by having reusable code between the endpoints. -> Helps when reading the parameters sent via either GET or POST methods. -> Note: eventually, more input parameters processing (reading/validating) could be standardized and handled inAPIRequest. -
Adds support for a provider-level config called
crs, in the Yaml config, which indicates the crs of the data itself, as stored in the database. -
Adds the option to accept a
bbox-crsin the{collection_id}/itemsend point(s). This option is similar to thebbox-crsaddition for the rasterio provider, see PR #1012, except this time it's for the postgresql provider. Admittedly, this opens it up for all providers even those not possibly supporting it at the moment. If it is too soon to open the option in the Yaml file, simply exclude that part in theopenapi.pyfile when merging the code. Only an experienced dev will know how to use it. -
Adds the option to accept a geom (geometry WKT) and a geom-crs (crs of the geometry WKT) in the
{collection_id}/itemsend point(s). Admittedly, this opens it up for all providers even those not possibly supporting it at the moment. If it is too soon to open the option in the Yaml file, simply exclude that part in theopenapi.pyfile when merging the code. Only an experienced dev will know how to use it.
Related Comments / Questions / Suggestions
- How is the POST request for the
{collection_id}/itemsendpoint supposed to be treated with regards to thebboxparameters? I assumed that when called via a POST request, every parameters would be read from POST body, but from reading the code, it seems the parameters are read from GET? Maybe I am missing something or it's a track that's work in progress. In any case, for the new possibility to send a geometry WKT to the{collection_id}/itemsendpoint, I'm reading the POST body to read the geometry, because a WKT geometry can be rather long and explode the length for a url GET request.
Contributions and Licensing
(as per https://github.com/geopython/pygeoapi/blob/master/CONTRIBUTING.md#contributions-and-licensing)
- [x] I'd like to contribute [feature X|bugfix Y|docs|something else] to pygeoapi. I confirm that my contributions to pygeoapi will be compatible with the pygeoapi license guidelines at the time of contribution.
- [x] I have already previously agreed to the pygeoapi Contributions and Licensing Guidelines
@Alex-NRCan the function to query by Geometry in addition to bounding box would make this OGCAPI-EDR and should use the AREA query type.
EDR can be used for non coverage type data, to be EDR just need at least 1 of the query types - https://docs.ogc.org/is/19-086r5/19-086r5.html#toc28 also see requirement a.47
Our goal with https://github.com/geopython/pygeoapi/pull/964 was to enable PostGres provider for EDR queries against simple vector features. CQL was a stepping stone, next step was to write CQL templates for the various EDR query types which could be enabled via config to setup EDR endpoint in addition to Features, unfortunately we're out of budget for this next step.
@KoalaGeo Very nice! I didn't know about EDR queries. Thanks for commenting, and pointing out to the OGC specs, very interesting.
I took a few minutes to read about the /area endpoint that you mention and found this link https://developer.ogc.org/api/edr/index.html#tag/Collection-data-queries/operation/GetDataForArea which I believe is what you're referring to(?) and indeed, that /area endpoint accepts a polygon as a WKT format (in fact all EDR endpoints seem to work with a WKT). I've noticed that it also uses a crs parameter for the coordinate system (which is just about the geom_crs equivalent in this pull request).
It wouldn't take much to adjust this PR to support the /area endpoint, replace geom with coords and replace geom_crs with crs and basically redirect to the code within this PR to make it support the /area, the EDR way.
Furthermore, two comments come to mind when I read about the EDR:
-
I wonder if there'd be a way to also query attributively at the same time as query spatially (e.g.:
/area?coords={polygon_wkt}&cql={attribute_query}) or something like that? Edit: Maybe that is what you were working on? -
I notice that the endpoints in the spec use GET. In this PR, I support GET and also support POST, because a WKT can get rather long for a GET request. I do wonder if they plan to support POST in the specs?
No worries, it's a great bit of work you've done.
Your questions are best answered by EDR gurus @chris-little and @m-burgoyne!
But for the first one, we weren't thinking of offering a combined EDR query type + CQL although I can see the advantages. Thought features+cql for power users + when you might want to offer really complex queries, and the EDR for a slimmed down option - cql queries being a higher technical barrier. You wouldn't need the separate area query as that could be bundled in the cql.
Depending on the data if it requires EDR + CQL queries then maybe using the sensorthingapi standard would be a better delivery route - ODATA protocol allows some very efficient complex queries which aren't possible with EDR / CQL
@Alex-NRCan I notice that the endpoints in the spec use GET, Version 1.0.1 of OGC API-EDR is GET only, but V1.1 is about to be released for public comment, and it supports getting via POST, for exactly the use case you outlined - long queries. V1.1 is now the master branch. HTH
@Alex-NRCan we've got our pygeoapi image deployed with our CQL functionality included.
All earthquakes in a polygon with a depth < 2 km
We're using the CQL intersects rather than the EDR area endpoint method.
Very good! I'll refresh by code with it. I see that it's using filter parameter.
When I get a chance, I'll check to implement the EDR mechanisms inside the postgresql provider, especially (1) if that's something that could be interesting to the pygeoapi project at large, and (2) if I can make it work (compatible) with the most recent changes and additions without breaking other implementations (I believe it should be).
That sounds great!
You might want to look at / fork from https://github.com/BritishGeologicalSurvey/pygeoapi/tree/pygeofilter-sqlalchemy it's a major refactor of the PostGres.py file to utilise sqlalchemy throughout.
EDR could be a set of API templates which actually use the CQL queries.
@Alex-NRCan given #964 is now merged, you should be able to rebase and update accordingly.
This pull request has been edited to be compatible with PR #964
The resulting action items in this PR are:
In OpenAPI specs
- New
bbox-crsparameter in OpenAPI spec for the{collection_id}/itemsendpoints (for bothGETandPOSTmethods). This parameter aligns with the recent addition of the samebbox-crsparameter for the/coverageend-point.
APIRequest class
- New methods in APIRequest to manage the bbox and bbox-crs parameters. This can be discussed further. The reason that I've kept those (simplified versions) is that I wanted to reduce duplication of code-logic in 3 different end-points (
get_collection_items,post_collection_itemsandget_collection_coverage). I also think separating the parameters management and validation from the logic handled in the request methods themselves can become beneficial later. If this is really unwanted, I'll re-work it.
PostgreSQL provider
-
The main feature - Updated the
_get_bbox_filterfunction to create an envelope using an optionally provided bbox_crs and re-project the bbox to the SRID of the underlying table when sending the query to retrieve the features. That way, one can send a bbox in another srid than the underlying table. -
PostgreSQL provider: Support a
bbox-crsparameter as part of thequeryfunction, default value beingNone -
PostgreSQL provider: Store the schema value as a class attribute, instead of reading the value in the
_reflect_table_modelfunction from theself.db_search_path[0]attribute. Now,self.schemais defined as part of the initialization, right after initializingself.db_search_pathwhich I thought would make sense. This way,self.schemacan be used after in various functions including the newget_sridfunction. -
PostgreSQL provider: New
get_sridfunction which reads the SRID of the underlying table using geoalchemy2. -
PostgreSQL provider: Store the srid of the underlying table in the class attribute
self.sridsimilarily to how the fields are stored inself.fields
EDIT: Not sure why it's failing a GIT automated test. Is something wrong?
@Alex-NRCan bit cheeky but as your working on the PostGres provider might you be able to look at https://github.com/geopython/pygeoapi/issues/1007 ?
@Alex-NRCan bit cheeky but as your working on the PostGres provider might you be able to look at #1007 ?
Thanks for looping me in on this issue. I'll check if I can find the time to work on it. It's an interesting one and I can see the benefits of it :)
@Alex-NRCan @tomkralidis I started working on #1128. That issue is about fully implementing the OGC Standard OGC API Features Part 2, to be called "Part 2" below. I see a strong overlap with this PR. I am in the phase of both understanding the Standard and what would be required to extend pygeoapi. So also seeking clarifications.
I studied the 3 files changed in this PR. I have some general and specific questions:
- is it the intention of this PR to fully support the "Part 2" , or partly and only for the PostgreSQL Provider and
bbox-crs, notcrs? - this PR handles POST parameters. Is POST supported, by the OAPIF standard and pygeoapi? I can only find GET support in the OAPIF OGC Standard. How would a POST body look like? More practically: the handling of a POST Body may be done once and generically in pygeoapi, and provided as a params
dict. It looks like the POST body is decoded for each 'get_param' likebboxandbbox-crs. - possibly all Providers will need to handle
bbox-crs, or at least have some way to indicate that it is not supported. I was thinking of adding a base Provider method that could interrogate Providers for native support for bothbbox-crsandcrs. If the Provider does not support these facilities, a generic implementation in the "API handling" could take care of Transforms. But if a Provider has a Storage CRS that matches these, no transforms are required IMHO. - adding Requirements Clauses and supported CRSs in the metadata. The Part 2 standard provides support for
bbox-crs,crsas parameter plus providing 'Storage CRS' as metadata. I am a bit confused (with Part 2) that appearantly the list of CRS applies to bothbbox-crsandcrsparams. - my aim is to have sensible defaults, so existing pygeoapi deployments are not affected. The means would be via configuration.
- we will need to extend the pygeoapi unit tests
Like said I am in an investigation phase, focus on Part 2, gaining understanding, noticing work on Coverage and EDR. We may converse further to clarify and possibly collaborate, building on each other's work. My deadline, is March 16 2023 to complete this work.
Thank you for your interest @justb4 !
Off the bat, I'm not familiar at all with the OGC Standard OGC API Features Part 2 :) That being said, I'll try to answer, the best I can, to some of the points you've made by focusing on what this PR is specifically about:
-
This PR focuses on providing a
bbox-crsparameter only for thePostgresprovider indeed. Please note that a previous PR #1012 that I'd created (and was accepted) focused on adding abbox-crsfor therasterioprovider. Also, I have to make it clear that I didn't come up with thebbox-crs. That parameter was already documented, as per thepygeoapiOpenAPI Swagger specs. I just couldn't make it work for our needs (inrasterioandpostgresqlproviders) - so I've coded their implementations. I didn't work on adding the parameter to all providers and I don't know which other, if any, providers support it already. -
This PR has code to retrieve the
CRSof the data on the fly (seeget_srid()method); so there's no need to have theCRSin the configuration file, for thePostgresprovider. I personally preferred it to be dynamic instead of a configuration. -
I did write some code in this PR to handle POST requests, because eventually, notably with
CQL, we're going to need to send more data to query information than what's possible to send via a GET request. GET requests are limited, because they have to go through a URL and that URL has a fixed limit in length. Chris Little points this out a few comments above on this PR: https://github.com/geopython/pygeoapi/pull/1023#issuecomment-1285305533. I think they're working on adding POST in the specs, but don't quote me on that though, as I'm not an OGC expert. That said, right now, just for thebbox-crs, the POST requirements aren't necessary as it doesn't add much to the payload. However, like you pointed out, the code also serves (viaAPIRequestclass (read_param(),read_bbox())) to centralize some functions in order to reduce the duplicated logic in methods likeget_collection_itemsandpost_collection_items. Originally, this PR had more of those (e.g.: it was also handling geometry-wkt and geometry-crs), but it got simplified so that another PR would implement CQL in the Postgres provider.
More information about our context at National Resources Canada: we are developing a tool for geospatial data extraction that uses pygeoapi to provide OGC API (Coverages and Features) services from our data. As we are limited in resources, our contribution effort to the pygeoapi development, at the moment, is limited to functionalities that are addressing our organisation needs but that can also benefits other pygeoapi users needs.
@Alex-NRCan thanks for your clarifications! Your three points:
- yes, I've seen the bbox-crs handling in the code for coverages. Was looking into some reuse. Though Coverages (and thus the rasterio provider) is another OGC standard than Features for which PostgreSQL provider applies.
- ok, for your use case. But the Part 2 standard requires supported CRSs in the metadata
- POST request: I agree, GET is limited in URL-length, even per-browser. But you are pointing to EDR, which is a different standard than Features, well IMHO. I just wonder how a POST Body will look like. A flat (Geo)JSON structure? Or is als "form-encoding" allowed?
Thinking out loud: best is, if that fits your timing, that I will implement #1128 first. Think this will cover much of this PR's functionality, at least at the generic API (api.py) level. POST support and PostgreSQL Provider code is then to be added. But then again, I will need a next level of understanding both the Standard and existing pygeoapi code...
As per RFC4, this Pull Request has been inactive for 90 days. In order to manage maintenance burden, it will be automatically closed in 7 days.
As per RFC4, this Pull Request has been closed due to there being no activity for more than 90 days.