pygeoapi icon indicating copy to clipboard operation
pygeoapi copied to clipboard

PostgreSQL provider bbox, bbox-crs, geom-wkt, geom-crs and APIRequest helpers

Open Alex-NRCan opened this issue 3 years ago • 16 comments
trafficstars

Overview

This pull request is presented as a work in progress. Please read these as some insights on how we, at National Resources Canada, have integrated pygeoapi to meet our business needs and requirements. This pull request is included in (or leads to) the pull request PR #1024.

Summary

  • Adds query capabilities for the PostgreSQL provider: -> Be able to query using a geometry (instead of a bounding box) as the spatial filter -> Be able to query using different crs than the one in which the data is stored in -> Note: we are aware (and interested) in the PR #964. We think this current PR is complementary as it provides another entry point (simple and quick way) to spatially-query the records.

Details

  • Adds 3 new helper functions in the APIRequest class to offer standardization when reading input parameters. This standardization: -> Helps various endpoints that accept those common parameters (e.g. bbox, bbox-crs, geom and geom-crs) by having reusable code between the endpoints. -> Helps when reading the parameters sent via either GET or POST methods. -> Note: eventually, more input parameters processing (reading/validating) could be standardized and handled in APIRequest.

  • Adds support for a provider-level config called crs, in the Yaml config, which indicates the crs of the data itself, as stored in the database.

  • Adds the option to accept a bbox-crs in the {collection_id}/items end point(s). This option is similar to the bbox-crs addition for the rasterio provider, see PR #1012, except this time it's for the postgresql provider. Admittedly, this opens it up for all providers even those not possibly supporting it at the moment. If it is too soon to open the option in the Yaml file, simply exclude that part in the openapi.py file when merging the code. Only an experienced dev will know how to use it.

  • Adds the option to accept a geom (geometry WKT) and a geom-crs (crs of the geometry WKT) in the {collection_id}/items end point(s). Admittedly, this opens it up for all providers even those not possibly supporting it at the moment. If it is too soon to open the option in the Yaml file, simply exclude that part in the openapi.py file when merging the code. Only an experienced dev will know how to use it.

Related Comments / Questions / Suggestions

  • How is the POST request for the {collection_id}/items endpoint supposed to be treated with regards to the bbox parameters? I assumed that when called via a POST request, every parameters would be read from POST body, but from reading the code, it seems the parameters are read from GET? Maybe I am missing something or it's a track that's work in progress. In any case, for the new possibility to send a geometry WKT to the {collection_id}/items endpoint, I'm reading the POST body to read the geometry, because a WKT geometry can be rather long and explode the length for a url GET request.

Contributions and Licensing

(as per https://github.com/geopython/pygeoapi/blob/master/CONTRIBUTING.md#contributions-and-licensing)

  • [x] I'd like to contribute [feature X|bugfix Y|docs|something else] to pygeoapi. I confirm that my contributions to pygeoapi will be compatible with the pygeoapi license guidelines at the time of contribution.
  • [x] I have already previously agreed to the pygeoapi Contributions and Licensing Guidelines

Alex-NRCan avatar Oct 19 '22 17:10 Alex-NRCan

@Alex-NRCan the function to query by Geometry in addition to bounding box would make this OGCAPI-EDR and should use the AREA query type.

EDR can be used for non coverage type data, to be EDR just need at least 1 of the query types - https://docs.ogc.org/is/19-086r5/19-086r5.html#toc28 also see requirement a.47

KoalaGeo avatar Oct 19 '22 21:10 KoalaGeo

Our goal with https://github.com/geopython/pygeoapi/pull/964 was to enable PostGres provider for EDR queries against simple vector features. CQL was a stepping stone, next step was to write CQL templates for the various EDR query types which could be enabled via config to setup EDR endpoint in addition to Features, unfortunately we're out of budget for this next step.

KoalaGeo avatar Oct 19 '22 21:10 KoalaGeo

@KoalaGeo Very nice! I didn't know about EDR queries. Thanks for commenting, and pointing out to the OGC specs, very interesting. I took a few minutes to read about the /area endpoint that you mention and found this link https://developer.ogc.org/api/edr/index.html#tag/Collection-data-queries/operation/GetDataForArea which I believe is what you're referring to(?) and indeed, that /area endpoint accepts a polygon as a WKT format (in fact all EDR endpoints seem to work with a WKT). I've noticed that it also uses a crs parameter for the coordinate system (which is just about the geom_crs equivalent in this pull request).

It wouldn't take much to adjust this PR to support the /area endpoint, replace geom with coords and replace geom_crs with crs and basically redirect to the code within this PR to make it support the /area, the EDR way.

Furthermore, two comments come to mind when I read about the EDR:

  • I wonder if there'd be a way to also query attributively at the same time as query spatially (e.g.: /area?coords={polygon_wkt}&cql={attribute_query}) or something like that? Edit: Maybe that is what you were working on?

  • I notice that the endpoints in the spec use GET. In this PR, I support GET and also support POST, because a WKT can get rather long for a GET request. I do wonder if they plan to support POST in the specs?

Alex-NRCan avatar Oct 19 '22 22:10 Alex-NRCan

No worries, it's a great bit of work you've done.

Your questions are best answered by EDR gurus @chris-little and @m-burgoyne!

But for the first one, we weren't thinking of offering a combined EDR query type + CQL although I can see the advantages. Thought features+cql for power users + when you might want to offer really complex queries, and the EDR for a slimmed down option - cql queries being a higher technical barrier. You wouldn't need the separate area query as that could be bundled in the cql.

Depending on the data if it requires EDR + CQL queries then maybe using the sensorthingapi standard would be a better delivery route - ODATA protocol allows some very efficient complex queries which aren't possible with EDR / CQL

KoalaGeo avatar Oct 20 '22 00:10 KoalaGeo

@Alex-NRCan I notice that the endpoints in the spec use GET, Version 1.0.1 of OGC API-EDR is GET only, but V1.1 is about to be released for public comment, and it supports getting via POST, for exactly the use case you outlined - long queries. V1.1 is now the master branch. HTH

chris-little avatar Oct 20 '22 10:10 chris-little

@Alex-NRCan we've got our pygeoapi image deployed with our CQL functionality included.

All earthquakes in a polygon with a depth < 2 km

We're using the CQL intersects rather than the EDR area endpoint method.

KoalaGeo avatar Oct 25 '22 22:10 KoalaGeo

Very good! I'll refresh by code with it. I see that it's using filter parameter.

When I get a chance, I'll check to implement the EDR mechanisms inside the postgresql provider, especially (1) if that's something that could be interesting to the pygeoapi project at large, and (2) if I can make it work (compatible) with the most recent changes and additions without breaking other implementations (I believe it should be).

Alex-NRCan avatar Oct 25 '22 22:10 Alex-NRCan

That sounds great!

You might want to look at / fork from https://github.com/BritishGeologicalSurvey/pygeoapi/tree/pygeofilter-sqlalchemy it's a major refactor of the PostGres.py file to utilise sqlalchemy throughout.

EDR could be a set of API templates which actually use the CQL queries.

KoalaGeo avatar Oct 25 '22 23:10 KoalaGeo

@Alex-NRCan given #964 is now merged, you should be able to rebase and update accordingly.

tomkralidis avatar Oct 29 '22 06:10 tomkralidis

This pull request has been edited to be compatible with PR #964

The resulting action items in this PR are:

In OpenAPI specs

  • New bbox-crs parameter in OpenAPI spec for the {collection_id}/items endpoints (for both GET and POST methods). This parameter aligns with the recent addition of the same bbox-crs parameter for the /coverage end-point.

APIRequest class

  • New methods in APIRequest to manage the bbox and bbox-crs parameters. This can be discussed further. The reason that I've kept those (simplified versions) is that I wanted to reduce duplication of code-logic in 3 different end-points (get_collection_items, post_collection_items and get_collection_coverage). I also think separating the parameters management and validation from the logic handled in the request methods themselves can become beneficial later. If this is really unwanted, I'll re-work it.

PostgreSQL provider

  • The main feature - Updated the _get_bbox_filter function to create an envelope using an optionally provided bbox_crs and re-project the bbox to the SRID of the underlying table when sending the query to retrieve the features. That way, one can send a bbox in another srid than the underlying table.

  • PostgreSQL provider: Support a bbox-crs parameter as part of the query function, default value being None

  • PostgreSQL provider: Store the schema value as a class attribute, instead of reading the value in the _reflect_table_model function from the self.db_search_path[0] attribute. Now, self.schema is defined as part of the initialization, right after initializing self.db_search_path which I thought would make sense. This way, self.schema can be used after in various functions including the new get_srid function.

  • PostgreSQL provider: New get_srid function which reads the SRID of the underlying table using geoalchemy2.

  • PostgreSQL provider: Store the srid of the underlying table in the class attribute self.srid similarily to how the fields are stored in self.fields

EDIT: Not sure why it's failing a GIT automated test. Is something wrong?

Alex-NRCan avatar Nov 09 '22 20:11 Alex-NRCan

@Alex-NRCan bit cheeky but as your working on the PostGres provider might you be able to look at https://github.com/geopython/pygeoapi/issues/1007 ?

KoalaGeo avatar Nov 15 '22 15:11 KoalaGeo

@Alex-NRCan bit cheeky but as your working on the PostGres provider might you be able to look at #1007 ?

Thanks for looping me in on this issue. I'll check if I can find the time to work on it. It's an interesting one and I can see the benefits of it :)

Alex-NRCan avatar Dec 07 '22 17:12 Alex-NRCan

@Alex-NRCan @tomkralidis I started working on #1128. That issue is about fully implementing the OGC Standard OGC API Features Part 2, to be called "Part 2" below. I see a strong overlap with this PR. I am in the phase of both understanding the Standard and what would be required to extend pygeoapi. So also seeking clarifications.

I studied the 3 files changed in this PR. I have some general and specific questions:

  • is it the intention of this PR to fully support the "Part 2" , or partly and only for the PostgreSQL Provider and bbox-crs, not crs?
  • this PR handles POST parameters. Is POST supported, by the OAPIF standard and pygeoapi? I can only find GET support in the OAPIF OGC Standard. How would a POST body look like? More practically: the handling of a POST Body may be done once and generically in pygeoapi, and provided as a params dict. It looks like the POST body is decoded for each 'get_param' like bbox and bbox-crs.
  • possibly all Providers will need to handle bbox-crs, or at least have some way to indicate that it is not supported. I was thinking of adding a base Provider method that could interrogate Providers for native support for both bbox-crs and crs. If the Provider does not support these facilities, a generic implementation in the "API handling" could take care of Transforms. But if a Provider has a Storage CRS that matches these, no transforms are required IMHO.
  • adding Requirements Clauses and supported CRSs in the metadata. The Part 2 standard provides support for bbox-crs, crs as parameter plus providing 'Storage CRS' as metadata. I am a bit confused (with Part 2) that appearantly the list of CRS applies to both bbox-crs and crs params.
  • my aim is to have sensible defaults, so existing pygeoapi deployments are not affected. The means would be via configuration.
  • we will need to extend the pygeoapi unit tests

Like said I am in an investigation phase, focus on Part 2, gaining understanding, noticing work on Coverage and EDR. We may converse further to clarify and possibly collaborate, building on each other's work. My deadline, is March 16 2023 to complete this work.

justb4 avatar Feb 07 '23 16:02 justb4

Thank you for your interest @justb4 !

Off the bat, I'm not familiar at all with the OGC Standard OGC API Features Part 2 :) That being said, I'll try to answer, the best I can, to some of the points you've made by focusing on what this PR is specifically about:

  • This PR focuses on providing a bbox-crs parameter only for the Postgres provider indeed. Please note that a previous PR #1012 that I'd created (and was accepted) focused on adding a bbox-crs for the rasterio provider. Also, I have to make it clear that I didn't come up with the bbox-crs. That parameter was already documented, as per the pygeoapi OpenAPI Swagger specs. I just couldn't make it work for our needs (in rasterio and postgresql providers) - so I've coded their implementations. I didn't work on adding the parameter to all providers and I don't know which other, if any, providers support it already.

  • This PR has code to retrieve the CRS of the data on the fly (see get_srid() method); so there's no need to have the CRS in the configuration file, for the Postgres provider. I personally preferred it to be dynamic instead of a configuration.

  • I did write some code in this PR to handle POST requests, because eventually, notably with CQL, we're going to need to send more data to query information than what's possible to send via a GET request. GET requests are limited, because they have to go through a URL and that URL has a fixed limit in length. Chris Little points this out a few comments above on this PR: https://github.com/geopython/pygeoapi/pull/1023#issuecomment-1285305533. I think they're working on adding POST in the specs, but don't quote me on that though, as I'm not an OGC expert. That said, right now, just for the bbox-crs, the POST requirements aren't necessary as it doesn't add much to the payload. However, like you pointed out, the code also serves (via APIRequest class (read_param(), read_bbox())) to centralize some functions in order to reduce the duplicated logic in methods like get_collection_items and post_collection_items. Originally, this PR had more of those (e.g.: it was also handling geometry-wkt and geometry-crs), but it got simplified so that another PR would implement CQL in the Postgres provider.

More information about our context at National Resources Canada: we are developing a tool for geospatial data extraction that uses pygeoapi to provide OGC API (Coverages and Features) services from our data. As we are limited in resources, our contribution effort to the pygeoapi development, at the moment, is limited to functionalities that are addressing our organisation needs but that can also benefits other pygeoapi users needs.

Alex-NRCan avatar Feb 07 '23 21:02 Alex-NRCan

@Alex-NRCan thanks for your clarifications! Your three points:

  • yes, I've seen the bbox-crs handling in the code for coverages. Was looking into some reuse. Though Coverages (and thus the rasterio provider) is another OGC standard than Features for which PostgreSQL provider applies.
  • ok, for your use case. But the Part 2 standard requires supported CRSs in the metadata
  • POST request: I agree, GET is limited in URL-length, even per-browser. But you are pointing to EDR, which is a different standard than Features, well IMHO. I just wonder how a POST Body will look like. A flat (Geo)JSON structure? Or is als "form-encoding" allowed?

Thinking out loud: best is, if that fits your timing, that I will implement #1128 first. Think this will cover much of this PR's functionality, at least at the generic API (api.py) level. POST support and PostgreSQL Provider code is then to be added. But then again, I will need a next level of understanding both the Standard and existing pygeoapi code...

justb4 avatar Feb 09 '23 14:02 justb4

As per RFC4, this Pull Request has been inactive for 90 days. In order to manage maintenance burden, it will be automatically closed in 7 days.

github-actions[bot] avatar Mar 10 '24 21:03 github-actions[bot]

As per RFC4, this Pull Request has been closed due to there being no activity for more than 90 days.

github-actions[bot] avatar Mar 31 '24 03:03 github-actions[bot]