Broken pagination for features
Description When querying a features collection which should return a result with more features than the limit, there appears to be no "next" link in the result to get the next page. Additionally, following the "json" link removes the "offset" parameter from the query and goes to the first page of results.
Steps to Reproduce This is visible in the demo instance: https://demo.pygeoapi.io/master/collections/dutch_castles/items?limit=10&offset=20
And also in the corresponding json (with offset parameter manually added): https://demo.pygeoapi.io/master/collections/dutch_castles/items?f=json&limit=10&offset=20
And same for the stable version: https://demo.pygeoapi.io/stable/collections/dutch_castles/items?limit=10&offset=20 https://demo.pygeoapi.io/stable/collections/dutch_castles/items?f=json&limit=10&offset=20
Notice how there is a "prev" link to the previous page but no way to go to the next page.
Expected behavior All pages except the last should contain a "next" link. The "json" link should keep all the same parameters as the original query.
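For illustration only, here is a rough sketch of how a format link could carry over the original query parameters. The helper name `with_format` is hypothetical and this is not how pygeoapi builds its links; it only uses the standard library's `urllib.parse`.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_format(url: str, fmt: str) -> str:
    """Return `url` with f=<fmt> set and every other query parameter kept."""
    parts = urlsplit(url)
    # Keep all existing parameters (limit, offset, ...) except a previous f=
    params = [(key, value)
              for key, value in parse_qsl(parts.query, keep_blank_values=True)
              if key != 'f']
    params.append(('f', fmt))
    return urlunsplit(parts._replace(query=urlencode(params)))
```

Applied to the first demo URL above, this keeps `limit=10&offset=20` and appends `f=json`.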
Screenshots/Tracebacks
Environment
- pygeoapi version: 0.17.dev0 and 0.16.1
https://github.com/geopython/pygeoapi/blob/6c31a8e371d10750f265357bab62b45bfa9abc8f/pygeoapi/api/itemtypes.py#L532-L533
numberMatched is provider-dependent.
I see that the ogr provider computes numberMatched when asking for the hits (does it make sense to generate a "next" link for the hits?):
https://github.com/geopython/pygeoapi/blob/6c31a8e371d10750f265357bab62b45bfa9abc8f/pygeoapi/provider/ogr.py#L597-L601
This is the reply for the records:
https://github.com/geopython/pygeoapi/blob/6c31a8e371d10750f265357bab62b45bfa9abc8f/pygeoapi/provider/ogr.py#L550-L553
The main issue with using layer.GetFeatureCount() is that it could kill performance in some cases, but we do not need it just to know whether there are more records: we can increment numMatched if there are more, so that the next link is generated.
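A minimal sketch of that idea, not the OGR provider's actual code: read `limit + 1` records and use the extra one only as a signal that a next page exists. The function name `fetch_page` and the `has_next` key are hypothetical, and a plain iterable stands in for the layer.

```python
from itertools import islice
from typing import Any, Dict, Iterable


def fetch_page(features: Iterable[Any], offset: int,
               limit: int) -> Dict[str, Any]:
    """Return one page of features plus a flag for 'more records exist'."""
    # Read one record past the page; it is never returned to the client.
    window = list(islice(features, offset, offset + limit + 1))
    return {
        'features': window[:limit],
        'has_next': len(window) > limit,
    }
```

No full count is needed, and the flag stays correct when the collection size is an exact multiple of `limit`.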
numberMatched (and numberReturned for that matter, even though it is mostly a function of len(features)) are optional properties of an OGC API - Features /collections/{collectionId}/items response. numberMatched may be expensive to calculate in some cases, which is why it is optional (some search engines exhibit the same behaviour).
Having said this, prev/next links are also optional parts of an OGC API - Features /collections/{collectionId}/items response.
As a result, providers are not required to implement paging; however, it is great if they can do so efficiently.
@frafra what is the status of #1662?
Dug into this...
To my knowledge paging used to work, at least for the OGR Provider with, for example, WFS (mapped to WFS v2 Paging). This can be activated with source_capabilities: {paging: True} (default is False) in the OGR Provider specific config, like in the demo config. It may have worked for all Providers, possibly because it was handled at the pygeoapi API level (?).
...digging into older versions of the repo....
I found in an early version of api.py (now split and for Features implemented in api/itemtypes.py), after the query() call:
if len(content['features']) == limit:
    next_ = offset + limit
    content['links'].append(
        {
            'type': 'application/geo+json',
            'rel': 'next',
            'title': 'items (next)',
            'href': '{}?offset={}{}'
            .format(
                uri, next_, serialized_query_params)
        })
So the 'next' link is generated whenever a full page (limit) of Features is returned. This is a bit of a heuristic: it assumes the last page will hold fewer than limit Features. The only edge case is when the Collection contains exactly N * limit Features; in that case the 'next' link on the last full page leads to a page with zero Features. Somehow the above has been replaced with the current:
if 'numberMatched' in content:
    if content['numberMatched'] > (limit + offset):
        next_ = offset + limit
        next_href = f'{uri}?offset={next_}{serialized_query_params}'
        content['links'].append(
            {
                'type': 'application/geo+json',
                'rel': 'next',
                'title': l10n.translate('Items (next)', request.locale),
                'href': next_href
            })
which relies on numberMatched and is only supported by a few providers.
Like @frafra indicates, and IMO as well, we do not need the total count of all (matched) records in the collection, only whether there is a 'next' Feature and thus a next page. But this, too, will not apply to all providers.
We could also bring back the original heuristic. I would be in favor as there is no extra performance cost.
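To make the trade-off concrete, here is a rough sketch of both conditions as standalone predicates. The function names are made up for illustration and this is not the code in api/itemtypes.py; `number_matched=None` stands in for a provider that does not report numberMatched.

```python
from typing import Optional


def next_by_full_page(number_returned: int, limit: int) -> bool:
    """Original heuristic: a full page suggests there may be another one."""
    return number_returned == limit


def next_by_number_matched(number_matched: Optional[int], offset: int,
                           limit: int) -> bool:
    """Current check: needs numberMatched, so None never yields a link."""
    return number_matched is not None and number_matched > offset + limit
```

With 30 Features, limit=10 and offset=20, the full-page heuristic still emits a 'next' link (to an empty page), while the numberMatched check does not. For a provider without numberMatched, only the heuristic ever produces a 'next' link.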