
Support response that returns only the ids of the selected features

cportele opened this issue · 14 comments

The following is feedback from implementation experience:

If a client already has a copy of the relevant features locally, a query to a feature collection would only need to return the feature ids, not the whole features. This can reduce network load and improve application performance.

This could be implemented in several ways. One option would be a resultType=ids parameter. However, to be consistent with the media types, the response would still need to be a feature collection (just without geometries or other properties). In that sense it would be a special projection case and could also be handled by the extension that will cover the projection capability.
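
For illustration, a sketch of what such a response might look like, assuming a hypothetical resultType=ids parameter (null geometry and null properties are permitted by the GeoJSON RFC, so this would remain a valid feature collection):

GET /buildings?resultType=ids

{
    "type": "FeatureCollection",
    "features": [
        { "type": "Feature", "id": "123", "geometry": null, "properties": null },
        { "type": "Feature", "id": "124", "geometry": null, "properties": null }
    ]
}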

cportele · Nov 13 '17

I wonder if there's a way to do this that's more compatible with web architectures?

I'm not sure of the exact mechanism, but something with ETags / Cache-Control feels more in line than some special handling of ids.

Perhaps instead of returning ids the service could return links to the canonical feature location, like the /buildings/{fid} endpoint, which would return the full feature? Do we require a 'self' link on each feature returned in the collection? Looking at the draft I only see 'self' discussed at the collection level.

If we returned the self link, then clients could cache those features and just check whether each link has already been resolved.

Then it is theoretically still a feature collection, just one made of links that haven't been resolved. But I'm not sure whether any of the formats actually support that.
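
A rough sketch of that client-side pattern in Python, with hypothetical names (cache, resolve, and fetch are illustrative, not from any draft):

# Hypothetical client-side cache keyed by each feature's self link.
cache = {}

def resolve(self_link, fetch):
    """Return the full feature, fetching only on a cache miss."""
    if self_link not in cache:
        cache[self_link] = fetch(self_link)  # e.g. GET /buildings/{fid}
    return cache[self_link]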

cholmes · Dec 15 '17

When we talk about a self link, do we mean implementing hypermedia properties in the resource? The draft mentions hypermedia in section "6. Overview", and I suppose this is what is defined in the "Connectedness" requirement. Am I correct?

If I understood the issue correctly, then in theory, once the client requests objects containing only ids, the response could be (in GeoJSON format), as suggested above:

{
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "ref": "/buildings/<id>"
        }, ...
    ]
}

But I don't know whether this can be considered compliant with the GeoJSON RFC, or how the other supported formats handle hypermedia.

Something that may be helpful as a reference for using hypermedia with WFS: there is a group developing a standard for natively hypermedia-driven REST APIs called Hydra, a W3C draft. It is aimed at Semantic Web applications (it is an extension of W3C's RDF), though.

As a WFS API user/developer, thanks for starting this draft!

Mec-iS · Dec 18 '17

@cholmes said: I wonder if there's a way to do this that's more compatible with web architectures? I'm not sure of the exact mechanism, but something with ETags / Cache-Control feels more in line than some special handling of ids.

So something like this?

(example URLs from https://www.ldproxy.nrw.de )

> GET /kataster/VerwaltungsEinheit?f=json&art=Gemeinde&bbox=7.0%2C50.6%2C7.2%2C50.8&count=20 HTTP/1.1
>
< HTTP/1.1 200 OK
< ETag: "686897696a7c876b7e"
< ... data ...

Then later...

> GET /kataster/VerwaltungsEinheit?f=json&art=Gemeinde&bbox=7.0%2C50.6%2C7.2%2C50.8&count=20 HTTP/1.1
> If-None-Match: "686897696a7c876b7e"
>
< HTTP/1.1 304 Not Modified

Here 686897696a7c876b7e would be entirely implementation-dependent and opaque to the client. A simple implementation could derive it from the last-update time of the entire VerwaltungsEinheit (admin units) dataset, and the provider could obviously make it more sophisticated if they wished. The client stores the ETag and returns it in an If-None-Match header on a future request.

Like most caching, the goal would be to have no false positives (a 304 response when the data has actually changed) rather than no false negatives (a 200 + data response when the data hasn't changed).
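
A minimal client-side sketch of this conditional-GET flow in Python, using the requests library; the URL is the one from the trace above, while the cache structure and fetch helper are illustrative, not part of any spec:

import requests

URL = "https://www.ldproxy.nrw.de/kataster/VerwaltungsEinheit"
PARAMS = {"f": "json", "art": "Gemeinde", "bbox": "7.0,50.6,7.2,50.8", "count": "20"}

cache = {}  # query key -> (etag, body)

def fetch(url, params):
    key = (url, tuple(sorted(params.items())))
    headers = {}
    if key in cache:
        headers["If-None-Match"] = cache[key][0]  # opaque ETag from the last response
    resp = requests.get(url, params=params, headers=headers)
    if resp.status_code == 304:
        return cache[key][1]  # data unchanged: reuse the cached body
    resp.raise_for_status()
    if "ETag" in resp.headers:
        cache[key] = (resp.headers["ETag"], resp.text)
    return resp.text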

rcoup · Dec 18 '17

@Mec-iS - yes, I was imagining something along those lines, where the response could just be the refs. But I share your question about whether that is compliant with GeoJSON, etc.

@rcoup - What you proposed is super interesting, and not what I was thinking. My thought was simpler and less thought through, more along @Mec-iS's lines: somehow get each 'feature' cached. The usage pattern in my head was a client issuing some big query, caching all the data in the response, and then sending queries that are likely subsets. So it can request its subset query and then check its local cache.

But I like yours too. It seems to fit a use case where a client is staying in sync with a full query. I see a lot of potential for an extension that is all about keeping two (or more) catalogs in sync, e.g. for offline / low-bandwidth use cases.

It does occur to me that we should probably pull back one level and decide what we're trying to accomplish. What is the practical problem we are solving? Is it performance of the overall service? Is it efficiency in low-bandwidth situations? I worry we might be introducing a lot of development complexity for a relatively small win; #14 (a protobuf encoding) might be a simpler way to reduce network load and improve application performance for most use cases.

cholmes · Dec 31 '17

@cholmes is this the same issue discussed at https://github.com/opengeospatial/WFS_FES/issues/16, just asking for fields=id (i.e. excluding all the other fields)?
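
For illustration, such a request might look like this (fields being the hypothetical parameter from the #16 discussion):

GET /kataster/VerwaltungsEinheit?f=json&fields=id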

rcoup · Jan 24 '18

Yeah, that could most likely accomplish it. We would just want to be sure we're getting all the cache-control headers right so the client can cache correctly. Though I'm still not 100% sure what problem we're trying to solve.

cholmes · Jan 24 '18

somehow get each 'feature' cached. The usage pattern in my head was a client issuing some big query, caching all the data in the response, and then sending queries that are likely subsets. So it can request its subset query and then check its local cache.

This sounds to me like the same concept as requesting a lock in the current WFS, but applied to querying. The user requests some kind of "pre-operation" before performing an actual "action" on the data (in the case of locking, locking the feature in order to update it; in the case of querying, defining a "scope" for future queries). Am I understanding correctly what is meant? If so, under the hood it sounds like what happens with chained filters in SQL ORMs, in pseudocode:

query = Class.objects.all().filter(prop1=some_value).filter(prop2=other_value)

Usually the ORM lazy-loads pages of Class-type objects and then applies the filters sequentially to the paginated results.

Mec-iS · Jan 28 '18

How about a hypermedia/linked-data use case: a user has discovered our WFS and wants to explore it as hypermedia. So they issue a request for everything using a resultType of "ids". This request returns an RDF graph consisting of feature nodes (name, id, and type only) and the associations between them. The user can then visualize the collection with graph visualization tools and identify the portion of the collection they are interested in. At this point they can:

  1. click on a node to retrieve the full feature
  2. click on a node and use it as the root node for a new resultType = "ids" request
  3. submit a getFeature() request.
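
A minimal sketch of what such an ids-only graph could look like, in Turtle, using a hypothetical ex: vocabulary and hypothetical feature URIs:

@prefix ex: <http://example.org/wfs#> .

</buildings/123> a ex:Building ;          # type and id only, no geometry
    ex:name "Town Hall" ;
    ex:adjacentTo </buildings/124> .      # association between features

</buildings/124> a ex:Building ;
    ex:name "Library" .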

cmheazel · Jan 30 '18

Use case #2: Users in two enclaves would like to exchange information, but they are not allowed (or lack the bandwidth) to share feature data between the enclaves. However, since a UUID is small and contains no information, UUIDs can be exchanged. As long as both enclaves have copies of the same feature data, feature identifiers are sufficient for them to collaborate. Think GeoPackage and Context documents.

Note: the users are not accessing their data from the same server, so a URL will not work. We should include a standard path for accessing resources by ID. Given this path, the UUID, and the server URL, we have everything we need to access that resource from any WFS 3.0 server.
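
For illustration, assuming a hypothetical standard path of /collections/{name}/items/{uuid}, each enclave could reconstruct the resource on its own server from nothing but the UUID:

GET https://server-a.example/collections/buildings/items/6ba7b810-9dad-11d1-80b4-00c04fd430c8
GET https://server-b.example/collections/buildings/items/6ba7b810-9dad-11d1-80b4-00c04fd430c8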

cmheazel · Jan 30 '18

As most of you may know, W3C has a geospatial framework, though I couldn't find any RDF vocabulary that covers this subject completely. Does anybody know of a link I have missed to a stable geospatial RDF vocabulary? I followed the implementation of an experimental server for hypermedia APIs, based on W3C RDF, that can support these use cases by leveraging Hydra (see my previous comment above). I think some inspiration can be taken from there and from these use cases.


Mec-iS · Feb 01 '18

Agreement on 2018-02-01: Defer to an extension. (But keep the discussion going.)

cportele · Feb 01 '18

A resource that may be relevant: http://json-schema.org/. At the moment it is just an IETF draft, but it is interesting. In the "core" spec it proposes a media type for JSON schemas, application/schema+json, and there is also an extension, "JSON Hyper-Schema", for annotating JSON documents with hyperlinks.

Mec-iS · Feb 18 '18

This is a fundamental part of data on the web and therefore should not be deferred to an extension.

akuckartz · Sep 18 '18