gbif-api
Support very long (≥8k) search requests
Users would like to search using long polygons or many taxon keys. To support this with the current search API, a long (>8k character) URL must pass through:
1. The user's web browser or other client
2. Potentially a not-very-good proxy (corporate or educational filter, etc.)
3. Varnish
4. a. occurrence-ws
   b. vectortile-ws / mapnik-server
5. SOLR
4.a. is easily fixed for gbif-microservice; 4.b. can be fixed for Dropwizard with

```yaml
applicationConnectors:
  - type: http
    port: 7001
    maxRequestHeaderSize: 1MiB
```

although there are then issues somewhere in Jersey's regex handling.
2. is probably OK, since using HTTPS should prevent most proxies from modifying the request.
3. requires the regexes in Varnish to use `.*?` rather than `.*` for the maps rules, and there's a related note in Varnish saying that needing this is "madness".
That leaves 1. There is also a concern from a Jetty developer that all of this is a bad idea, for compatibility and security reasons.
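To get a feel for how quickly a long polygon runs into the 8 KiB figure, the sketch below builds a circle-like WKT polygon around Brandenburg and measures the resulting occurrence search URL. It assumes the `geometry` parameter of `https://api.gbif.org/v1/occurrence/search`, six decimal places per coordinate, and an illustrative limit of 8192 characters; `polygon_wkt` and `search_url` are hypothetical helpers, not part of any GBIF library.

```python
import math
from urllib.parse import urlencode

# Illustrative limit: the 8 KiB figure discussed above, in characters.
URL_LIMIT = 8192


def polygon_wkt(n_vertices: int, lon0: float = 13.0, lat0: float = 52.5,
                radius: float = 1.0) -> str:
    """Hypothetical helper: a closed WKT polygon approximating a circle."""
    points = []
    for i in range(n_vertices):
        angle = 2 * math.pi * i / n_vertices
        lon = lon0 + radius * math.cos(angle)
        lat = lat0 + radius * math.sin(angle)
        points.append(f"{lon:.6f} {lat:.6f}")
    points.append(points[0])  # WKT rings must end where they start
    return "POLYGON((" + ",".join(points) + "))"


def search_url(n_vertices: int) -> str:
    """Hypothetical helper: an occurrence search URL with a WKT geometry filter."""
    base = "https://api.gbif.org/v1/occurrence/search?"
    # urlencode escapes '(' ')' ',' as %28 %29 %2C and turns spaces into '+'.
    return base + urlencode({"geometry": polygon_wkt(n_vertices)})


if __name__ == "__main__":
    for n in (50, 200, 400, 800):
        url = search_url(n)
        print(n, len(url), "over limit" if len(url) > URL_LIMIT else "ok")
```

With these coordinates each vertex costs about 22 characters once encoded (two 9-character numbers, a `+` for the space and `%2C` for the comma), so a polygon with a few hundred vertices is already past the limit.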
So we need some way to communicate the search terms in fewer than 8 KiB, at least for the website and the API. We could:
a. Use POST and cache POST requests in Varnish
   - It's no longer possible to share URLs, easily switch between the website and the API, etc., but the practical length limit is very high
b. Use POST requests to get a key (time-limited?), presumably stored in a database somewhere, which maps the key to the search string
   - Allows sharing etc., but adds more complexity
c. Compress the search parameters in the URL
   - xz compression followed by Base64 encoding reduces an 11.5 KiB string (a polygon of Brandenburg) to 2.5 KiB
   - not an incredible saving
d. Define a protocol buffers format for encoding the query
   - A quick try, using part of geobuf.proto for the geometry and then Base64 encoding, uses 5.8 KiB
   - an even smaller saving
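Option c can be sketched with Python's standard `lzma` and `base64` modules. This only illustrates xz-then-Base64 encoding; it is not the code behind the 11.5 KiB to 2.5 KiB figure, and `pack_query` / `unpack_query` are hypothetical names. The URL-safe Base64 alphabet is used so the result needs no further percent-encoding.

```python
import base64
import lzma


def pack_query(query: str) -> str:
    """Hypothetical helper: xz-compress a query string, then Base64url-encode it."""
    compressed = lzma.compress(query.encode("utf-8"), format=lzma.FORMAT_XZ,
                               preset=9 | lzma.PRESET_EXTREME)
    # The URL-safe alphabet uses '-' and '_' instead of '+' and '/';
    # trailing '=' padding is dropped here and restored on decode.
    return base64.urlsafe_b64encode(compressed).rstrip(b"=").decode("ascii")


def unpack_query(packed: str) -> str:
    """Reverse pack_query: restore padding, Base64url-decode, xz-decompress."""
    padded = packed + "=" * (-len(packed) % 4)
    return lzma.decompress(base64.urlsafe_b64decode(padded)).decode("utf-8")


if __name__ == "__main__":
    # Synthetic query; real polygons are likely to compress worse because
    # their coordinate digits are far less regular than these.
    query = "geometry=POLYGON((" + ",".join(
        f"{13 + i / 1000:.6f} {52 + i / 2000:.6f}" for i in range(400)) + "))"
    packed = pack_query(query)
    print(len(query), len(packed))
    assert unpack_query(packed) == query
```

Base64 makes the compressed bytes a third larger, which is part of why the overall saving stays modest.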
I too have looked at c and d and concluded they wouldn't work for us. I ended up thinking that b was the only reasonable way to do it:
- Use plain linkable URLs for small queries
- Switch to temporary tokens (stored in a temporary database) for larger queries
- Offer a "Get shareable link" option; that link will then persist.
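The plan above can be sketched as follows. The in-memory `QueryStore` stands in for the temporary database, and the length threshold, the token lifetime, the `/occurrence/search` path and the `queryKey` parameter are all hypothetical choices for illustration, not part of the GBIF API.

```python
import secrets
import time
from typing import Dict, Optional, Tuple

# Hypothetical values chosen for illustration only.
MAX_PLAIN_QUERY = 4096              # characters; longer queries get a token
TOKEN_TTL = 24 * 60 * 60            # seconds a temporary token stays valid
SEARCH_PATH = "/occurrence/search"  # hypothetical website path


class QueryStore:
    """In-memory stand-in for the temporary database of token -> query."""

    def __init__(self) -> None:
        # token -> (query, expiry timestamp, or None once made persistent)
        self._entries: Dict[str, Tuple[str, Optional[float]]] = {}

    def put(self, query: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(16)
        self._entries[token] = (query, now + TOKEN_TTL)
        return token

    def get(self, token: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        entry = self._entries.get(token)
        if entry is None:
            return None
        query, expires_at = entry
        if expires_at is not None and now > expires_at:
            del self._entries[token]  # expired: forget it
            return None
        return query

    def make_persistent(self, token: str, now: Optional[float] = None) -> bool:
        """Back the "Get shareable link" option: drop the expiry if still valid."""
        query = self.get(token, now)
        if query is None:
            return False
        self._entries[token] = (query, None)
        return True


def search_link(query: str, store: QueryStore, now: Optional[float] = None) -> str:
    """Plain URL for short queries, temporary token for long ones."""
    if len(query) <= MAX_PLAIN_QUERY:
        return f"{SEARCH_PATH}?{query}"
    # 'queryKey' is a hypothetical parameter name.
    return f"{SEARCH_PATH}?queryKey={store.put(query, now)}"
```

Keeping tokens temporary unless someone explicitly asks for a shareable link means the store only holds long-lived entries for queries people actually share.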