
Support very long (≥8k) search requests

Open MattBlissett opened this issue 6 years ago • 1 comments

Users would like to search using long polygons or many taxon keys. To support this with the current search API, a long (>8k character) URL must pass through:

  1. The user's web browser or other client
  2. Potentially a not-very-good proxy (corporate or education filter etc)
  3. Varnish
  4. a. occurrence-ws
     b. vectortile-ws / mapnik-server
  5. SOLR
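To get a feel for how quickly a request crosses that size, here is a minimal sketch that builds a search URL and compares its length with 8 KiB. The endpoint, the `taxonKey` parameter name, the made-up key values and the exact 8192-character threshold are assumptions for illustration only; each component in the chain above may enforce a different limit.

```python
from urllib.parse import urlencode

# Assumed endpoint and limit, for illustration only.
BASE_URL = "https://api.gbif.org/v1/occurrence/search"
LIMIT = 8 * 1024


def search_url(taxon_keys, geometry=None):
    """Build a GET URL with one taxonKey parameter per key and an optional geometry."""
    params = [("taxonKey", str(key)) for key in taxon_keys]
    if geometry is not None:
        params.append(("geometry", geometry))
    return f"{BASE_URL}?{urlencode(params)}"


short_url = search_url([212])
# 1000 made-up seven-digit keys stand in for a large user-supplied list.
long_url = search_url(range(1_000_000, 1_001_000))

for name, url in (("short", short_url), ("long", long_url)):
    status = "over limit" if len(url) > LIMIT else "ok"
    print(f"{name}: {len(url)} characters ({status})")
```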

4.a. is easily fixed for gbif-microservice; 4.b. can be fixed for Dropwizard with:

  server:
    applicationConnectors:
      - type: http
        port: 7001
        # Raise the maximum request header size so that long URLs are accepted
        maxRequestHeaderSize: 1MiB

although there are then issues somewhere in Jersey's regex handling.

  2. is probably OK, since using HTTPS should prevent most proxies from modifying the request

  3. requires the regexes in the Varnish maps rules to use .*? rather than .*, and a related note in Varnish says that needing this is "madness".

That leaves 1. There is also a concern from a Jetty developer that all of this is a bad idea, for both compatibility and security.

So we need some way to communicate the search terms without using URLs longer than 8kiB, at least for the website and API. We could:

  • Use POST and cache POST requests in Varnish
    • It would no longer be possible to share URLs, switch easily between website and API, etc., but the practical length limit is very high
  • Use POST requests to get a key (time limited?), presumably stored in a database somewhere, which maps the key to the search string.
    • Allows sharing etc, but adds more complexity
  • Compress the search parameters in the URL:
    • xz compression followed by Base64 encoding reduces an 11.5kiB string (polygon of Brandenburg) to 2.5kiB
    • not an incredible saving
  • Make a protocol buffers format for encoding the query
    • A quick try, using part of geobuf.proto for the geometry then base64 encoding, uses 5.8kiB
    • even less of a saving
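The compression option can be tried out with a short sketch like the one below. It is not the test that produced the figures above: the polygon is a synthetic circle rather than the Brandenburg boundary, the helper names are made up, and the URL-safe Base64 alphabet is an assumption chosen so the token needs no further escaping. The ratio it prints will therefore differ from the numbers quoted.

```python
import base64
import lzma
import math


def make_polygon_wkt(n_points: int = 400) -> str:
    """Build a synthetic WKT polygon (a rough circle) as a stand-in for a real boundary."""
    coords = [
        f"{13.4 + math.cos(2 * math.pi * i / n_points):.6f} "
        f"{52.5 + math.sin(2 * math.pi * i / n_points):.6f}"
        for i in range(n_points)
    ]
    coords.append(coords[0])  # close the ring
    return "POLYGON((" + ",".join(coords) + "))"


def encode_param(value: str) -> str:
    """xz-compress a parameter value, then encode it as URL-safe Base64."""
    compressed = lzma.compress(value.encode("utf-8"), format=lzma.FORMAT_XZ)
    return base64.urlsafe_b64encode(compressed).decode("ascii")


def decode_param(token: str) -> str:
    """Reverse encode_param: Base64-decode, then decompress."""
    return lzma.decompress(base64.urlsafe_b64decode(token)).decode("utf-8")


wkt = make_polygon_wkt()
token = encode_param(wkt)
print(f"original: {len(wkt)} characters, encoded: {len(token)} characters")
```

Coordinate text is mostly high-entropy digits, and Base64 adds back a third of the compressed size, which is consistent with the modest saving noted above.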

MattBlissett, Nov 26 '18 15:11

I too have looked at c and d (compressing the parameters, and a protocol buffers encoding) and concluded they wouldn't work for us. I ended up thinking that b (POST requests that return a key) was the only reasonable way to do it:

  • Use plain linkable URLs for small queries.
  • Switch to temporary tokens (stored in a temporary database) for larger queries.
  • Have an option to click "get shareable link", after which that link persists.
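A minimal sketch of how these three rules could fit together, assuming an in-memory dictionary in place of the temporary database. The class, the function, the `queryToken` parameter name, the 8000-character threshold and the one-hour lifetime are all hypothetical and only illustrate the idea.

```python
import hashlib
import time

URL_LIMIT = 8000  # assumed threshold for switching from plain URLs to tokens


class QueryTokenStore:
    """In-memory stand-in for the temporary token database."""

    def __init__(self, ttl_seconds: float = 3600.0, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # token -> (query, expiry time, or None if persisted)

    def put(self, query: str) -> str:
        """Store a query under a token derived from its content; it expires after the TTL."""
        token = hashlib.sha256(query.encode("utf-8")).hexdigest()[:16]
        existing = self._entries.get(token)
        if existing is None or existing[1] is not None:
            self._entries[token] = (query, self._clock() + self._ttl)
        return token

    def persist(self, token: str) -> None:
        """Make a token permanent ("get shareable link")."""
        query, _ = self._entries[token]
        self._entries[token] = (query, None)

    def get(self, token: str):
        """Return the stored query, or None if the token is unknown or expired."""
        entry = self._entries.get(token)
        if entry is None:
            return None
        query, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._entries[token]
            return None
        return query


def to_link(base_url: str, query: str, store: QueryTokenStore) -> str:
    """Use a plain URL for small queries and a token URL for large ones."""
    url = f"{base_url}?{query}"
    if len(url) <= URL_LIMIT:
        return url
    return f"{base_url}?queryToken={store.put(query)}"
```

Deriving the token from a hash of the query means that repeating the same large search yields the same link. A real service would also need to decide how persisted tokens are cleaned up, which this sketch leaves out.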

MortenHofft, Aug 14 '19 08:08