ETags support for views

Open adnaanbheda opened this issue 3 years ago • 4 comments

Description

We have been looking for support for ETags similar to what individual docs support using the If-None-Match header. Currently, to simulate a 304 Not Modified behavior, we're calling views using these args,

ping_args = { "stable": "true", "update_seq": "true", "limit": "0" }

and then joining the returned update sequence with the view name to form a cache key. We return the cached result unless the update sequence has changed.
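A minimal sketch of the workaround described above, for illustration only. The `ViewCache` class and the injected `fetch` callable are hypothetical names (they are not part of CouchDB or any client library); `fetch` is assumed to perform the GET against the view and return the parsed JSON body.

```python
from typing import Any, Callable, Dict, Tuple

# Parameters from the issue: ask for metadata only (no rows) plus update_seq.
PING_ARGS = {"stable": "true", "update_seq": "true", "limit": "0"}

# Hypothetical signature: (view path, query params) -> parsed JSON response.
Fetch = Callable[[str, Dict[str, str]], Dict[str, Any]]


class ViewCache:
    """Caches full view results keyed by (view path, update_seq)."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._cache: Dict[Tuple[str, str], Dict[str, Any]] = {}

    def get(self, view_path: str) -> Dict[str, Any]:
        # Cheap "ping" request: returns the update sequence but no rows.
        ping = self._fetch(view_path, PING_ARGS)
        key = (view_path, str(ping["update_seq"]))
        if key not in self._cache:
            # The sequence changed (or this is the first call): drop stale
            # entries for this view and fetch the full result once.
            self._cache = {
                k: v for k, v in self._cache.items() if k[0] != view_path
            }
            self._cache[key] = self._fetch(view_path, {"stable": "true"})
        return self._cache[key]
```

Note that the ping and the full fetch are two separate requests, so the cached rows may be slightly newer than the sequence used as the key. That is part of why a server-side ETag would be preferable.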

The docs imply that views do support ETags, so we are unsure what the CouchDB team's position is on this issue.

It'd be nice if this issue got bumped!

Expected Behaviour

Either update the docs to reflect the current state of ETag support for views, or fix ETag support for views.

Your Environment

  • CouchDB version used: 3.2.1
  • Operating system and version: Ubuntu 20.04

adnaanbheda avatar Jan 25 '22 07:01 adnaanbheda

I recall that back in the 1.x release series we did have full support for ETags in views. The clustering system we introduced in 2.0 complicated efforts to calculate an efficient ETag, and I think we maybe never really resurrected it. I'm honestly not sure off the top of my head what parts of that code path are still functioning. Are we generating an ETag but ignoring any If-None-Match?

I'd be comfortable with the idea of merging the DB sequences for each of the shard replicas that contributed to the view response as the ETag. If a client supplied that ETag in an If-None-Match header we could implement a reasonably efficient check to see if any of the shards have advanced their sequence.
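To make the idea concrete, here is a rough Python model of an ETag that merges per-shard sequences. It is only a sketch under assumptions: the function names, the integer sequences, and the shard range labels are all hypothetical, and real CouchDB sequences are opaque values that this code does not try to reproduce.

```python
import base64
import json
from typing import Dict


def encode_etag(shard_seqs: Dict[str, int]) -> str:
    """Packs the per-shard sequences into one quoted, reversible ETag."""
    payload = json.dumps(shard_seqs, sort_keys=True, separators=(",", ":"))
    token = base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")
    return '"' + token + '"'


def decode_etag(etag: str) -> Dict[str, int]:
    """Recovers the per-shard sequences from an ETag made by encode_etag."""
    return json.loads(base64.urlsafe_b64decode(etag.strip('"')))


def is_fresh(etag: str, current_seqs: Dict[str, int]) -> bool:
    """True if no shard has moved since the ETag was issued."""
    try:
        seen = decode_etag(etag)
    except ValueError:
        # Malformed base64 or JSON: treat as a cache miss.
        return False
    if not isinstance(seen, dict) or seen.keys() != current_seqs.keys():
        return False
    return all(current_seqs[shard] == seq for shard, seq in seen.items())
```

For example, `is_fresh(encode_etag({"00000000-7fffffff": 12}), {"00000000-7fffffff": 13})` is false because that shard has advanced. A real implementation would ask each shard whether anything changed since its recorded sequence rather than comparing against a full map.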

In contrast to document and attachment ETags it would be a net loss to include an If-None-Match on every request to a DB with a high update rate, but if you've got a deployment with a low / infrequent update rate the authoritative caching would still be a nice win.

kocolosk avatar Jan 30 '22 03:01 kocolosk

I experimented a bit but to no avail: I never received any ETags back. I think the mechanisms that could enable this sort of caching already exist in the codebase and just need to be enabled. It's a nice win if you have a large view that pulls a lot of data (MBs): a simple ETag check can save a lot of network time, and the expense of such checks is presumably very low, so the net loss is minimal even if we added the If-None-Match header to every request.

Can you describe how one would go about fixing this?

adnaanbheda avatar Feb 03 '22 10:02 adnaanbheda

Here's a sketch. It's unfortunately kind of scattered all over the codebase, but I would look to do something like this on the 3.x branch:

  1. Check for the presence of an If-None-Match header containing a DB sequence on the incoming request in chttpd_view
  2. Pass that sequence into fabric:query_view/7 if it's present
  3. Inside query_view call fabric:changes/4 passing in the sequence from the header with limit=1 and see if any rows are returned.
  4. If no rows are returned, invoke the Callback function with a new message like cache_hit instead of dropping down into the fabric_view_map or fabric_view_reduce coordinators.
  5. Chase down all the callback functions in the chttpd app (possibly just couch_mrview_http:view_cb/2) to add a clause to handle cache_hit and return 304 when that happens.
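The control flow of these five steps can be modeled in a few lines of Python. This is not CouchDB code: the real work would be in Erlang, and every function here is a hypothetical stand-in (`changes` for fabric:changes/4, `run_coordinator` for the fabric_view_map / fabric_view_reduce coordinators, `view_cb` for the HTTP callback). The sketch reads the sequence from If-None-Match, the conditional header used elsewhere in this thread.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Response = Tuple[int, Optional[List[dict]]]  # (status code, rows or None)


def view_cb(msg: str, payload: Any) -> Response:
    # Step 5: the callback gains a clause that turns cache_hit into a 304.
    if msg == "cache_hit":
        return 304, None
    return 200, payload


def query_view(
    seq: Optional[str],
    changes: Callable[..., List[dict]],
    run_coordinator: Callable[[], List[dict]],
    callback: Callable[[str, Any], Response],
) -> Response:
    # Step 3: one row is enough to prove the index may be stale.
    if seq is not None and not changes(seq, limit=1):
        # Step 4: nothing changed, so skip the view coordinators entirely.
        return callback("cache_hit", None)
    return callback("rows", run_coordinator())


def handle_request(
    headers: Dict[str, str],
    changes: Callable[..., List[dict]],
    run_coordinator: Callable[[], List[dict]],
) -> Response:
    # Steps 1 and 2: pull the sequence out of the header and pass it down.
    etag = headers.get("If-None-Match")
    seq = etag.strip('"') if etag else None
    return query_view(seq, changes, run_coordinator, view_cb)
```

The point of the shape is that the expensive coordinator is never invoked on a cache hit; only the limit=1 changes check runs.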

Step 3 is really the key extra bit of logic that validates the freshness of the ETag. Of course we also have to generate the ETag inside fabric_view_map / fabric_view_reduce and send it with every response. Here I'd be looking for something like:

  1. Inside the fabric_view stack, make it so that update_seq = true is the only mode of operation, which will ensure that couch_mrview:make_meta/3 always includes the update sequence in the view metadata sent from each shard. We should verify that the sequence here is really the sequence of the view index and not the database. I'm pretty sure that's the case.
  2. Update all the callback functions (again, maybe just couch_mrview_http:view_cb/2 ?) to extract the update_seq from the {meta, Meta} message and report it in the ETag header. This will require a small bit of refactoring as the code currently blindly starts a response with no extra response headers.
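As a rough illustration of the second step, the Python model below delays the start of the response until the meta message has been seen, so the update sequence can go into the ETag header. The message shapes loosely mirror the {meta, Meta} tuple mentioned above, but the function name `respond_to_view` and the "row" / "complete" message names are assumptions made for this sketch, not a description of the actual callback protocol.

```python
from typing import Any, Dict, Iterable, List, Tuple

Message = Tuple[str, Any]


def respond_to_view(
    messages: Iterable[Message],
) -> Tuple[int, Dict[str, str], List[Any]]:
    """Builds (status, headers, rows) from a stream of callback messages."""
    headers: Dict[str, str] = {}
    rows: List[Any] = []
    for kind, payload in messages:
        if kind == "meta":
            # Extract update_seq before any response headers are sent.
            seq = dict(payload).get("update_seq")
            if seq is not None:
                headers["ETag"] = '"{}"'.format(seq)
        elif kind == "row":
            rows.append(payload)
        elif kind == "complete":
            break
    return 200, headers, rows
```

In the real callback the refactoring would be about not starting the chunked response until the meta message arrives; here that is modeled simply by building the headers before returning anything.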

The work to do this on main is going to be fairly similar but not so simple as a cherry-pick.

kocolosk avatar Feb 04 '22 01:02 kocolosk

I was surprised by this when upgrading from v1 to v3. The docs indicate that view support is there:

Etag The Etag HTTP header field is used to show the revision for a document, or a view. ETags have been assigned to a map/reduce group (the collection of views in a single design document). Any change to any of the indexes for those views would generate a new ETag for all view URLs in a single design doc, even if that specific view’s results had not changed. Each _view URL has its own ETag which only gets updated when changes are made to the database that effect that index. If the index for that specific view does not change, that view keeps the original ETag head (therefore sending back 304 - Not Modified more often).

https://docs.couchdb.org/en/3.1.1/api/basics.html

wavded avatar Jul 14 '22 18:07 wavded