couchdb
ETags support for views
Description
We have been looking for ETag support in views, similar to what individual docs support via the If-None-Match header.
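For reference, the document-level behaviour being asked for is the standard HTTP conditional GET: send the previously received ETag in If-None-Match and treat a 304 as "reuse your cached copy". Below is a minimal client-side sketch using only the Python standard library. The function name `conditional_get` is hypothetical, and the sketch assumes the server either returns 200 with an ETag header or 304 with no body.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def conditional_get(url: str, etag: Optional[str] = None) -> Tuple[int, Optional[str], bytes]:
    """GET `url`, sending If-None-Match when we already hold an ETag.

    Returns (status, etag, body). On 304 the body is empty and the caller
    should keep using its cached copy.
    """
    req = urllib.request.Request(url)
    if etag is not None:
        req.add_header("If-None-Match", etag)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.headers.get("ETag"), resp.read()
    except urllib.error.HTTPError as err:
        # urllib surfaces 304 as an HTTPError rather than a normal response.
        if err.code == 304:
            return 304, etag, b""
        raise
```

Against a document URL this round-trips as expected; the issue is that a view URL never returns an ETag, so the second request can never be conditional.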
Currently, to simulate 304 Not Modified behavior, we call views with these args:
ping_args = { "stable": "true", "update_seq": "true", "limit": "0" }
We then combine the returned update sequence with the view name to build a cache key, and return the cached result unless the update sequence has changed.
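A minimal sketch of that workaround, assuming the view response includes an `update_seq` field when `update_seq=true` is passed. The names `ViewCache`, `http_fetch` and `PING_ARGS` are hypothetical. The fetcher is injectable so the caching logic can be exercised without a server. Unlike the plain "view name + sequence" key, this version also folds the query params into the key so that different queries against the same view don't collide.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Tuple

# A fetcher takes a view URL plus query params and returns the decoded JSON body.
Fetch = Callable[[str, Dict[str, str]], dict]

# The "ping" args from the issue: no rows, just metadata including update_seq.
PING_ARGS = {"stable": "true", "update_seq": "true", "limit": "0"}


def http_fetch(view_url: str, params: Dict[str, str]) -> dict:
    """Plain GET against a view URL; returns the parsed JSON response."""
    with urllib.request.urlopen(view_url + "?" + urllib.parse.urlencode(params)) as resp:
        return json.load(resp)


class ViewCache:
    """Reuse a view result until the view's update_seq changes."""

    def __init__(self, fetch: Fetch = http_fetch) -> None:
        self._fetch = fetch
        self._cache: Dict[Tuple[str, Tuple[Tuple[str, str], ...], str], dict] = {}

    def get(self, view_url: str, params: Dict[str, str]) -> dict:
        # Cheap ping: limit=0 returns metadata only.
        seq = str(self._fetch(view_url, PING_ARGS)["update_seq"])
        # Key = view + query params + sequence.
        key = (view_url, tuple(sorted(params.items())), seq)
        if key not in self._cache:
            # First request, or the sequence moved on: fetch the full result.
            self._cache[key] = self._fetch(view_url, params)
        return self._cache[key]
```

Every call still costs one round trip for the ping, and entries for old sequences are never evicted here; a real client would want a size bound. A server-side ETag would fold the ping and the conditional fetch into a single request.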
Since the docs imply that views do support ETags, we're confused about the CouchDB team's position on this issue.
It'd be nice if this issue got bumped!
Expected Behaviour
Either update the docs to reflect the current ETag support for views, or fix ETag support for views.
Your Environment
- CouchDB version used: 3.2.1
- Operating system and version: Ubuntu 20.04
I recall that back in the 1.x release series we did have full support for ETags in views. The clustering system we introduced in 2.0 complicated efforts to calculate an efficient ETag, and I think we maybe never really resurrected it. I'm honestly not sure off the top of my head what parts of that code path are still functioning. Are we generating an ETag but ignoring any If-None-Match?
I'd be comfortable with using the merged DB sequences of the shard replicas that contributed to the view response as the ETag. If a client supplied that ETag in an If-None-Match header, we could implement a reasonably efficient check to see whether any of the shards has advanced its sequence.
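To make the idea concrete, here is a sketch of a merged-sequence ETag and its freshness check. It assumes, purely for illustration, that each contributing shard is identified by its range name and reports an integer sequence; real CouchDB 3.x sequences are opaque encoded strings, and `make_etag`, `parse_etag` and `is_fresh` are hypothetical names.

```python
import base64
import json
from typing import Dict

# Hypothetical representation: shard range name -> integer update sequence.
ShardSeqs = Dict[str, int]


def make_etag(seqs: ShardSeqs) -> str:
    """Merge per-shard sequences into one opaque, quoted ETag value."""
    # sort_keys makes the encoding independent of shard arrival order.
    payload = json.dumps(seqs, sort_keys=True, separators=(",", ":")).encode()
    return '"' + base64.urlsafe_b64encode(payload).decode() + '"'


def parse_etag(etag: str) -> ShardSeqs:
    raw = base64.urlsafe_b64decode(etag.strip('"').encode())
    return {k: int(v) for k, v in json.loads(raw).items()}


def is_fresh(etag: str, current: ShardSeqs) -> bool:
    """True if no shard has advanced past the sequence recorded in the ETag."""
    try:
        seen = parse_etag(etag)
    except (ValueError, AttributeError, TypeError):
        return False  # Unparseable client ETag: treat as stale.
    if set(seen) != set(current):
        return False  # Shard map changed: be conservative and resend.
    return all(current[shard] <= seen[shard] for shard in current)
```

The conservative branches matter: a false "fresh" would serve stale data, while a false "stale" only costs a full response.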
In contrast to document and attachment ETags, it would be a net loss to include an If-None-Match header on every request to a DB with a high update rate, since the freshness check would rarely succeed. But if you've got a deployment with a low or infrequent update rate, the authoritative caching would still be a nice win.
I experimented a bit, but to no avail: I never received any ETags back.
Yeah, I think the mechanisms that could enable this sort of caching already exist in the codebase; they just need to be enabled, I guess.
It's a nice win if you have a large view that pulls a lot of data (MBs): a simple ETag check can save a lot of network time, and such checks are presumably very cheap, so the net loss is minimal even if we added the If-None-Match header to every request.
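Both positions can be reconciled with a back-of-envelope model, which is an assumption rather than a measurement. Let `p_unchanged` be the chance a view hasn't changed since the client's last fetch, `body_cost` the cost of sending the full result, and `check_cost` the extra cost of the freshness check (all in the same arbitrary unit). Always sending If-None-Match pays off exactly when `check_cost < p_unchanged * body_cost`. The function name is hypothetical.

```python
def conditional_request_pays_off(p_unchanged: float, body_cost: float, check_cost: float) -> bool:
    """Hypothetical cost model for always sending If-None-Match.

    Without the header, every request pays body_cost.
    With it, every request pays check_cost, and only changed views
    (probability 1 - p_unchanged) also pay body_cost.
    """
    with_check = check_cost + (1.0 - p_unchanged) * body_cost
    without_check = body_cost
    return with_check < without_check
```

With a multi-MB view and a cheap check, even a modest hit rate wins; on a DB that changes between almost every pair of requests, `p_unchanged` approaches 0 and the check is pure overhead, which is the net loss described above.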
Can you describe how one would go about fixing this?
Here's a sketch. It's unfortunately kind of scattered all over the codebase, but I would look to do something like this on the 3.x branch:
1. Check the incoming request in chttpd_view for an If-None-Match header containing a DB sequence.
2. If it's present, pass that sequence into fabric:query_view/7.
3. Inside query_view, call fabric:changes/4, passing in the sequence from the header with limit=1, and see if any rows are returned.
4. If no rows are returned, invoke the Callback function with a new message like cache_hit instead of dropping down into the fabric_view_map or fabric_view_reduce coordinators.
5. Chase down all the callback functions in the chttpd app (possibly just couch_mrview_http:view_cb/2) to add a clause that handles cache_hit and returns 304 when that happens.
Step 3 is really the key extra bit of logic that validates the freshness of the ETag. Of course, we also have to generate the ETag inside fabric_view_map / fabric_view_reduce and send it with every response. Here I'd be looking for something like:
- Inside the fabric_view stack, make update_seq = true the only mode of operation, which will ensure that couch_mrview:make_meta/3 always includes the update sequence in the view metadata sent from each shard. We should verify that the sequence here is really the sequence of the view index and not the database. I'm pretty sure that's the case.
- Update all the callback functions (again, maybe just couch_mrview_http:view_cb/2?) to extract the update_seq from the {meta, Meta} message and report it in the ETag header. This will require a small bit of refactoring, as the code currently blindly starts a response with no extra response headers.
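The refactoring in the second bullet boils down to "pick the headers from the meta message before the response starts". Here is a Python model of that ordering constraint. `ViewResponse`, `view_cb` and `_start` are hypothetical, and the sketch assumes the meta message (carrying `update_seq`) arrives before any rows, as the bullet implies.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, object]


class ViewResponse:
    """Hypothetical model of a streamed view response.

    Headers can only be set once, when the response starts, so the ETag
    has to be taken from the meta message before any row is written.
    """

    def __init__(self) -> None:
        self.headers: Dict[str, str] = {}
        self.rows: List[dict] = []
        self.started = False

    def _start(self, extra: Dict[str, str]) -> None:
        if self.started:
            raise RuntimeError("headers already sent")
        self.headers.update(extra)
        self.started = True

    def view_cb(self, msg: Message) -> None:
        kind, payload = msg
        if kind == "meta":
            seq = payload.get("update_seq")
            # Previously the response began with no extra headers; now the
            # update_seq from the meta message becomes the ETag.
            self._start({"ETag": f'"{seq}"'} if seq is not None else {})
        elif kind == "row":
            if not self.started:
                self._start({})  # no meta seen: fall back to no ETag
            self.rows.append(payload)
```

If update_seq = true really becomes the only mode of operation, the fallback branch should never fire, but keeping it avoids failing a request just because the ETag is missing.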
The work to do this on main is going to be fairly similar, but not as simple as a cherry-pick.
I was surprised by this when upgrading from v1 to v3. The docs indicate that view support is there:
Etag The Etag HTTP header field is used to show the revision for a document, or a view. ETags have been assigned to a map/reduce group (the collection of views in a single design document). Any change to any of the indexes for those views would generate a new ETag for all view URLs in a single design doc, even if that specific view’s results had not changed. Each _view URL has its own ETag which only gets updated when changes are made to the database that effect that index. If the index for that specific view does not change, that view keeps the original ETag head (therefore sending back 304 - Not Modified more often).
https://docs.couchdb.org/en/3.1.1/api/basics.html