couchdb Prototype/fdb layer db version as vstamps

Overview

The caching logic was quite complicated and some of the underlying assumptions turned out to be not quite right. Originally we had planned to condition db handles mainly on the metadataVersion key. However, it turns out invalidating every db handle whenever any design doc in the cluster is updated was a bit too "thundering herd"-ish. This changes the primary cache to be the db_version key that's kept for each database.

We've also changed the db_version to be a version stamped value. This means we can reuse it for protecting cache updates in fabric2_server which is included as part of this PR.

Testing recommendations

$ make check

Checklist

[x] Code is written and works correctly
[x] Changes are covered by tests
[x] Any new configurable parameters are documented in rel/overlay/etc/default.ini
[ ] A PR for documentation changes has been made in https://github.com/apache/couchdb-documentation

Jun 17 '20 15:06 davisp

I need to read this in detail, but I don't understand how the system is supposed to work from your description.

Jun 17 '20 18:06 kocolosk

Previously, when the md version was stale we should have only revalidated each db_version (a single key read) to check that particular db hasn't changed. And if it hasn't, we would re-insert it into the cache with the new md version.

https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/fabric/src/fabric2_fdb.erl#L1873-L1879

All the subsequent requests should then get an updated db handle with the new md version, which they'll see as current and not do the extra db version key read.

But we shouldn't have fully reopened the db if a particular db's version hasn't changed.

Jun 17 '20 18:06 nickva

@nickva That's true. I was thinking about that invalidation a bit wrong cause that logic is a bit nutty.

@kocolosk Sorry, i guess I was a bit terse there. The basic change here is that I've switched the db_version from being a UUID to a versionstamp so that its comparable for use by fabric2_server to avoid the metadata version aspect since metadataVersion changes quite a bit more frequently in practice than expected.

Along with that change I switched from using metadataVersion to the db version as the primary cache invaldiation mechanism. But then @nickva reminded me that it basically only devolves to this point so I'm gonna go back and undo that.

It turns out the old caching bits were also a bit complicated so a lot of that has been straightened out as well to simplify things.

Jun 17 '20 18:06 davisp

OK that helps a lot. I can still go and review the PR, but the logic makes sense.

Jun 17 '20 18:06 kocolosk

@kocolosk @nickva So two things. I've gone back and changed to using the \xffmetadataVersion as the primary cache synchronization but that brings up two more questions:

https://github.com/apache/couchdb/blob/9425a6408274edc1713b990729d4dea4daa4787d/src/fabric/src/fabric2_fdb.erl#L1332-L1336

The first thing I realized is that by using the centralized metadataVersion, I think we're basically causing all in-flight write transactions to retry. I'll try and write a test case to prove that we can conflict on that key, but that also means that any time anyone updates a design doc (or creates or deletes a database) we're causing some amount of work to be retried. I can't decide if one KV lookup on db_version for every transaction is more or less work in that case.

https://github.com/apache/couchdb/blob/9425a6408274edc1713b990729d4dea4daa4787d/src/fabric/src/fabric2_fdb.erl#L1384-L1397

The second thing I noticed is that metadataVersion is also supposed to protect between different layers which means we should be going back all the way to the directory layer to check that our directory hasn't been renamed and what not. That means each time the metadataVersion key we're now adding an extra ~5-10 KV lookups in those cases. I've added that as a commit on this PR but mostly for the conversation.

Jun 25 '20 16:06 davisp

The first thing I realized is that by using the centralized metadataVersion, I think we're basically causing all in-flight write transactions to retry

We are doing md version reads at the snapshot level isolation https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/fabric/src/fabric2_fdb.erl#L1273 so don't think they should retry. But it is worth checking if it is not too difficult.

we should be going back all the way to the directory layer to check that our directory hasn't been renamed and what not. That means each time the metadataVersion key we're now adding an extra ~5-10 KV lookups in those cases.

That is legit. Good point. I don't think we technically have any place where we programmatically bump the md version when directory layer updates (we'd do it manually), but still for correctness and completeness we should revalidate it. We already do it for full re-opens (when db version and md versions are stale: https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/fabric/src/fabric2_fdb.erl#L134) so we should do it on md changes as well.

Jun 27 '20 16:06 nickva

couchdb couchdb copied to clipboard

Prototype/fdb layer db version as vstamps

Overview

Testing recommendations

Checklist

couchdb
couchdb copied to clipboard