ravendb `SINGLE_NODE` transactions appear unisolated, even on a single node

RavenDB's promotional materials repeatedly promise that RavenDB "offers ACID guarantees throughout your database cluster". ACID isolation implies Serializability: transactions must appear to execute sequentially, without interference from other transactions. However, it appears that transactions executed against a single, healthy RavenDB node routinely violate isolation. Lost update is common. Reads routinely observe incompatible orders of writes against single keys. RavenDB appears to violate Serializability, Repeatable Read, and Snapshot Isolation.

I've written a small Jepsen test for RavenDB to explore this behavior. It uses the official RavenDB Java client, at version 5.4.0, and installs the linux-x64 tarball of RavenDB 6.0.2 on a single Debian Bookworm node.

The sole workload for this test performs transactions over lists, each identified by a unique integer ID. Each transaction consists of reads and/or appends of unique integers to those lists. Each worker thread in the test opens a single DocumentStore connected to the same node. Each transaction creates a new session, performs reads and/or appends, then calls session.saveChanges to commit.

Reads are encoded as a single call to session.load(java.util.Map, id). Appends call session.load to read the current value, add a unique integer to the end of the list, then call session.store(map, key).
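To make the append pattern concrete, here is a toy sketch of why a load/modify/store sequence can lose an update when nothing checks the version at save time. It is a hedged illustration only: the dict-backed `store` and the `load`, `save`, and `interleaved_appends` helpers are hypothetical and do not use the RavenDB client.

```python
# Toy model only: a dict stands in for the database; no RavenDB API is used.
store = {}

def load(key):
    """Return a copy of the list stored under key (empty if absent)."""
    return list(store.get(key, []))

def save(key, value):
    """Unconditionally overwrite the stored list: no version check."""
    store[key] = list(value)

def interleaved_appends(key, a, b):
    """Two sessions both load first, then each appends and saves."""
    seen_by_1 = load(key)
    seen_by_2 = load(key)          # session 2 loads before session 1 saves
    save(key, seen_by_1 + [a])
    save(key, seen_by_2 + [b])     # overwrites session 1's append
    return load(key)

store[116] = [1, 2]
result = interleaved_appends(116, 3, 4)
print(result)  # [1, 2, 4]
```

In this interleaving the append of 3 is silently overwritten, which is the shape of a lost update.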

Here's a test run from version b669402 of that test harness: 20231229T102201.960-0600.zip. You can reproduce this by building a Jepsen environment and running `lein run test --nodes n1 --concurrency 2n --rate 10000 --time-limit 5`. In five seconds, this test performed 12886 transactions over 975 keys (with access to keys exponentially distributed). 454 of those keys exhibited incompatible orders; 81 of them exhibited a provable lost update. 481 in total--over half--violated single-key isolation. For example, here are all the reads of key 116. Three seconds into the test, process 1 observed its state as [1 2 3]. However, an immediately following read by process 0 saw [1 2 4], and the write of 3 never appeared again. Process 1 then observed [1 2 4 5]: this write of 5 was also replaced by 6, and never seen again.

[Screenshot from 2023-12-29 10-25-59: all reads of key 116]
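One way to see why the reads of key 116 are incompatible: in an append-only list, any two reads of the same key should be prefix-related. The sketch below is a simplified check under that assumption, not the checker the test harness actually uses; `is_prefix` and `incompatible_pairs` are hypothetical helper names, and the input is just the three reads quoted above.

```python
from itertools import combinations

def is_prefix(a, b):
    """True if list a is a prefix of list b."""
    return len(a) <= len(b) and b[:len(a)] == a

def incompatible_pairs(reads):
    """Return pairs of reads where neither is a prefix of the other."""
    return [(a, b) for a, b in combinations(reads, 2)
            if not (is_prefix(a, b) or is_prefix(b, a))]

reads_of_116 = [[1, 2, 3], [1, 2, 4], [1, 2, 4, 5]]
bad = incompatible_pairs(reads_of_116)
for a, b in bad:
    print(a, "vs", b)
```

Here [1, 2, 3] conflicts with both later reads, while [1, 2, 4] and [1, 2, 4, 5] are compatible with each other.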

I've spent roughly seven hours in RavenDB's documentation trying to figure out what exactly it's supposed to do, and I honestly can't figure it out--the docs seem internally contradictory. I'd like to confirm--is RavenDB intended to offer ACID by default? Or are users supposed to do something special to, say, ask Raven not to lose updates?

Dec 29 '23 16:12 aphyr

Hi, I'm afraid that there is a mismatch in expectations here, with regards to transaction vs. session.

The RavenDB API you are using is the session, which implements the Unit of Work and Identity Maps patterns.

Re: https://github.com/jepsen-io/ravendb/blob/8315d6053bf022203a24db74812d2fdfe89b56b7/src/jepsen/ravendb/append.clj#L35

Note that this is in direct contrast to working with immutable records. You can probably set the NoTracking option to allow better usage in your environment, but the underlying idea is that the API is intentionally mimicking Hibernate's session API / JPA.

As for the actual transaction semantics you observed, please note that this is by design, and documented here: https://ravendb.net/docs/article-page/6.0/Csharp/client-api/session/configuration/how-to-enable-optimistic-concurrency

But in general, what RavenDB offers and what you are expecting are different. Crucially, RavenDB does not attempt to provide transactional semantics over the entire session; rather, it provides transactions over individual requests.

The reasoning for this is:

- Those sorts of locks are expensive: OLTP – Through the Looking Glass, and What We Found There (http://nms.csail.mit.edu/~stavros/pubs/OLTP_sigmod08.pdf) found that they can exceed 35% of the overall costs of the database.
- They are not meaningful in a distributed database, given that concurrent transactions on different nodes will already not be in sync.

Instead, RavenDB uses the optimistic concurrency model. This is off by default, mind, see the link above on how to enable that.

The idea is that you read the documents from the server (you can read multiple documents in either a single request or multiple requests - note that multiple requests implicitly means separate transactions). You modify the data and then call SaveChanges(). That operation sends all the changes back to the server, which performs the transaction. All the changes will go through or none at all.

For concurrency, we use the change vector to ensure that the previous version of the document matches the expected version of the document, throwing (and aborting the transaction) if it does not.

Crucially, it is important that for RavenDB, we only consider transactions to be the calls to SaveChanges or other data mutation operations. When you are talking about sessions and the requests that they make, they are making independent separate transactions, unrelated to one another.

The idea is that this closely matches the usual interaction model in which RavenDB is used. You have one request that loads the data and shows it to the user, some think time, and then a set of updates comes from the user, which are then persisted to the database. Note that this model with any other database wouldn't exhibit transactional behavior over the entire interaction either.

All the changes that happened in that transaction are persisted in a single unit, and if you care to avoid lost updates, you need to ensure you use optimistic concurrency.

While the typical manner in which you'll use RavenDB is with full document updates, we also provide additional means to execute transactions that allow you more operations. If you want to just mutate a document in a transaction, JSON Patch allows you to do that (https://ravendb.net/docs/article-page/6.0/csharp/client-api/operations/patching/json-patch-syntax#additional-json-patching-options). RavenDB ensures that this is a completely serial operation.

We also provide more complex operations (including running scripts): https://ravendb.net/docs/article-page/6.0/csharp/client-api/operations/patching/single-document#add-or-patch-to-an-existing-array

That allows you to read / make decisions / mutate document (or documents) within the scope of a single transaction.

A key design decision here is that the scope of a transaction in RavenDB cannot exceed a single client request. The client API allows you to aggregate many such operations into a single request, which is then executed as a single transaction.
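As a rough illustration of the single-request idea, the toy below models a server that executes each request as one serial unit, so an append sent as a single "patch" request cannot interleave with another. This is an assumption-laden sketch: `ToyServer` and `patch_append` are hypothetical names, and a lock stands in for whatever mechanism the real server uses.

```python
import threading

class ToyServer:
    """Toy stand-in for a server that runs each request as one serial transaction."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def patch_append(self, key, element):
        # The read and the write happen inside one request, under one lock,
        # so no other request can run between them.
        with self._lock:
            self._docs.setdefault(key, []).append(element)

    def load(self, key):
        with self._lock:
            return list(self._docs.get(key, []))

server = ToyServer()

def worker(base):
    for i in range(100):
        server.patch_append(116, base + i)

threads = [threading.Thread(target=worker, args=(base,)) for base in (0, 100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

result = sorted(server.load(116))
print(len(result))  # 200
```

Because the read-modify-write never leaves the server, none of the 200 appends is lost, in contrast to a client-side load followed by a separate save.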

Dec 31 '23 10:12 ayende

> But in general, what RavenDB offers and what you are expecting are different. Crucially, RavenDB does not attempt to provide transactional semantics over the entire session; rather, it provides transactions over individual requests.

Begging your pardon, but RavenDB's documentation appears clear about this: sessions are transactions. It's front-and-center on the cluster transaction documentation:

> A session represents a single business transaction.

And at the top of the session documentation:

> The Session, which is obtained from the Document Store, is a Unit of Work that represents a single business transaction on a particular database.

Which goes on to say:

> The batched operations that are sent in the SaveChanges() will complete transactionally. In other words, either all changes are saved as a Single Atomic Transaction or none of them are. So once SaveChanges returns successfully, it is guaranteed that all changes are persisted to the database.

Similarly, the Transaction Support page states that "All actions performed on documents are fully ACID.... All of these constraints are ensured when you use a session and call SaveChanges."

> Crucially, it is important that for RavenDB, we only consider transactions to be the calls to SaveChanges or other data mutation operations.

Begging your pardon, but... really? RavenDB transactions don't cover reads? That... feels like something that should be really prominent in your documentation.

Dec 31 '23 19:12 aphyr

I agree, the term business transaction is confusing here. It is something that we took from Fowler's description of a Unit of Work:

> Maintains a list of objects affected by a business transaction and coordinates the writing out of changes and the resolution of concurrency problems.

This is terminology that relates more to the session as it is used in a business application, not in terms of the ACID transactions.

The documentation about transactions should also be improved, I guess, see:

> a batch of operations applied to a set of documents sent in a single HTTP request will execute in a single transaction.

We are saying what is a transaction, but we should probably also say what isn't a transaction. And the manner in which you are using the API isn't what we expect you to. Note also that this is a high level API meant to be used from a business application, and it makes assumptions about its usage.

In general, transactional behavior with RavenDB is divided into two modes:

1. Use an actual transaction and perform all your operations as a single request. This is possible and supported, but is generally not something that most users would do. It doesn't match the typical use case for our users. There are several ways to do that: batch operations, running a script, etc.
2. Transactional behavior over multiple requests. For the rest of my answer, I'm assuming that this is the scenario that we are talking about.

Using Hibernate's terminology, one of the most interesting use cases for transactional behavior is the [Long Conversation](https://docs.jboss.org/hibernate/core/3.3/reference/en/html/transactions.html#transactions-basics-apptx), or Application Transaction, as they also call it.

> The first screen of a dialog opens. The data seen by the user has been loaded in a particular Session and database transaction. The user is free to modify the objects.
>
> The user clicks "Save" after 5 minutes and expects their modifications to be made persistent. The user also expects that they were the only person editing this information and that no conflicting modification has occurred.

The situation is similar if we have everything happening in a single web request. We always assume that the long conversation mode is in place, which allows us to do a bunch of optimizations and greatly simplify our code and behavior.

In the long conversation model, it is obvious that you cannot actually allocate a server side transaction and hold it while the user is off to get a cup of coffee. We need to handle the scenario without it.

RavenDB does use optimistic concurrency, and will check that no modified document was changed since the time it was loaded into the session. We also support a way to roundtrip the version of the document that was loaded through the browser, so there is no state that you need to keep track of in your application.

The basic model is:

// load documents
// modify the documents
// save changes

The modify document portion in the middle can take a while, happen offline, etc. The key is that when we save the changes, we also send the version of the documents that we modified, so the server can check if they were changed. If they did change, we'll abort the transaction with a ConcurrencyException and the caller can retry or handle the error how they see fit.
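The load / modify / save flow with a version check can be sketched with a toy in-memory store. This is a hedged illustration, not the RavenDB client: `ToyStore`, `ConcurrencyError`, and `append_with_retry` are hypothetical, and a plain integer version stands in for the change vector.

```python
class ConcurrencyError(Exception):
    """Toy analogue of a concurrency exception: the document changed since load."""

class ToyStore:
    def __init__(self):
        self._docs = {}  # key -> (version, list)

    def load(self, key):
        version, value = self._docs.get(key, (0, []))
        return version, list(value)

    def save(self, key, expected_version, value):
        current_version, _ = self._docs.get(key, (0, []))
        if current_version != expected_version:
            raise ConcurrencyError(key)   # abort: someone else wrote first
        self._docs[key] = (current_version + 1, list(value))

def append_with_retry(store, key, element, before_save=None, max_attempts=5):
    """Load, append, save; on a version conflict, reload and try again."""
    for attempt in range(max_attempts):
        version, value = store.load(key)
        if before_save is not None and attempt == 0:
            before_save()                 # hook simulating a concurrent writer
        try:
            store.save(key, version, value + [element])
            return attempt + 1            # number of attempts used
        except ConcurrencyError:
            continue
    raise ConcurrencyError(key)

store = ToyStore()
store.save(116, 0, [1, 2])

def rival():
    v, val = store.load(116)
    store.save(116, v, val + [4])

attempts = append_with_retry(store, 116, 3, before_save=rival)
final = store.load(116)[1]
print(attempts, final)  # 2 [1, 2, 4, 3]
```

The first save is rejected because the rival writer bumped the version; the retry reloads the newer list, so both appends survive.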

The same basic logic applies to both single node and cluster wide transactions. With the case of single node transactions, the version we keep track of is a change vector (that is updated independently by each node and used to also solve conflicts). In the case of a cluster wide transaction, we maintain a cluster-wide version for each modified document and update that through Raft protocol.

Jan 02 '24 14:01 ayende

We have updated our documentation and provided more clarifications when talking about transactions in RavenDB and the Client API Session.

The actual changes we made can be reviewed here: https://github.com/ravendb/docs/pull/1766.

Feb 28 '24 11:02 arekpalinski
