pranadb
Out of band pull queries
This PR enables out of band pull queries. Previously we were executing all pull queries through raft as linearizable reads. This didn't buy us anything as our data is effectively stale anyway with respect to published events.
From Slack:
Currently we execute pull queries through Raft as linearizable reads. That means the results of the read reflect any writes that completed before the read executed. There’s quite an overhead to doing this, as all linearizable reads have to go through the leader and lock the state machine while the read is in progress. For a query returning 10000 rows (the default value of batch size) this could take a significant amount of time (likely to be multiple ms).

When an event is published to a Kafka topic, it’s only some time later that possibly many materialized views are updated, asynchronously. Internally this can be composed of multiple Raft writes. So having a guarantee of linearizable Raft reads with respect to Raft writes doesn’t really buy us anything. All of our reads are effectively stale anyway, with respect to the publishing of the event to the topic.

I propose we can retain the same level of consistency (stale reads) without executing reads through Raft. Instead we can execute reads directly against the state machines of any Raft replicas. Raft doesn’t apply changes to state machines until the corresponding commands have been committed in the leader’s log, so we know that if we see the data in the state machine of any replica then it’s not uncommitted data; it is, at worst, stale. This is ok, because our data is stale anyway with respect to the publishing of events, even if it is read via Raft as a linearizable read. Executing reads directly against replicas (aka “out of band”) should allow us to reach higher levels of performance.
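To make the idea concrete, here is a minimal Go sketch of an out of band read. It is not the code in this PR: `replica`, `applyCommitted`, and `localScan` are hypothetical names, and an in-memory map stands in for the real storage engine. The only property it relies on is the one described above, namely that a replica's state machine only ever contains committed entries.

```go
package main

import (
	"sort"
	"strings"
	"sync"
)

// row is a hypothetical key/value pair returned by a pull query.
type row struct{ Key, Value string }

// replica is a hypothetical stand-in for the state machine of one Raft replica.
type replica struct {
	mu   sync.RWMutex
	rows map[string]string
}

func newReplica() *replica {
	return &replica{rows: make(map[string]string)}
}

// applyCommitted models Raft applying an entry to the state machine.
// Raft only calls this for entries that are already committed.
func (r *replica) applyCommitted(key, value string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.rows[key] = value
}

// localScan is the out of band read: it never goes through the Raft log or
// the leader. It reads whatever this replica has applied so far, so the
// result may be stale but can never contain uncommitted data.
func (r *replica) localScan(prefix string, limit int) []row {
	r.mu.RLock()
	defer r.mu.RUnlock()

	// Collect matching keys and sort them so the result order is stable.
	keys := make([]string, 0, len(r.rows))
	for k := range r.rows {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	if len(keys) > limit {
		keys = keys[:limit]
	}

	out := make([]row, 0, len(keys))
	for _, k := range keys {
		out = append(out, row{k, r.rows[k]})
	}
	return out
}
```

If a follower has applied only the first of two committed writes, `localScan` on that follower returns one row while the leader returns two. That is the "at worst, stale" behaviour the proposal accepts in exchange for not routing every read through the leader.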