Proposal: On interactive connections, timeouts, and connection concurrency
I propose changing how the hrana protocol handles interactive connections.
How it currently works
Hrana-over-HTTP has become the de facto protocol for interacting with sqld. When a request is received by hrana, a Hrana `Stream` is created. This `Stream` owns an SQLite connection that executes the queries sent to that stream. A `Stream` is identified by a baton.
When a stream is opened, the `baton` for that stream is returned to the client. The client can pass the baton to subsequent requests so that they are executed on the stream identified by that baton. When a client wants to dispose of a stream, it sends a `Close` command on that stream; the stream is then closed, freeing the SQLite connection in the process. To avoid keeping streams open forever in case a client never closes them, a watchdog is associated with each stream. If the watchdog is not kicked (by receiving a request on that stream) within a defined period (currently 10s), the stream is expired and the SQLite connection is released.
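To make the baton mechanics concrete, here is a rough TypeScript sketch of one such exchange. It assumes the hrana-over-HTTP v2 pipeline endpoint and JSON shapes from the hrana spec; the URL and token are placeholders:

```typescript
// Placeholders for a real sqld deployment.
const DB_URL = "https://db.example.com";
const AUTH_TOKEN = "<token>";

// One round-trip on the hrana-over-HTTP pipeline endpoint.
async function pipeline(baton: string | null, requests: unknown[]) {
  const res = await fetch(`${DB_URL}/v2/pipeline`, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${AUTH_TOKEN}`,
    },
    body: JSON.stringify({ baton, requests }),
  });
  return res.json() as Promise<{ baton: string | null; results: unknown[] }>;
}

// First request: no baton yet, so the server opens a fresh stream
// and returns a baton identifying it.
const first = await pipeline(null, [
  { type: "execute", stmt: { sql: "BEGIN" } },
]);

// Subsequent requests carry the baton and land on the same stream;
// the trailing `close` disposes of the stream, freeing the connection.
await pipeline(first.baton, [
  { type: "execute", stmt: { sql: "COMMIT" } },
  { type: "close" },
]);
```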
Transactions also have a timeout mechanism, separate from the stream's. Transactions are given a (configurable, defaulting to 5s) timeslot to complete; failing to do so yields a timeout error. The lock stealer ensures that the transaction lock is handed to another waiting connection, if any.
Motivation
This change is motivated by multiple factors:
- Resource management: each interactive connection holds resources during its lifetime. The incentive for sqld is to keep resource usage low; therefore, the protocol should be designed in a way that minimizes resource squatting.
- We see a lot of `hrana stream expired` errors in prod, which surface as user-facing errors. That is not desirable, and we must address it somehow.
- The current protocol expects that the client will be well-behaved and close its stream when it is done with it, which I believe is a fundamental design flaw, since it requires trusting clients to behave correctly.
- The current model limits the transaction duration, which constrains users, especially when evaluating or using the shell to interact with the database.
Solution
- Default to non-interactive transactions: on the hrana side, only keep a stream open if the connection is not in auto-commit state (i.e., the connection is currently inside a transaction). In other words, if the connection is not in a transaction state after executing a query, the default behavior is to close it. In that case, no baton is returned, and the resources are released. On subsequent queries, a new connection will be opened. This solves the following issues:
- A connection is, by default, not held for longer than necessary, alleviating resource contention on the server.
- The client is less likely to hold on to a stream for too long, reducing the risk that its stream expires.
- The interactive transaction issue is solved straightforwardly: if a transaction is still open, a baton is returned, and the client uses it for subsequent queries. No transaction => no baton => no problem.
- This API is less likely to be misused by clients that do not close their streams properly.
Interactive connections outside of a transaction may still be desired in some cases; they would remain possible, but as an opt-in rather than an opt-out as it is today. A sketch of the client side under this scheme follows.
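Roughly, reusing the hypothetical `pipeline` helper from the earlier sketch (the `run` helper and the table name are illustrative):

```typescript
// Client-side sketch of the proposed behavior: the server returns a baton
// only while a transaction is open, so the client just threads through
// whatever baton (or null) the last response carried.
let baton: string | null = null;

async function run(sql: string) {
  const body = await pipeline(baton, [{ type: "execute", stmt: { sql } }]);
  baton = body.baton; // becomes null once the connection is back in auto-commit
  return body.results;
}

await run("BEGIN");                           // response carries a baton
await run("INSERT INTO emails VALUES ('a')"); // same stream, transaction still open
await run("COMMIT");                          // no baton returned: stream is gone
```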
- Hard and soft transaction limits: the current transaction timeout is a hard limit, enforced through two mechanisms:
- The lock stealer: if a transaction runs too long and another transaction is waiting for the lock, the waiting transaction can steal the lock and time out the long-running transaction. I call that a soft timeout: if no other transaction is trying to acquire the lock, a transaction is allowed to run forever.
- Transaction self-monitoring: during its execution, a transaction monitors itself to ensure that it does not exceed its allocated transaction time. This is a hard timeout, since it fires whether or not another connection is trying to acquire the lock.
We should increase the timeout to let transactions run longer when there is no contention. We will still need to bound transactions so they do not prevent checkpoints, but we can also let an uncontended transaction run for as long as its stream is open. The sketch below illustrates the intended soft/hard split.
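A minimal illustration (not sqld's actual code) of that eviction policy, with all names and thresholds made up for the example:

```typescript
// Illustrative sketch of the proposed split: the soft deadline only evicts
// a transaction when someone else is waiting for the write lock; the hard
// deadline applies unconditionally.
function shouldEvict(
  heldForMs: number, // how long the transaction has held the lock
  softMs: number,    // soft limit, enforced only under contention
  hardMs: number,    // hard limit, e.g. to keep checkpoints possible
  waiters: number,   // connections currently waiting for the lock
): boolean {
  if (heldForMs >= hardMs) return true;                // hard timeout: always
  if (heldForMs >= softMs && waiters > 0) return true; // soft: contention only
  return false;
}
```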
- Kick the watchdog: in the CLI scenario, keeping a stream open for an extended period may be desirable. We can allow that by having the client periodically kick the stream watchdog. This may require exposing a `Noop` operation in the hrana protocol, whose sole purpose is to keep the stream open. A possible keep-alive loop is sketched below.
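A hedged sketch of such a keep-alive loop, assuming the hypothetical `noop` request type proposed above existed (no such request type exists today), and reusing `pipeline` and `baton` from the earlier sketches:

```typescript
// CLI keep-alive: periodically kick the stream watchdog so an idle
// interactive session is not expired.
const KICK_INTERVAL_MS = 5_000; // comfortably below the 10s watchdog

const keepAlive = setInterval(async () => {
  if (baton === null) return; // no open stream, nothing to keep alive
  try {
    await pipeline(baton, [{ type: "noop" }]); // hypothetical request type
  } catch {
    clearInterval(keepAlive); // stream expired anyway; reconnect lazily
  }
}, KICK_INTERVAL_MS);
```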
Ah, I was sure we already implemented a soft limit for transactions, to let them last longer if there's no contention. So that's something we should definitely add.
And as for checkpoints, we can make them play the same stealing game - transactions only have a soft timeout, but once a checkpoint is about to be triggered, it tries to steal the lock, potentially rolling back a transaction that has already passed its soft timeout.
I think this is a good proposal and definitely a step in the right direction.
One caveat is that closing the SQLite connection after every request will increase the latency of the next request on the same connection. This can be solved, though, by not releasing the connection eagerly and only evicting it from the connection cache when we're short on resources.
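For illustration, a minimal sketch of that lazy-release idea (not sqld's internals; the names and eviction budget are invented for the example):

```typescript
// Finished connections return to an idle cache and are only truly closed
// once the cache exceeds a resource budget.
interface Conn {
  close(): void;
}

const MAX_IDLE = 16; // stand-in for "short on resources"
const idle: Conn[] = [];

function releaseLazily(conn: Conn): void {
  idle.push(conn);
  while (idle.length > MAX_IDLE) {
    idle.shift()!.close(); // evict the oldest idle connection
  }
}

function acquire(open: () => Conn): Conn {
  return idle.pop() ?? open(); // reuse a cached connection when possible
}
```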
I think the cost of re-opening a connection is completely absorbed by network latencies, but we'd need to measure that. There are other levers to improve connection creation latency in the long run as well.
The issue with lazily closing is that if we return a baton, then it is assumed that it can be used within the limits. Long-lasting connections should be the exception, to reduce contention on the concurrency semaphore.
> Transactions are given a (configurable, defaulting to 5s) timeslot to complete
Apologies for the sidetrack, but how can users configure this timeout duration on Turso or in libsql? We currently have a use case where we're trying to write large amounts of data in a single transaction, and it's constantly timing out.
hey @billythedummy, currently this option is not exposed in Turso. But there is a workaround. Say you have a main table `CREATE TABLE emails (addr)` to which you want to bulk import transactionally. What you can do is create a temporary table, `CREATE TABLE emails_tmp (addr)`, and import into it in batches. Once all the data is loaded into the temp table, you can do `INSERT INTO emails SELECT * FROM emails_tmp`. That will copy all the data over to the main table in a single transaction.
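For completeness, here is a hedged sketch of the same workaround driven from the TypeScript client, assuming the `@libsql/client` package; the URL, token, and sample data are placeholders, and the table names match the example above:

```typescript
import { createClient } from "@libsql/client";

const db = createClient({ url: "libsql://db.example.com", authToken: "<token>" });

// Stand-in for the real bulk data.
const addrs = ["a@example.com", "b@example.com", "c@example.com"];

await db.execute("CREATE TABLE IF NOT EXISTS emails_tmp (addr)");

// Import in batches: each batch commits on its own, so no single
// transaction comes near the 5s limit.
const BATCH = 1000;
for (let i = 0; i < addrs.length; i += BATCH) {
  await db.batch(
    addrs.slice(i, i + BATCH).map((addr) => ({
      sql: "INSERT INTO emails_tmp (addr) VALUES (?)",
      args: [addr],
    })),
    "write",
  );
}

// The final copy is a single statement, i.e. one short transaction.
await db.execute("INSERT INTO emails SELECT * FROM emails_tmp");
await db.execute("DROP TABLE emails_tmp");
```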