Non-atomic transactions with intermediate reads are common
Hi,
I've had a sense that electrified tables were behaving differently than expected. This led to writing a test that shows non-atomic transactions with intermediate reads are common.
The schema:
-- lww append-only list register
CREATE TABLE IF NOT EXISTS lww (
  k INTEGER PRIMARY KEY,
  v TEXT
);
-- electrify table
ALTER TABLE lww ENABLE ELECTRIC;
Read and append transactions are generated; each micro-op either reads the full list of values for a key or appends a unique value to a key's list:
[[:r 9 [9 10 11 12 18]] [:append 5 3]]
The better-sqlite3 driver is used:
const upsert = txn_conn.prepare(
  'INSERT INTO lww (k,v) VALUES (@k,@v) ON CONFLICT (k) DO UPDATE SET v = v || \' \' || @v');
const select = txn_conn.prepare(
  'SELECT k,v FROM lww WHERE k = @k');
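For context, here's a minimal sketch, assuming the txn_conn connection and prepared statements above, of how one generated transaction might be executed atomically with better-sqlite3's transaction wrapper (the applyTxn helper and the JavaScript micro-op shape are illustrative assumptions, not the test's exact code):

// Hypothetical helper: run one generated transaction, e.g.
// [[':r', 9, null], [':append', 5, 3]], inside a single better-sqlite3
// transaction so its micro-ops commit (or roll back) together.
const applyTxn = txn_conn.transaction((microOps) =>
  microOps.map(([op, k, v]) => {
    if (op === ':append') {
      upsert.run({ k, v: String(v) });   // append v to key k's list
      return [op, k, v];
    }
    const row = select.get({ k });       // read key k's full list
    return [op, k, row ? row.v.split(' ').map(Number) : []];
  }));

// e.g. applyTxn([[':r', 9, null], [':append', 5, 3]]);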
The electricsql/electric:latest Docker image and the "electric-sql": "^0.10.1" client are used in a cluster of:
- 5 client nodes
- total of 10 tps
- for 30s
- all transactions are successful
Here's an example of an intermediate read:
- n1 appends 105 and 106 to key 10
- n2 reads only 105
- n1, n3, n4, and n5 read 105 and 106
- and n2 later reads 105 and 106
{:write-op {:index 465,
            :node "n1",
            :value [[:append 10 105]
                    ...
                    [:append 10 106]
                    ...]},
 :intermediate-write [:append 10 105],
 :final-write [:append 10 106],
 :read-op {:index 493,
           :node "n2",
           :value [...
                   [:r 10 [1 ... 105]]]},
 :intermediate-read [:r 10 [1 ... 105]],
 :reads-of-final ({:index 467,
                   :node "n1",
                   :value [[:r 10 [1 ... 105 106]]
                           [:append 10 107]
                           ...
                           [:r 10 [1 ... 105 106 107]]]}
                  {:index 509,
                   :node "n4",
                   :value [[:append 10 127]
                           [:r 10 [1 ... 105 106 107 112 117 118 123 127]]
                           ...
                           [:append 10 128]]}
                  {:index 691,
                   :node "n3",
                   :value [[:r 10 [1 ... 105 106 107 112 117 118 123 127]]]}
                  {:index 695,
                   :node "n5",
                   :value [[:r 10 [1 ... 105 106 107 112 117 118 123 127]]]}
                  {:index 707,
                   :node "n2",
                   :value [[:r 10 [1 ... 105 106 107 112 117 118 123 127]]]})}
Running the same test on a non-electrified db shows no anomalies.
The test finds a variety of cycles and anomalies. There are cases where a process does not read its own writes, or where writes are lost. Those will be covered in a future test better suited to illustrating that class of anomalies.
After doing some research on SQLite3 and isolation, I tried using two separate db connections, one for ElectricSQL and one for transactions. This did not change the results.
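For illustration, a minimal sketch of that two-connection setup, assuming better-sqlite3 (the file name is an assumption, and the ElectricSQL electrify call is only sketched as a comment rather than the test's exact code):

const Database = require('better-sqlite3');

// One connection is handed to the ElectricSQL client for replication/sync...
const electric_conn = new Database('electric.db');
// ...and a second, separate connection is used only for the test's transactions.
const txn_conn = new Database('electric.db');

// const electric = await electrify(electric_conn, schema, config); // ElectricSQL client
// txn_conn.prepare(...) / txn_conn.transaction(...) as shown earlier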
I dumped the pragma values of the connections and didn't see anything of interest: connections-pragma.json
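For reference, a minimal sketch of how the pragma values can be dumped with better-sqlite3's db.pragma() (the pragma list, file name, and output file are assumptions):

const fs = require('fs');
const Database = require('better-sqlite3');

// Hypothetical pragma dump: read a handful of pragmas from a connection
// and write them out as JSON for comparison across connections.
const pragmas = ['journal_mode', 'read_uncommitted', 'synchronous', 'busy_timeout'];

function dumpPragmas(db) {
  return Object.fromEntries(
    pragmas.map((p) => [p, db.pragma(p, { simple: true })]));
}

const txn_conn = new Database('electric.db');

fs.writeFileSync(
  'connections-pragma.json',
  JSON.stringify({ txn_conn: dumpPragmas(txn_conn) }, null, 2));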
The test framework is Jepsen. A Docker environment and GitHub actions have been created that reliably reproduce the behavior. The GitHub actions are a good place to start: https://github.com/nurturenature/jepsen-causal-consistency/actions
- ElectricSQL: Intermediate-Read
  - will fail
- Local-SQLite3: Intermediate-Read
  - same test but with a non-electrified local SQLite3 db
  - will pass
The test creates two artifacts: a result summary and the full test results.
In the results, look for :G1b anomalies with :reads-of-final values.
These are specifically intermediate reads.
The full results have complete transaction, node, sync service, Postgres, etc. logs.
The causal directory has further cycle and graph analysis.
Intermediate reads will be graphed as a G-single-item.
The graph for the intermediate read cycle in the example given above:
I'm sure there will be questions, so please ask. I also hope to make the Jepsen Docker environment as usable as possible, so feedback is welcome.
Thanks.
The client has been updated:
- from 0.10.1
- to electric-sql@latest
- no changes in test behavior
The non-electric SQLite3 GitHub action has been renamed and updated to reflect that the tests run successfully when not electrified:
- name: Non-electric-SQLite3: Full Test
- 100 tps for 60s
- full checking of validity
@nurturenature Thanks a lot for sharing these results! We'll be taking a closer look at the anomalies.
👋 We've been working for the last month on a rebuild of the Electric server over at a temporary repo: https://github.com/electric-sql/electric-next/
You can read more about why we made the decision at https://next.electric-sql.com/about
We're really excited about all the new possibilities the new server brings and we hope you'll check it out soon and give us your feedback.
We're now moving the temporary repo back here. As part of that migration we're closing all the old issues and PRs. We really appreciate you taking the time to investigate and report this issue!