Pavel Tcholakov
The full logs are attached to the run as an artifact, but I'm copying them here for safekeeping. [jepsen-store.zip](https://github.com/user-attachments/files/19551926/jepsen-store.zip) This makes the raw logs a lot easier to read: `sed -f...
Summary:
- After the partition healed, a handful of client calls completed, but then _all_ calls started timing out after 2s. However, `n2` which was the active partition processor for...
Same root cause as the failure in https://github.com/restatedev/jepsen/actions/runs/14205571402: the timeouts were legitimate, caused by a partition processor slow-down followed by a leader change immediately after the partition healed. It...
Another recurrence: https://github.com/restatedev/jepsen/actions/runs/14287188130

I've increased the heal timeout to 60s to see if things actually converge. In the meantime, the following changes do make a difference to convergence:
-...
I'm confident that, with the latest gossip changes, this is now behind us. I've fixed the false negatives from the nightly tests and will continue to monitor the runs, but here's...
Makes sense! This would be nice in conjunction with the ability to retain the "last N" snapshots together with a feature to report only the Nth most recent snapshot's LSN...
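The "last N" retention idea can be sketched as a small bounded queue of snapshot LSNs. This is a minimal sketch under stated assumptions: `SnapshotRetention`, `record`, `reported_lsn` and the `Lsn` alias are hypothetical names, not Restate APIs, and snapshots are assumed to be recorded in increasing LSN order. An LSN is only reported once N snapshots exist.

```rust
use std::collections::VecDeque;

/// Hypothetical log sequence number type; a plain u64 stands in for the real one.
type Lsn = u64;

/// Hypothetical helper that keeps the LSNs of the `n` most recent snapshots.
/// Assumes snapshots are recorded in increasing LSN order.
struct SnapshotRetention {
    n: usize,
    lsns: VecDeque<Lsn>, // oldest retained at the front, newest at the back
}

impl SnapshotRetention {
    fn new(n: usize) -> Self {
        assert!(n > 0, "must retain at least one snapshot");
        Self { n, lsns: VecDeque::with_capacity(n + 1) }
    }

    /// Records a new snapshot LSN; returns the LSN that fell out of the
    /// retention window (and could be deleted), if any.
    fn record(&mut self, lsn: Lsn) -> Option<Lsn> {
        self.lsns.push_back(lsn);
        if self.lsns.len() > self.n {
            self.lsns.pop_front()
        } else {
            None
        }
    }

    /// LSN of the Nth most recent snapshot, or None until N snapshots exist.
    fn reported_lsn(&self) -> Option<Lsn> {
        if self.lsns.len() == self.n {
            self.lsns.front().copied()
        } else {
            None
        }
    }
}

fn main() {
    let mut retention = SnapshotRetention::new(3);
    for lsn in [100, 200, 300, 400] {
        retention.record(lsn);
    }
    // Retained: 200, 300, 400, so the 3rd most recent snapshot is at LSN 200.
    println!("{:?}", retention.reported_lsn()); // prints "Some(200)"
}
```

Reporting the Nth most recent LSN instead of the newest one is the conservative choice: every retained snapshot sits at or above the reported LSN, so anything acting on that value still has all N snapshots available.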