materialize
materialize copied to clipboard
persist: send maelstrom an abort if spawned task panics
Our Malestrom service does all of its work in spawned tasks, and task panics don't propagate upwards to kill the runtime/process. This means that currently Maelstrom will happily run a full test and report no consistency violations at the end of a test run, even if the tasks were repeatedly panicking.
With this PR, a panic will send Maelstrom an ABORT
error with which message triggered it, and then terminate the whole node, ensuring the run fails.
As an example, I tried replacing our unexpected missing blob
retry loop with a panic, and a test run will show:
WARN [2022-08-09 11:07:53,711] jepsen test runner - jepsen.core Test crashed!
clojure.lang.ExceptionInfo: Node n0 crashed with exit status 134. Before crashing, it wrote to STDOUT:
...
thread 'tokio-runtime-worker' panicked at 'unexpected missing blob', /Users/phemberger/workspace/materialize/src/persist-client/src/read.rs:849:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2022-08-09T15:07:07.629649Z INFO persistcli::maelstrom::node: res: {"src":"n0","dest":"c3","body":{"type":"error","msg_id":2380,"in_reply_to":1185,"code":14,"text":"Some(\"unexpected missing blob\")"}}
Motivation
- This PR refactors existing code.
Tips for reviewer
Checklist
-
[x] This PR has adequate test coverage / QA involvement has been duly considered.
-
[x] This PR evolves an existing
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way) and therefore is tagged with aT-protobuf
label. -
[x] This PR includes the following user-facing behavior changes:
That seems much simpler (I think I tried something similar but put the abort_on_panic in the wrong place), we can always revive this change if we do need the specificity of which message panicked