materialize icon indicating copy to clipboard operation
materialize copied to clipboard

persist: send maelstrom an abort if spawned task panics

Open pH14 opened this issue 1 year ago • 0 comments

Our Malestrom service does all of its work in spawned tasks, and task panics don't propagate upwards to kill the runtime/process. This means that currently Maelstrom will happily run a full test and report no consistency violations at the end of a test run, even if the tasks were repeatedly panicking.

With this PR, a panic will send Maelstrom an ABORT error with which message triggered it, and then terminate the whole node, ensuring the run fails.

As an example, I tried replacing our unexpected missing blob retry loop with a panic, and a test run will show:

WARN [2022-08-09 11:07:53,711] jepsen test runner - jepsen.core Test crashed!
clojure.lang.ExceptionInfo: Node n0 crashed with exit status 134. Before crashing, it wrote to STDOUT:
...
thread 'tokio-runtime-worker' panicked at 'unexpected missing blob', /Users/phemberger/workspace/materialize/src/persist-client/src/read.rs:849:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2022-08-09T15:07:07.629649Z  INFO persistcli::maelstrom::node: res: {"src":"n0","dest":"c3","body":{"type":"error","msg_id":2380,"in_reply_to":1185,"code":14,"text":"Some(\"unexpected missing blob\")"}}

Motivation

  • This PR refactors existing code.

Tips for reviewer

Checklist

pH14 avatar Aug 09 '22 15:08 pH14

That seems much simpler (I think I tried something similar but put the abort_on_panic in the wrong place), we can always revive this change if we do need the specificity of which message panicked

pH14 avatar Aug 10 '22 21:08 pH14