Change the internal behavior of expect/snapshot tests
Background
I am currently doing a major rewrite of moon at moonbitlang/moon#802. During the rewrite, a behavior that has consistently annoyed me is how expect tests (inspect and friends) work under the hood when updating snapshots.
This might be a blocker to achieving feature parity with the legacy code for moon test -- I am against reimplementing the legacy behavior in the new code, which is supposed to be actually maintainable.
(And no, I'm not talking about all those heuristics needed to find the patch point. That's #2601.)
The issue
When an expect test fails, it aborts the current control flow by throwing an InspectError. The error bubbles up the call stack, ultimately reaching the test block containing it, where it gets captured.
https://github.com/moonbitlang/core/blob/3c615179f2ee3a60773d674cee7df4ebf56d857b/builtin/console.mbt#L40-L53
This is perfectly valid control flow for regular test runs. However, the inspect function does not know whether we're updating snapshot results or not. This means that, when an expect test fails while we're updating results, the function still unconditionally aborts the control flow to send the "expect failed" message, preventing any tests after it from executing.
The main consequence is that every failed expect test needs at least 2 execution rounds: one to update it to the expected value, and another to ensure that no other checks fail after it. This costs significant time and compilation effort, and degrades the user experience.
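To make the two-round cost concrete, consider a hypothetical test block with two stale snapshots. Under the current behavior, the first failing inspect aborts the whole block, so only its snapshot can be patched in this round, and another compile-and-run round is needed for the second one:

test {
  inspect(1 + 1, content="3")    // stale: throws InspectError and aborts the block
  inspect("moon", content="sun") // also stale, but never reached in this round
}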
Moreover, since an inspection failure is not an error when updating expect tests, the test error representing the failure must not be displayed. Instead, the displayed result should be the test result after all expect tests have been updated. This means either:
- You give up batch updates, and instead apply the expect updates midway through test result processing, or
- You keep batch updates, but you have to hold back the result reporting until all your updates have finished.
Either way, the user experience is severely degraded, and the root cause is how expect tests are updated.
The solution
This part is inspired by expect test implementations in other languages.
I propose that we add an internal global flag, which I will call update_expect from here on. The flag is true if we're updating expect tests, and false otherwise.
When update_expect is false, the behavior is exactly the same as the current one -- bail out on the first expect failure.
When update_expect is true, inspect and similar functions should never throw an error. If the expected value requires updating, the function reports the to-be-updated value to the test driver (the exact mechanism is TBD) and returns normally, as if the check had passed.
This way, all expect tests can be updated at once, and the test result is the same as if the expected values had already been updated (in either case the expect tests pass). Every test case needs to be executed once and only once, and all updates can be applied in a single batch. This would greatly simplify how the relevant test handlers are implemented, and provide a much better, more consistent experience for the user.
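Below is a minimal sketch of the proposed split, assuming a hypothetical global update_expect flag (stored in a Ref, since it is mutable state) and a hypothetical report_update hook; the real inspect in builtin/console.mbt also threads through source locations and more:

let update_expect : Ref[Bool] = { val: false }

// Hypothetical hook: hand the corrected value to the test driver.
// The actual reporting method is TBD, as noted above.
fn report_update(actual : String) -> Unit {
  println("expect update: \{actual}")
}

fn inspect_sketch(obj : &Show, content~ : String = "") -> Unit raise InspectError {
  let actual = obj.to_string()
  if actual == content {
    return
  }
  if update_expect.val {
    report_update(actual) // updating: report the new value and keep going
  } else {
    // regular run: bail out on the first failure, same as today
    raise InspectError("inspect failed: got \{actual}, expected \{content}")
  }
}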
Remaining questions
- How should we set the flag? Env var, special flags in the test driver, or what else?
- How should we accept the update requests? If this needs to be cross-platform and match existing tooling, it should probably be within delimiters in stdout.
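To make the delimiters-in-stdout option concrete, here is a purely hypothetical wire format (the delimiter strings and payload shape are exactly what is TBD), mimicking how coverage data and test output are already delimited today:

fn emit_update_request(loc : String, actual : String) -> Unit {
  println("----- BEGIN MOON EXPECT UPDATE -----")
  println(loc)    // e.g. file:line of the inspect call to patch
  println(actual) // the to-be-written snapshot content
  println("----- END MOON EXPECT UPDATE -----")
}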
I'm thinking of having a new build mode specifically for tests, where we may use an extra output channel for these error messages and accept specific inputs, e.g. a hash seed. To be discussed.
Extra output channel for these error messages
That's even better, because currently all coverage data, test output, and friends are outputted on the stdout channel using delimiters (and mixed together). If we can separate these entities, or even just separate them from stdout, it's a clear win.
How should we set the flag? Env var, special flags in the test driver, or what else?
We can pass the result via a test block parameter. This is similar to the snapshot-to-file approach, which uses t.write and t.writeln to pass the results.
This is also commonly referred to as DPS (Destination Passing Style).
test (t : @test.T) {
  t.inspect("hello", content="hello")
}
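Sketching how such a parameter could both carry the mode and collect the corrections for a single batch apply (the fields and the inspect method here are hypothetical; today's @test.T only exposes write and writeln):

struct T {
  update_expect : Bool
  updates : Array[String] // corrections collected for one batch apply
}

fn T::inspect(self : T, obj : &Show, content~ : String = "") -> Unit raise InspectError {
  let actual = obj.to_string()
  if actual == content {
    return
  }
  if self.update_expect {
    self.updates.push(actual) // record the correction, keep the block running
  } else {
    raise InspectError("inspect failed: got \{actual}, expected \{content}")
  }
}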
I second this idea. This way, we can also pass the param in the test driver file, using whatever method is currently available to pass the list of tests to run (driver files in JS & WASM, command-line params in native).
The current parameter parsing needs at least a partial rework, though.