Generate proper stacktrace when calling reraise in Elixir
👋 @bettio @pguyot I need your help on this one, as I'm not sure if it doesn't break anything, and it relies on at least one hack, described in the comments. Also, I'm not sure how to include the reproduction (snippet below) in the tests - I can translate it to Erlang, but it requires defining several modules and each test is currently a single module AFAIK.
The problem occurs when running the Elixir compiler and can be reproduced in the following way:
defmodule Raiser do
def raise_error() do
raise "foo"
end
end
defmodule IntermediateCaller do
def call_raiser() do
Raiser.raise_error()
:ok
end
end
defmodule Reraiser do
def reraise_error() do
try do
IntermediateCaller.call_raiser()
rescue
e -> reraise e, __STACKTRACE__
end
:ok
end
end
try do
Reraiser.reraise_error()
rescue
e -> {e, __STACKTRACE__}
end
Currently, the stacktrace returned only includes Reraiser.reraise_error/0. After the fixes in this PR, it also includes IntermediateCaller.call_raiser/0 and Raiser.raise_error/0, as it should.
I found out that the reason is that nif_erlang_raise ignores its third argument.
The problem only occurs when the functions are in separate modules - when they're in one module, everything works as expected, so I suppose the compiler generates op_raise instead of nif_erlang_raise in that case (?).
These changes are made under both the "Apache 2.0" and the "GNU Lesser General Public License 2.1 or later" license terms (dual license).
SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later
You definitely can have a test with several modules. We already do. See catch_from_other_module.erl and raise_badmatch.erl.
@pguyot Ok, I added the test. As it turns out, for reproduction it's necessary to wrap erlang:raise with erlang:error, as Elixir's reraise does, or basically do some calls before erlang:raise
Let's try to summarize everything a little bit:
- I think this PR is a workaround for a simple reason: as far as I understand, registers are not meant to hold error information while returning from the function that raised the error. So it is likely building on something quite wrong.
This is likely a wrong pattern:
70 #define RAISE(a, b) \
71 ctx->x[0] = (a); \
72 ctx->x[1] = (b); \
73 return term_invalid_term();
We should store relevant information as part of Context, so they don't get clobbered by mistake.
I think this kind of workaround, that @mat-hek did, explains pretty well the issue (I think) we have:
static bool is_exception_class(term t)
{
return t == ERROR_ATOM || t == LOWERCASE_EXIT_ATOM || t == THROW_ATOM;
}
[...]
if (stacktrace_is_built(stacktrace)) {
// FIXME: This is a temporary solution as in some niche cases of reraise the x_regs[0] is
// overwritten and it does not represent exception class
if (!is_exception_class(x_regs[0])) {
x_regs[0] = ERROR_ATOM;
}
@mat-hek by the way, do you have any code sample where this corner case is visible and reproducible?
- Any drawback of adding additional fields to Context?
I think we should be able to manage this adding to Context something like:
context_flags_t flags; // error, exit, throw, ... (space for more flags)
term reason;
term stacktrace;
In the future we might even add:
term value;
This can be set from VALIDATE_VALUE macro for improved debugging.
Also avoiding overwriting x[0]...x[2] registers, allows us to have improved stacktraces, than can contain function arguments and so on.
- What we should do now?
I suggest finding if merging this PR (as a temporary workaround) is doable. Can we update this PR to JIT changes with a reasonable effort? If so, Popcorn can benefit this, since they can use an updated AtomVM. Otherwise it would be hard for them to stay on par with upstream (until we find some kind of better error management).
Also, I really think we should instead revamp error handling as soon as we have less moving parts on JIT side.
@pguyot your opinion is important here
Comments?
@mat-hek by the way, do you have any code sample where this corner case is visible and reproducible?
@bettio To be clear, this problem only occurs if you bring the changes from this PR without the fix. The problem is reproducible with popcorn, see this PR. It would require running popcorn master with fissionvm without the fix. I can set it up for you if needed.
Also, this PR doesn't fix any critical bugs - it just improves generated stacktraces, thus we can run popcorn without it. It'll be harder to debug though, so it may delay abandoning fissionvm indeed.
@bettio BTW, if you decide on how error handling should be eventually implemented and provide some guidance, we can try to contribute it ;)