cpp11
cpp11 copied to clipboard
cpp11 `unwind_protect()` + ALTREP conflicts
CC @jennybc and @sbearrows
This is a long post, and is intended to outline a rather complex problem in a way that we can come back to in the future. The short story is this:
If you are writing ALTREP methods, such as an _Elt
method, then you currently have to be extremely careful about using any C++ or cpp11 inside those methods. This is because your methods can be called directly by base R's C code, which means it can be called without being wrapped in a try/catch block, unlike the standard methods of calling cpp11 code, which goes through .Call
and is automatically wrapped in a try/catch by BEGIN_CPP11
and END_CPP11
. In particular, if your ALTREP method calls something that triggers unwind_protect()
and a longjmp()
occurs inside the code that unwind_protect()
calls, then R will crash because unwind_protect()
will throw
an unwind_exception
that can't be caught. A practical example of this is calling cpp11::stop()
inside your ALTREP method.
The problem
Ok, now for a more detailed explanation. First off, you can reproduce what I am going to discuss yourself using this minimal repo https://github.com/DavisVaughan/cpp11altrepbug
pak::pak("DavisVaughan/cpp11altrepbug")
Then call cpp11altrepbug::works()
and cpp11altrepbug::fails()
. The fails()
call should nuke your R session.
I recommend calling fails()
from a terminal R session (not in RStudio), which gives you this useful error message:
> x <- cpp11altrepbug::fails()
> x[1]
Error: oh no
libc++abi: terminating with uncaught exception of type cpp11::unwind_exception: std::exception
Abort trap: 6
Here is what is happening:
There is an ALTREP type named fails_t
. It has an _Elt
method that simply calls cpp11::stop()
. The following chain of events causes that combination to nuke R.
- The
[
function is called whenx[1]
is run - R's internal
[
C code sees thatx
is an ALTREP object. It calls out tofails_Elt()
and runs that. Note that this is NOT our typical cpp11 use case. We have not gone through.Call()
here to get to our cpp11 code, so there is no try/catch set up for us. -
fails_Elt()
callscpp11::stop()
-
cpp11::stop()
is set up assafe[Rf_errorcall]()
. Thesafe
bit forcesunwind_protect()
to be called, andunwind_protect()
will be put in charge of callingRf_errorcall()
- Before actually calling
Rf_errorcall()
,unwind_protect()
sets up a place to jump back to if alongjmp()
is triggered whenRf_errorcall()
is run. It does this withsetjmp()
, which is the place to jump back to. When a jump happens, we return to the place wheresetjmp()
was called, it returnstrue
, causing that if branch to run which throws an unwind exception. Cpp11 makes the assumption thatunwind_protect()
will only be called from cpp11 code that has gone through the "normal" path of execution - i.e. through.Call()
, so thatBEGIN_CPP11
andEND_CPP11
are set up to catch that unwind exception.- The
setjmp()
and throw location: https://github.com/r-lib/cpp11/blob/322e5ee67346926993cc6f464152802118250b78/inst/include/cpp11/protect.hpp#L97-L100 - The definitions of
BEGIN_CPP11
andEND_CPP11
: https://github.com/r-lib/cpp11/blob/322e5ee67346926993cc6f464152802118250b78/inst/include/cpp11/declarations.hpp#L32-L52
- The
- When
Rf_errorcall()
is eventually called, that triggers alongjmp()
, so we jump back to thesetjmp()
location and dothrow unwind_exception(token)
- BUT this happened outside of a try/catch block because of how it was called through base R's C code. The uncaught exception causes R to crash.
Potential solution
The basic idea of the solution is that somehow we need to ensure that all ALTREP methods that we write in C++ with cpp11 are wrapped in a try/catch.
After some thinking, we think that this problem of "someone else" calling our cpp11 code from C can really only occur through ALTREP. That is likely the only place where base R lets us "hook into" their C code. So it is ok if our solution is very specific to ALTREP.
The fix we have come up with is to suggest that cpp11 should provide wrappers around R's ALTREP registration methods, like R_set_altreal_Elt_method(class, fun)
, that "wrap" the fun
with something like:
double wrapper_Elt(SEXP x, R_xlen_t i) {
BEGIN_CPP11
return fun(x, i);
END_CPP11
}
You'd then register the ALTREP method with something like cpp11::set_altreal_Elt_method(class, fun)
.
For that to work, END_CPP11
would have to be tweaked because it currently returns R_NilValue
unconditionally, which won't work here since the return type is double
. So possibly END_CPP11
shouldn't handle the return
part, so we can do return 0;
manually at the end of this wrapper (which should never be called).
Backstory
First found in vroom, which makes great use of ALTREP. This crash could be triggered by catching the warning thrown by warn_for_errors()
and promoting it to an error, triggering the longjmp()
from inside unwind_protect()
while we aren't in a try/catch. To make the crash occur, you also have to call rlang::warn()
inside warn_for_errors ()
through the cpp11::package()
machinery to make sure unwind_protect()
was used (rather than what it currently does, which is to only use base R machinery to avoid usage of unwind_protect()
).
Discussed here: https://github.com/tidyverse/vroom/pull/441#discussion_r883175764
Which links to this original commit, which mentions that we don't want to use unwind protect here: https://github.com/tidyverse/vroom/commit/984a3e5e37e124feacfec3d184dbeb02eb1145c4
Partly related to https://github.com/r-lib/cpp11/issues/219
But note that calling cpp11::stop()
is not the only way this can occur.
The original vroom issue was calling cpp11::package("rlang")["warn"]
and then on the R side we were promoting that to an rlang::abort()
by catching the class warning with withCallingHandlers()
, so just fixing https://github.com/r-lib/cpp11/issues/219 wouldn't be enough.
Alternatively, you could also just make sure that everything in the frame is trivially destructible and does not throw exceptions? If those two conditions are met, you can just use the R API directly (although it sounds like vroom's methods are doing a few more complex things).
In Arrow we have the advantage that Arrow C++ does not use exceptions for error handling, so perhaps this is only a reasonable option for us 🙂 . (Or maybe my approach in https://github.com/apache/arrow/pull/14271 is very very wrong!)
cpp11 is the one throwing the exception though?
Yes...part of what I did in that PR is to not use cpp11 in ALTREP methods. I didn't find it all that hard to avoid but perhaps that is unique to Arrow's ALTREP classes.