cpp11 icon indicating copy to clipboard operation
cpp11 copied to clipboard

cpp11 `unwind_protect()` + ALTREP conflicts

Open DavisVaughan opened this issue 2 years ago • 4 comments

CC @jennybc and @sbearrows

This is a long post, and is intended to outline a rather complex problem in a way that we can come back to in the future. The short story is this:

If you are writing ALTREP methods, such as an _Elt method, then you currently have to be extremely careful about using any C++ or cpp11 inside those methods. This is because your methods can be called directly by base R's C code, which means it can be called without being wrapped in a try/catch block, unlike the standard methods of calling cpp11 code, which goes through .Call and is automatically wrapped in a try/catch by BEGIN_CPP11 and END_CPP11. In particular, if your ALTREP method calls something that triggers unwind_protect() and a longjmp() occurs inside the code that unwind_protect() calls, then R will crash because unwind_protect() will throw an unwind_exception that can't be caught. A practical example of this is calling cpp11::stop() inside your ALTREP method.

The problem

Ok, now for a more detailed explanation. First off, you can reproduce what I am going to discuss yourself using this minimal repo https://github.com/DavisVaughan/cpp11altrepbug

pak::pak("DavisVaughan/cpp11altrepbug")

Then call cpp11altrepbug::works() and cpp11altrepbug::fails(). The fails() call should nuke your R session.

I recommend calling fails() from a terminal R session (not in RStudio), which gives you this useful error message:

> x <- cpp11altrepbug::fails()
> x[1]
Error: oh no
libc++abi: terminating with uncaught exception of type cpp11::unwind_exception: std::exception
Abort trap: 6

Here is what is happening:

There is an ALTREP type named fails_t. It has an _Elt method that simply calls cpp11::stop(). The following chain of events causes that combination to nuke R.

  • The [ function is called when x[1] is run
  • R's internal [ C code sees that x is an ALTREP object. It calls out to fails_Elt() and runs that. Note that this is NOT our typical cpp11 use case. We have not gone through .Call() here to get to our cpp11 code, so there is no try/catch set up for us.
  • fails_Elt() calls cpp11::stop()
  • cpp11::stop() is set up as safe[Rf_errorcall](). The safe bit forces unwind_protect() to be called, and unwind_protect() will be put in charge of calling Rf_errorcall()
  • Before actually calling Rf_errorcall(), unwind_protect() sets up a place to jump back to if a longjmp() is triggered when Rf_errorcall() is run. It does this with setjmp(), which is the place to jump back to. When a jump happens, we return to the place where setjmp() was called, it returns true, causing that if branch to run which throws an unwind exception. Cpp11 makes the assumption that unwind_protect() will only be called from cpp11 code that has gone through the "normal" path of execution - i.e. through .Call(), so that BEGIN_CPP11 and END_CPP11 are set up to catch that unwind exception.
    • The setjmp() and throw location: https://github.com/r-lib/cpp11/blob/322e5ee67346926993cc6f464152802118250b78/inst/include/cpp11/protect.hpp#L97-L100
    • The definitions of BEGIN_CPP11 and END_CPP11: https://github.com/r-lib/cpp11/blob/322e5ee67346926993cc6f464152802118250b78/inst/include/cpp11/declarations.hpp#L32-L52
  • When Rf_errorcall() is eventually called, that triggers a longjmp(), so we jump back to the setjmp() location and do throw unwind_exception(token)
  • BUT this happened outside of a try/catch block because of how it was called through base R's C code. The uncaught exception causes R to crash.

Potential solution

The basic idea of the solution is that somehow we need to ensure that all ALTREP methods that we write in C++ with cpp11 are wrapped in a try/catch.

After some thinking, we think that this problem of "someone else" calling our cpp11 code from C can really only occur through ALTREP. That is likely the only place where base R lets us "hook into" their C code. So it is ok if our solution is very specific to ALTREP.

The fix we have come up with is to suggest that cpp11 should provide wrappers around R's ALTREP registration methods, like R_set_altreal_Elt_method(class, fun), that "wrap" the fun with something like:

double wrapper_Elt(SEXP x, R_xlen_t i) {
  BEGIN_CPP11
    return fun(x, i);
  END_CPP11 
}

You'd then register the ALTREP method with something like cpp11::set_altreal_Elt_method(class, fun).

For that to work, END_CPP11 would have to be tweaked because it currently returns R_NilValue unconditionally, which won't work here since the return type is double. So possibly END_CPP11 shouldn't handle the return part, so we can do return 0; manually at the end of this wrapper (which should never be called).

Backstory

First found in vroom, which makes great use of ALTREP. This crash could be triggered by catching the warning thrown by warn_for_errors() and promoting it to an error, triggering the longjmp() from inside unwind_protect() while we aren't in a try/catch. To make the crash occur, you also have to call rlang::warn() inside warn_for_errors () through the cpp11::package() machinery to make sure unwind_protect() was used (rather than what it currently does, which is to only use base R machinery to avoid usage of unwind_protect()).

Discussed here: https://github.com/tidyverse/vroom/pull/441#discussion_r883175764

Which links to this original commit, which mentions that we don't want to use unwind protect here: https://github.com/tidyverse/vroom/commit/984a3e5e37e124feacfec3d184dbeb02eb1145c4

DavisVaughan avatar Jun 03 '22 19:06 DavisVaughan

Partly related to https://github.com/r-lib/cpp11/issues/219

But note that calling cpp11::stop() is not the only way this can occur.

The original vroom issue was calling cpp11::package("rlang")["warn"] and then on the R side we were promoting that to an rlang::abort() by catching the class warning with withCallingHandlers(), so just fixing https://github.com/r-lib/cpp11/issues/219 wouldn't be enough.

DavisVaughan avatar Jun 03 '22 19:06 DavisVaughan

Alternatively, you could also just make sure that everything in the frame is trivially destructible and does not throw exceptions? If those two conditions are met, you can just use the R API directly (although it sounds like vroom's methods are doing a few more complex things).

In Arrow we have the advantage that Arrow C++ does not use exceptions for error handling, so perhaps this is only a reasonable option for us 🙂 . (Or maybe my approach in https://github.com/apache/arrow/pull/14271 is very very wrong!)

paleolimbot avatar Oct 06 '22 13:10 paleolimbot

cpp11 is the one throwing the exception though?

DavisVaughan avatar Oct 06 '22 14:10 DavisVaughan

Yes...part of what I did in that PR is to not use cpp11 in ALTREP methods. I didn't find it all that hard to avoid but perhaps that is unique to Arrow's ALTREP classes.

paleolimbot avatar Oct 06 '22 14:10 paleolimbot