
[BUG]: GCC/Clang exceptions are exported without any ABI change mitigation

Open • feltech opened this issue 1 year ago • 1 comment

Required prerequisites

  • [X] Make sure you've read the documentation. Your issue may be addressed there.
  • [X] Search the issue tracker and Discussions to verify that this hasn't already been reported. +1 or comment there if it has.
  • [X] Consider asking first in the Gitter chat room or in a Discussion.

What version (or hash if on master) of pybind11 are you using?

2.9, 2.10 (code seems similar on current master)

Problem description

pybind11 enforces hidden visibility for all symbols except exceptions (including error_already_set), which use the PYBIND11_EXPORT_EXCEPTION macro to get default visibility. This is presumably so that pybind11-specific exceptions can be caught across DSO boundaries, which is useful.

However, if the DSOs use different versions of pybind11 internally, then catching e.g. error_already_set, even as a std::exception, and calling .what() on it can return garbled text (at best).

We experienced this exact scenario with a library built against pybind11 2.10 and an application built against 2.9. Between these versions, the ABI of error_already_set changed dramatically.

I would suggest an inline namespace whose name is constructed from the PYBIND11_VERSION_MAJOR and PYBIND11_VERSION_MINOR macros. This namespace would wrap all exported types (i.e. exceptions), or perhaps all of pybind11 (which has the advantage that there would then technically be no need to enforce hidden visibility for other symbols). E.g. the fully qualified namespace could be pybind11::v2_13, but code could still refer to it as pybind11 (because the namespace is inline).
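To make the suggestion concrete, here is a minimal sketch of a version-named inline namespace. It is not pybind11 code: the `demo` namespace, the `DEMO_*` macros, and the stand-in `error_already_set` class are all hypothetical, with `DEMO_VERSION_MAJOR` and `DEMO_VERSION_MINOR` playing the role of PYBIND11_VERSION_MAJOR and PYBIND11_VERSION_MINOR.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <typeinfo>

// Hypothetical stand-ins for PYBIND11_VERSION_MAJOR / PYBIND11_VERSION_MINOR.
#define DEMO_VERSION_MAJOR 2
#define DEMO_VERSION_MINOR 13

// Two-level concatenation so the version macros expand before token pasting.
#define DEMO_CONCAT_(a, b, c, d) a##b##c##d
#define DEMO_CONCAT(a, b, c, d) DEMO_CONCAT_(a, b, c, d)
#define DEMO_ABI_NAMESPACE DEMO_CONCAT(v, DEMO_VERSION_MAJOR, _, DEMO_VERSION_MINOR)

// Assumed equivalent of an "export this exception" macro on GCC/Clang.
#if defined(__GNUC__)
#  define DEMO_EXPORT __attribute__((visibility("default")))
#else
#  define DEMO_EXPORT
#endif

namespace demo {
inline namespace DEMO_ABI_NAMESPACE {

// Exported exception type. Its symbols live in demo::v2_13, so a build with
// different version macros would produce distinct symbols and typeinfo.
class DEMO_EXPORT error_already_set : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

} // inline namespace DEMO_ABI_NAMESPACE
} // namespace demo

// Callers keep writing demo::error_already_set; both spellings name one type.
static_assert(std::is_same<demo::error_already_set,
                           demo::v2_13::error_already_set>::value,
              "inline namespace is transparent to callers");

// Returns the implementation-defined type name, which includes the
// version namespace on common toolchains.
std::string mangled_name() {
    std::string name = typeid(demo::error_already_set).name();
    assert(!name.empty());
    return name;
}

// Throws via the unversioned spelling and catches as std::exception.
std::string catch_as_std_exception() {
    try {
        throw demo::error_already_set("raised in Python");
    } catch (const std::exception &e) {
        return e.what();
    }
}
```

With a scheme like this, two DSOs built against different versions would no longer share one exported symbol for incompatible class layouts. The trade-off is that a `catch` for the version-specific type in one DSO would not match an exception thrown by the other, although catching as `std::exception` should still work.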

Reproducible example code

Tricky to put together a code example. I will explain in words what happens in our product.

You need a full example project with two DSOs. In the order they are loaded, the first uses pybind11 2.9 and the second uses 2.10.

The first DSO calls through to the second, which calls out to a Python function.

The Python function then `raise`s.

The first DSO catches the resulting C++ exception as a `std::exception` and calls `.what()` on it to verify the expected exception message. It will be garbled.

Stepping through a GCC 11 build in GDB shows that the 2.10 class is used to construct the exception, but the object gets the 2.9 vtable pointer.

Is this a regression? Put the last known working version here if it is.

Not a regression

feltech • Sep 06 '24 14:09

This looks like a dupe of https://github.com/pybind/pybind11/issues/4105 - on scanning briefly through the comments, PYBIND11_EXPORT_EXCEPTION pops up a lot.

I suspect it's (mostly, see below) fixed by https://github.com/pybind/pybind11/pull/4298, which is available since v2.10.1, but I need to double-check.

Unfortunately that fix won't solve the problem if one of the DSOs involved is stuck on an older pybind11 version.

feltech • May 09 '25 14:05