pybind11
[BUG]: 3.0.0rc1 regression: pybind11/cast.h:70:32: error: invalid ‘static_cast’
Required prerequisites
- [x] Make sure you've read the documentation. Your issue may be addressed there.
- [x] Search the issue tracker and Discussions to verify that this hasn't already been reported. +1 or comment there if it has.
- [ ] Consider asking first in the Gitter chat room or in a Discussion.
What version (or hash if on master) of pybind11 are you using?
2.3.0rc1
Problem description
I am trying out the numpy 2.3.0rc1 arm windows support, and accidentally enabled building skia-python against pybind11 3.0.0rc1 too - https://github.com/kyamagu/skia-python/actions/runs/15240052449 -
It is failing with pybind11's cast.h:
src/skia/Canvas.cpp:239:19: required from here
/opt/python/cp310-cp310/lib/python3.10/site-packages/pybind11/include/pybind11/cast.h:70:32: error: invalid ‘static_cast’ from type ‘const SkCanvas::Lattice::RectType* const’ to type ‘pybind11::detail::type_caster_enum_type<SkCanvas::Lattice::RectType>::Underlying’ {aka ‘unsigned char’}
70 | return native_enum(static_cast<Underlying>(src)).release();
| ^~~~~~~~~~
I had a quick look at the upgrade guide and nothing jumps out to me yet.
Reproducible example code
Is this a regression? Put the last known working version here if it is.
2.13.6
What version (or hash if on master) of pybind11 are you using? 2.3.0rc1
Do you mean 3.0.0rc1?
I am trying out the numpy 2.3.0rc1 arm windows support, and accidentally enabled building skia-python against pybind11 3.0.0rc1 too - https://github.com/kyamagu/skia-python/actions/runs/15240052449 -
I'm getting a 404 clicking on the link. Could you please double-check the link?
I had a quick look at the upgrade guide and nothing jumps out to me yet.
Did you see/try this already?
- https://github.com/pybind/pybind11/blob/1c10d5e9b1e0cecf7401af735bb3bc515043760d/docs/upgrade.rst#L103-L118
error: invalid ‘static_cast’ from type ‘const SkCanvas::Lattice::RectType* const’
Hm ... a pointer?
A reproducer would be extremely useful.
Apologies for so many things - yes, it is pybind11 3.0.0rc1 that I accidentally enabled when I enabled numpy 2.3.0rc1. Numpy 2.3.0rc1, or rather, arm64 windows turned out to be quite a change and I needed about a dozen changes to get ci to pass, so I deleted the failed ci logs... anyway, the line of code the log referred to is
https://github.com/HinTak/skia-python/blob/1547890ce648f3b8a40abe4120acdfea6277b853/src/skia/Canvas.cpp#L239
Where the enum is defined a few lines up:
https://github.com/HinTak/skia-python/blob/1547890ce648f3b8a40abe4120acdfea6277b853/src/skia/Canvas.cpp#L211
As far as I can see, it is just doing def_readwrite on a py::enum struct member for which the underlying type is "unsigned char" (a very small enum which takes only a handful of values, so 256 is enough). The pointer is just the def_readwrite part to write to it. Hope this makes sense?
Anyway, binding against pybind11 2.13.x is passing now, so this is definitely a regression.
Sorry I don't have a lot of time to spend on this, could you please
- Try the type_caster_enum_type_enabled specialization as documented in the upgrade guide? I strongly believe that should work.
- Reduce your production code into a unit test (and post as a draft PR here)? This is to help me understand the situation.
Note that the native_enum feature passed testing in a gigantic code base, including many third-party packages (Google, when I was still working there). Therefore your case is likely to be an unusual corner case.
Okay, thanks for the time. I see it looks similar to https://github.com/pybind/pybind11/pull/5555#issue-2905028099 - I am busy with the original work, for which this issue came up as a sideline, for a few more days. I'll come back to this perhaps in a week's time to try the type caster in 1, and will try to do 2 too.
Might this be because it used to convert to the underlying integer type, but now it's expecting a py::native_enum caster to be registered?
Might this be because it used to convert to the underlying integer type, but now it's expecting a py::native_enum caster to be registered?
... no, I think. The error in the issue description is a compiler error. py::native_enum types are registered at runtime.
I looked around:
- https://skia.googlesource.com/skia.git
include/core/SkCanvas.h:
struct Lattice {
...
enum RectType : uint8_t {
kDefault = 0, //!< draws SkBitmap into lattice rectangle
kTransparent, //!< skips lattice rectangle by making it transparent
kFixedColor, //!< draws one of fColors into lattice rectangle
};
...
};
That looks totally fine.
But looking at the error message again, I'm beginning to think what's missing is dealing with pointers to enums correctly. And I'm beginning to be surprised that this didn't surface before. (The cast.h code did pass global testing.)
I think we need to add
template <typename SrcType>
static handle cast(const SrcType *src, return_value_policy, handle parent) {
or similar.
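To illustrate the idea, here is a minimal standalone sketch. It is a toy model, not pybind11's actual `type_caster_enum_type`: `ToyEnumCaster`, its `long` return type, the `-1` null sentinel, and the local `RectType` stand-in are all assumptions made for illustration. It shows why a pointer cannot be `static_cast` to the underlying integer type, and how a dedicated pointer overload can dereference first and then reuse the by-reference path.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for SkCanvas::Lattice::RectType (same underlying type).
enum RectType : std::uint8_t { kDefault = 0, kTransparent, kFixedColor };

// Toy model of an enum caster; NOT pybind11's real implementation.
template <typename EnumType>
struct ToyEnumCaster {
    using Underlying = typename std::underlying_type<EnumType>::type;

    // Reference path: an enum value converts cleanly to its underlying integer.
    static long cast(const EnumType &src) {
        return static_cast<long>(static_cast<Underlying>(src));
    }

    // Pointer path: static_cast<Underlying>(pointer) is ill-formed, so the
    // pointer is dereferenced and handed to the reference overload instead.
    // The -1 sentinel for nullptr is an assumption of this sketch only.
    static long cast(const EnumType *src) {
        if (src == nullptr) {
            return -1;
        }
        return cast(*src);
    }
};
```

With both overloads present, `ToyEnumCaster<RectType>::cast(ptr)` selects the pointer overload and yields the value of the first element `ptr` points to, while passing an enum value still goes through the reference overload.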
Looking at the skia-python code again, I am not sure it makes sense for it to be def_readwrite - that field is indeed a pointer to enum (with the underlying uint8_t), but it is meant to be an array of enums where the counts are kept nearby in fXCount / fYCount . So the getter and the setter should manipulate all 3 of those values together. That said, I kind of expect pybind11 to be just dereferencing the first value of the array, like it normally does, instead of failing to compile.
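A rough sketch of what such a combined getter and setter could look like, in plain C++ without any binding code. `ToyLattice` is a simplified, hypothetical stand-in for `SkCanvas::Lattice`, the helper names are invented for this example, and the array length of `(fXCount + 1) * (fYCount + 1)` is an assumption that should be checked against the Skia headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for SkCanvas::Lattice::RectType.
enum RectType : std::uint8_t { kDefault = 0, kTransparent, kFixedColor };

// Simplified, hypothetical stand-in for SkCanvas::Lattice.
struct ToyLattice {
    const RectType *fRectTypes = nullptr;  // non-owning pointer to an array
    int fXCount = 0;
    int fYCount = 0;
};

// Assumption: one entry per grid cell, i.e. (xCount + 1) * (yCount + 1).
inline std::size_t cell_count(int xCount, int yCount) {
    return static_cast<std::size_t>(xCount + 1) * static_cast<std::size_t>(yCount + 1);
}

// Getter: copy the whole array out, using the counts stored alongside it.
inline std::vector<RectType> get_rect_types(const ToyLattice &lattice) {
    if (lattice.fRectTypes == nullptr) {
        return {};
    }
    const std::size_t n = cell_count(lattice.fXCount, lattice.fYCount);
    return std::vector<RectType>(lattice.fRectTypes, lattice.fRectTypes + n);
}

// Setter: update the pointer and both counts together. The storage is not
// copied, so it must outlive the lattice.
inline void set_rect_types(ToyLattice &lattice, const std::vector<RectType> &storage,
                           int xCount, int yCount) {
    if (storage.size() != cell_count(xCount, yCount)) {
        throw std::invalid_argument("storage size does not match the grid counts");
    }
    lattice.fRectTypes = storage.data();
    lattice.fXCount = xCount;
    lattice.fYCount = yCount;
}
```

In a binding, helpers like these could back a property in place of `def_readwrite`, so that Python sees a list of enum values rather than only the first element. The lifetime of the backing storage would still need to be handled by the binding code.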
Another possible occurrence of this issue: https://github.com/conda-forge/ocp-feedstock/pull/66, where rebuilding with pybind11 3 does not work and fails with this kind of error:
$PREFIX/include/pybind11/cast.h:70:32: error: invalid 'static_cast' from type 'IVtk_MeshType*' to type 'pybind11::detail::type_caster_enum_type<IVtk_MeshType>::Underlying' {aka 'int'}
70 | return native_enum(static_cast<Underlying>(src)).release();
https://github.com/Open-Cascade-SAS/OCCT/blob/06f6a5afeca6b58c390b203f95e69be8c34b72aa/src/Visualization/TKIVtk/IVtk/IVtk_Types.hxx#L78
Given that pybind11 3 is released, is there a known workaround for this?
Sorry this issue slipped my mind.
I believe I started working on the idea I outlined on May 27
template <typename SrcType>
static handle cast(const SrcType *src, return_value_policy, handle parent) {
but somehow I lost that work.
I still believe it's an easy fix. We should probably make a patch release for this.
I updated skia-python to use 3.0.1 and it builds, but the result segfaults on all platforms at the end of pytest. So it seems to be a problem about object destruction order possibly.
I updated skia-python to use 3.0.1 and it builds, but the result segfaults on all platforms at the end of pytest. So it seems to be a problem about object destruction order possibly.
Ideally we need a reproducer.
Minimally a full C++ stack trace from a debug build.
If this is really about object destruction order: my first bet would be on a bug in user code. But you never know until you found the root cause.
This is the ci run https://github.com/kyamagu/skia-python/actions/runs/17166412654 - all of them segfault on exit. The most interesting is the fedora one, where it runs "python -c '... import skia; ... ; print("success3")'" and it prints, then segfaults. So it segfaults on exit. I'll need to try it locally to get a trace.
@rwgk I got round to upgrading pybind11 on my own machine, also uninstalled the system pybind11 2.x, and built locally. I cannot get a segfault with pytest. So it is a github ci + pybind11 3.0.1 segfault on exit with pytest problem. Unfortunately that means either putting a pybind11 version restriction in (which I am reluctant to do, but does work), or just delaying new releases until somebody figures it out here. I am just writing here to see if some related issue filing may turn up.
So it is a github ci + pybind11 3.0.1 segfault on exit with pytest problem.
I'd try something like this:
- Work on .github/actions/*.yml: Remove all jobs except one Ubuntu 24.04 with one Python version. (I'm assuming that's an easy way to avoid torturing the github runners unnecessarily. Skip this step if that's not a concern.)
- Verify that the one job I picked segfaults on exit.
- If it's not already a debug build: add -g -O0 options or similar.
- Hope the job still segfaults.
- Download the wheel from the github artifacts.
- Reproduce the segfault with the exact same wheel locally.
- Run the same test in gdb to get a stacktrace.
Unfortunately the artefacts are uploaded after pytest... but there is possibly a simpler way: just adding || /bin/true after pytest to force the artefact generation. This reminds me: my local build is different from ci builds - not only am I using a different compiler (clang instead of gcc), but I also have a few extra pybind11 DEBUG defines and flags on. Maybe I should try dropping the defines and flags first.
@rwgk
I got ci to upload the artifact before testing (which fails). Using that artifact, I can reproduce the problem locally. The whole thing segfaults locally at the very end of "pytest" (after everything seems to go as planned), and also with this command - it segfaults after "Success3" appears:
python -c 'import moderngl; moderngl_context = moderngl.create_standalone_context(backend="egl"); import skia; interface = skia.GrGLInterface.MakeEGL() ; assert isinstance(skia.GrDirectContext.MakeGL(interface), skia.GrContext) ; print("Success3")'
Stack trace of thread 327672:
#0 0x00007f7728c8c2d0 free (libc.so.6 + 0x832d0)
#1 0x00007f76e2068e4c n/a (skia.cpython-313-x86_64-linux-gnu.so + 0x68e4c)
#2 0x00007f7728f4133a meth_dealloc (libpython3.13.so.1.0 + 0x14133a)
#3 0x00007f772902cbae property_dealloc (libpython3.13.so.1.0 + 0x22cbae)
#4 0x00007f7728f3d4b5 dictkeys_decref.constprop.0 (libpython3.13.so.1.0 + 0x13d4b>
#5 0x00007f77290643fa type_clear (libpython3.13.so.1.0 + 0x2643fa)
#6 0x00007f7728f5f905 gc_collect_main (libpython3.13.so.1.0 + 0x15f905)
#7 0x00007f7729047f1a _Py_Finalize.constprop.0 (libpython3.13.so.1.0 + 0x247f1a)
#8 0x00007f7729060f71 Py_RunMain (libpython3.13.so.1.0 + 0x260f71)
#9 0x00007f7729017a4b Py_BytesMain (libpython3.13.so.1.0 + 0x217a4b)
#10 0x00007f7728c0c575 __libc_start_call_main (libc.so.6 + 0x3575)
#11 0x00007f7728c0c628 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x3628)
#12 0x00005650b2b143d5 _start (/usr/bin/python3.13 + 0x3d5)
#0 0x00007fd7f707ce9c __pthread_kill_implementation (libc.so.6 + 0x73e9c)
#1 0x00007fd7f7022f3e raise (libc.so.6 + 0x19f3e)
#2 0x00007fd7f700a6d0 abort (libc.so.6 + 0x16d0)
#3 0x00007fd7f700b6f3 __libc_message_impl.cold (libc.so.6 + 0x26f3)
#4 0x00007fd7f7087035 malloc_printerr (libc.so.6 + 0x7e035)
#5 0x00007fd7f708c3cc free (libc.so.6 + 0x833cc)
#6 0x00007fd7e5668e4c n/a (skia.cpython-313-x86_64-linux-gnu.so + 0x68e4c)
#7 0x00007fd7f734133a meth_dealloc (libpython3.13.so.1.0 + 0x14133a)
#8 0x00007fd7f742cbae property_dealloc (libpython3.13.so.1.0 + 0x22cbae)
#9 0x00007fd7f733d4b5 dictkeys_decref.constprop.0 (libpython3.13.so.1.0 + 0x13d4b>
#10 0x00007fd7f74643fa type_clear (libpython3.13.so.1.0 + 0x2643fa)
#11 0x00007fd7f735f905 gc_collect_main (libpython3.13.so.1.0 + 0x15f905)
#12 0x00007fd7f7447f1a _Py_Finalize.constprop.0 (libpython3.13.so.1.0 + 0x247f1a)
#13 0x00007fd7f7469cb8 Py_Exit (libpython3.13.so.1.0 + 0x269cb8)
#14 0x00007fd7f74690e9 handle_system_exit (libpython3.13.so.1.0 + 0x2690e9)
#15 0x00007fd7f7468e90 _PyErr_PrintEx (libpython3.13.so.1.0 + 0x268e90)
#16 0x00007fd7f725f06d _PyRun_SimpleFileObject.cold (libpython3.13.so.1.0 + 0x5f06>
#17 0x00007fd7f7462d81 _PyRun_AnyFileObject (libpython3.13.so.1.0 + 0x262d81)
#18 0x00007fd7f7461147 Py_RunMain (libpython3.13.so.1.0 + 0x261147)
#19 0x00007fd7f7417a4b Py_BytesMain (libpython3.13.so.1.0 + 0x217a4b)
#20 0x00007fd7f700c575 __libc_start_call_main (libc.so.6 + 0x3575)
#21 0x00007fd7f700c628 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x3628)
#22 0x000055c4adcb33d5 _start (/usr/bin/python3.13 + 0x3d5)
Are these backtraces any good?
The only non-cPython, non-system stack frames seem to be these:
#1 0x00007f76e2068e4c n/a (skia.cpython-313-x86_64-linux-gnu.so + 0x68e4c)
#6 0x00007fd7e5668e4c n/a (skia.cpython-313-x86_64-linux-gnu.so + 0x68e4c)
Is that extension built with pybind11?
Unfortunately we don't even get a function name (let alone a line number), just n/a.
Wild guess: This is a double-deallocation error.
But there could be many other reasons.
I'd bet 10:1 it's a bug in the skia code or Python bindings (IOW I believe the fix will be in skia, not in pybind11).
As the next step, I'd try to reproduce that situation exactly locally, building from sources (not pulling the wheel).
Then I'd try to switch to a debug build (to get the function name and line number), hoping to still get the same segfault. Hopefully the line number will give a clue. If not, I'd drill down to figure out what pointer is the trouble maker.
Sometimes the segfault goes away simply by switching to a debug build, in that case I'd resort to reducing and inserting prints.
Root-causing something like this is often very cumbersome.