[BUG]: Pybind11 crashes on cleanup in GCC 7.4 and 7.5

Open Sleemanmunk opened this issue 3 years ago • 3 comments

Required prerequisites

  • [X] Make sure you've read the documentation. Your issue may be addressed there.
  • [X] Search the issue tracker and Discussions to verify that this hasn't already been reported. +1 or comment there if it has.
  • [X] Consider asking first in the Gitter chat room or in a Discussion.

Problem description

pybind11 consistently fails to clean up after itself if I have imported PyTorch.

When shutting down the interpreter, whether manually or via the scoped interpreter leaving scope, pybind11 tries to free memory that has already been freed and crashes.

I'm using the latest stable version of pybind11 and GCC 7.5. The problem does not manifest in GCC 4.8.5.

More information is in the example code project.

Reproducible example code

https://github.com/Sleemanmunk/pybind_crash

Sleemanmunk, Mar 18 '22 14:03

Here's the diagnosis I posted on gitter:

A look at that version of pytorch indicates that it is built with pybind11 2.6.2: https://github.com/pytorch/pytorch/tree/v1.10.2/third_party ... I know pybind11 has some guards against different versions breaking each other, but I'm not 100% sure how they work. It looks like they aren't working here, so this is probably a pybind11 bug.

To fix your issue, try building with the same version of pybind11 that pytorch was built with.

My diagnosis is based on your stack trace, which shows a crash during destruction of pybind11's internals -- the layout of the internals changes from time to time between releases. It might be something odd that happens because your main application has pybind11 built in, rather than what I imagine is the more tested and typical case: multiple libraries loaded by Python, potentially built against different versions of pybind11.
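
One way to see whether the loaded modules share a single set of internals is to list the internals keys registered in the running interpreter. This is a minimal sketch, assuming the pybind11 releases discussed here (around 2.6 to 2.9) store their internals capsule in the `builtins` dict under a key that starts with `__pybind11_internals_v`. The helper name `pybind11_internals_keys` is hypothetical, and other pybind11 releases may keep the capsule somewhere else.

```python
import builtins


def pybind11_internals_keys(namespace=None):
    """Return the keys that look like pybind11 internals IDs.

    Assumption: the prefix below matches what the pybind11 versions
    discussed in this thread register in ``builtins``.
    """
    if namespace is None:
        namespace = vars(builtins)
    prefix = "__pybind11_internals_v"
    return sorted(
        key for key in namespace
        if isinstance(key, str) and key.startswith(prefix)
    )


if __name__ == "__main__":
    # Run this after importing the extension modules of interest
    # (for example torch) so their internals are already registered.
    for key in pybind11_internals_keys():
        print(key)
```

If more than one key shows up, the modules were probably built with different internals versions, compilers, or ABI tags and are not sharing state. A single key would suggest they all use the same internals object.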

I don't have time to actually look into this myself.

virtuald, Mar 21 '22 03:03

Thanks for looking into this. I tried with version 2.6.2 with the same result.

Sleemanmunk, Mar 21 '22 17:03

That's unfortunate. I don't have any specific experience using embedded interpreters, so I don't think I can offer any more advice here.

I hadn't noticed that you said it worked on GCC 4.8. Why not just use that then?

It might make sense to examine the compiled pytorch library and see which version of GCC it was built with. There's probably a string in there with the version number. I know that mixing GCC versions can sometimes lead to unexpected results -- in particular, GCC 5 introduced string-related ABI breaks (the new libstdc++ std::string implementation).
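
The check suggested above can be sketched as a scan for compiler identification strings. GCC normally records a string such as `GCC: (GNU) 7.5.0` in the objects it compiles, although a stripped binary may no longer contain it. The helper name `find_gcc_idents` and the example path in the comment are hypothetical.

```python
import re
import sys

# GCC ident strings look like "GCC: (<vendor info>) <version>".
_GCC_IDENT = re.compile(rb"GCC: \([^)\x00]*\) \d+(?:\.\d+)*")


def find_gcc_idents(data: bytes) -> list:
    """Return the distinct GCC ident strings found in raw binary data."""
    found = {
        match.group(0).decode("ascii", "replace")
        for match in _GCC_IDENT.finditer(data)
    }
    return sorted(found)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python find_gcc.py /path/to/libtorch_python.so
    with open(sys.argv[1], "rb") as fh:
        for ident in find_gcc_idents(fh.read()):
            print(ident)
```

Running it on both the pytorch shared library and the embedding application shows whether they report different GCC versions. A mismatch would be consistent with the ABI theory, though it would not prove it.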

virtuald, Mar 21 '22 19:03