python tests fail because failed GIL interaction
Tested with 2.3.0 and 2.3.1 on gentoo using python-3.11 and python-3.12.
Running ctest -R test_python fails with:
test_apply (CPUProcessorTest.CPUProcessorTest.test_apply) ... pybind11::handle::inc_ref() is being called while the GIL is either not held or invalid. Please see https://pybind11.readthedocs.io/en/stable/advanced/misc.html#common-sources-of-global-interpreter-lock-errors for debugging advice.
If you are convinced there is no bug in your code, you can #define PYBIND11_NO_ASSERT_GIL_HELD_INCREF_DECREFto disable this check. In that case you have to ensure this #define is consistently used for all translation units linked into a given pybind11 extension, otherwise there will be ODR violations.The failing pybind11::handle::inc_ref() call was triggered on a numpy.ndarray object.
ERROR
test_apply_src_dst (CPUProcessorTest.CPUProcessorTest.test_apply_src_dst) ... pybind11::handle::inc_ref() is being called while the GIL is either not held or invalid. Please see https://pybind11.readthedocs.io/en/stable/advanced/misc.html#common-sources-of-global-interpreter-lock-errors for debugging advice.
If you are convinced there is no bug in your code, you can #define PYBIND11_NO_ASSERT_GIL_HELD_INCREF_DECREFto disable this check. In that case you have to ensure this #define is consistently used for all translation units linked into a given pybind11 extension, otherwise there will be ODR violations.The failing pybind11::handle::inc_ref() call was triggered on a numpy.ndarray object.
ERROR
======================================================================
ERROR: test_apply (CPUProcessorTest.CPUProcessorTest.test_apply)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/var/tmp/paludis/media-libs-opencolorio-2.3.1/work/OpenColorIO-2.3.1/tests/python/CPUProcessorTest.py", line 302, in test_apply
image = OCIO.PackedImageDesc(arr, 7, 3, 3)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: pybind11::handle::inc_ref() PyGILState_Check() failure.
======================================================================
ERROR: test_apply_src_dst (CPUProcessorTest.CPUProcessorTest.test_apply_src_dst)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/var/tmp/paludis/media-libs-opencolorio-2.3.1/work/OpenColorIO-2.3.1/tests/python/CPUProcessorTest.py", line 331, in test_apply_src_dst
src_image = OCIO.PackedImageDesc(src_arr, 7, 3, 3)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: pybind11::handle::inc_ref() PyGILState_Check() failure.
----------------------------------------------------------------------
I would like to spend some time looking into this. I want to say that there isn't an issue and we could use PYBIND11_NO_ASSERT_GIL_HELD_INCREF_DECREF to disable this check. But that may not be ideal either. Fortunately the VFX Reference platform only goes up to python 3.11 for now so just to be sure, this is not a problem for that version of python, correct?
Tested with 2.3.0 and 2.3.1 on gentoo using python-3.11 and python-3.12.
Python-3.11 is affected as well.
It would be interesting to find out why this error isn't triggered in our CI, we test on both Python 3.11 and 3.12 when building the wheels. Maybe Python is compiled in a different way on Gentoo? Which version of Pybind11 are you using? OCIO is using a pretty old version I think at this point and newer versions are untested.
Versions used are:
- python-3.11.7
- python-3.12.1_p1
- pybind11-2.11.1
- gcc-12.3.1_p20240112
I'll update to python-3.11.8 and python-3.12.2 later today and retest, doubt it changes anything.
I can write a Dockerfile if that helps you reproduce it.
Thanks for the information @negril, we are using Pybind11 2.9.2 so it could be interesting on our side to try with 2.11.1.
Strangely enough, after explicitly using Pybind11 2.11.1 and Python 3.12 on macOS, Debug build, I can't get any of these assertions to trigger, even after adding obviously wrong acquire / release sequences in the code. Otherwise I can't see anything obviously wrong we are doing in the Pybind11 code for the time being.
Maybe a Docker image could help yeah, @meimchu did you manage to reproduce the issue on your side?
Indeed I agree with maybe having a docker image could be helpful with debugging. I have Python 3.9.6 on my MacOS unfortunately though I did try to build it with the newer Pybind11 2.11.1 and did not encounter that issue.
I have used pyenv to get both python 3.11.7 and python 3.12.1 on my MacOS. I also changed the build setting to use pybind 2.11.1 to build. I ran both ctest -R test_python and manually test with both versions of python on the OpenColorIOTestSuite.py. None of them appear to have return the same issues as all tests passed. Feels like a potential variable here might be the GCC version then? At least in the case of VFX reference platform, they’ve been sticking with GCC 11.2.1 for 2023/2024 for Linux distros. This mystery continues, it seems!
I did hit the same issue while working on CI updates, could you try again on the main branch?