pybind11 icon indicating copy to clipboard operation
pybind11 copied to clipboard

Demo: `return_value_policy::copy` ignored

Open rwgk opened this issue 2 years ago • 4 comments

Description

Demo: return_value_policy::copy ignored after a Python object was created already with return_value_policy::reference_internal. Back-ported from pywrapcc test_class_sh_property_stl.cpp,py

Suggested changelog entry:


rwgk avatar Mar 24 '23 16:03 rwgk

Hi @wjakob, what do you think about the existing behavior? It's not very likely to be a problem in real life, but if it bites, it could be very surprising and extremely difficult to debug, especially in a large system.

See test_cpy_after_ref in test_property_stl.py. To see how that can bite in real life, swap these lines in test_persistent_holder:

    c0 = h.vec_fld_hld_cpy[0]  # Must be first. See test_cpy_after_ref().
    r0 = h.vec_fld_hld_ref[0]  # Must be second.

With those lines swapped the test fails:

Running tests in directory "/usr/local/google/home/rwgk/forked/pybind11/tests":
=========================================================== test session starts ============================================================
platform linux -- Python 3.10.9, pytest-7.1.2, pluggy-1.0.0
C++ Info: Debian Clang 14.0.6 C++17 __pybind11_internals_v4_clang_libstdcpp_cxxabi1002__ PYBIND11_SIMPLE_GIL_MANAGEMENT=False
rootdir: /usr/local/google/home/rwgk/forked/pybind11/tests, configfile: pytest.ini
plugins: xdist-3.0.2, flakefinder-1.1.0
collected 3 items

test_property_stl.py .F.                                                                                                             [100%]

================================================================= FAILURES =================================================================
__________________________________________________________ test_persistent_holder __________________________________________________________

    def test_persistent_holder():
        h = m.VectorFieldHolder()
        r0 = h.vec_fld_hld_ref[0]  # Must be second.
        c0 = h.vec_fld_hld_cpy[0]  # Must be first. See test_cpy_after_ref().
        assert c0.fld.wrapped_int == 300
        assert r0.fld.wrapped_int == 300
        h.reset_at(0, 400)
>       assert c0.fld.wrapped_int == 300
E       assert 400 == 300
E        +  where 400 = <pybind11_tests.property_stl.Field object at 0x7f0e8f7ccaf0>.wrapped_int
E        +    where <pybind11_tests.property_stl.Field object at 0x7f0e8f7ccaf0> = <pybind11_tests.property_stl.FieldHolder object at 0x7f0dca754ff0>.fld

c0         = <pybind11_tests.property_stl.FieldHolder object at 0x7f0dca754ff0>
h          = <pybind11_tests.property_stl.VectorFieldHolder object at 0x7f0dca754f70>
r0         = <pybind11_tests.property_stl.FieldHolder object at 0x7f0dca754ff0>

test_property_stl.py:25: AssertionError
========================================================= short test summary info ==========================================================
FAILED test_property_stl.py::test_persistent_holder - assert 400 == 300
======================================================= 1 failed, 2 passed in 0.12s ========================================================

rwgk avatar Mar 24 '23 17:03 rwgk

I think (but have not verified) that the existing behavior goes back to this

https://github.com/pybind/pybind11/blob/5bbcba548a27baf8d6bfb7993e814da24b432dbf/include/pybind11/detail/type_caster_base.h#L524-L526

pre-empting this

https://github.com/pybind/pybind11/blob/5bbcba548a27baf8d6bfb7993e814da24b432dbf/include/pybind11/detail/type_caster_base.h#L546-L562

rwgk avatar Mar 24 '23 17:03 rwgk

Hi @rwgk — I agree, this is super-dangerous and something I did not think of when coding up the original logic a long time ago. I introduced a change here in nanobind by always prioritizing return_value_policy::copy and return_value_policy::move when the instance is already registered with the binding tool. They return a new object in that case. Can't see any major reasons against pybind11 doing the same thing besides existing software potentially having come to rely on the current bug-prone behavior.

This goes deeper by the way: what about the other return value policies, when returning an object that is already registered with the binding tool? All of them are currently ignored.

  • return_value_policy::reference: ignored, and that seems just fine.

  • return_value_policy::take_ownership: no ownership will be taken. But should it? Perhaps the previous version of the object was returned via return_value_policy::reference.

  • return_value_policy::reference_internal: This entangles the lifetime of the parent and child. If the instance already exists, then that doesn't happen. Might be problematic.

I don't do anything special for these other cases in nanobind, because I wasn't 100% sure what to do about them. It would be be good to decide on a way of resolving all of these situations, and then implementing & documenting them consistently in both binding tools.

wjakob avatar Oct 27 '23 06:10 wjakob

Thanks Wenzel! That's very useful background information.

I don't do anything special for these other cases in nanobind, because I wasn't 100% sure what to do about them.

What I usually do in this situation is experimenting: run global testing with different ideas. Reviewing the results usually gives me a much more certainty than going with just thinking and guessing.

It might take me a while to get back here. The problem didn't come up repeatedly, and we're down to a very tiny fraction of failing tests, therefore I came to look at it as a fringe case. But it would definitely be good to get rid of this trap.

If you or anyone wants to try things out (changes in pybind11), please let me know. I can offer running global tests for experimentation. It's not a lot of work for me (only for the machines).

rwgk avatar Oct 27 '23 18:10 rwgk