pybind11 icon indicating copy to clipboard operation
pybind11 copied to clipboard

[WIP] Towards a solution for custom-type rvalue arguments

Open rhaschke opened this issue 5 years ago • 47 comments

pybind11 doesn't support rvalue function arguments for custom types, i.e. the following function cannot be wrapped:

void consume(CustomType&& object) {
  CustomType sink(std::move(object));
}

This issue is (more or less) unrelated to rvalue holder arguments as requested e.g. in #2040 or #2583. While moving holder arguments essentially transfers ownership from python to C++, which is admittedly difficult, supporting functions like above shouldn't pose a conceptual problem: The original object instance remains existing in python, but only its internals are moved over to C++.

Currently, this use case is not supported, because the cast_op_type provided by type_caster_base<T> doesn't support moving. Replacing it with movable_cast_op_type<T> and providing a corresponding cast operator already solves the problem for my example.

However, this solution makes moving preferred over copying, which isn't desired either. EDIT: A major rework of this PR solves this issue, see https://github.com/pybind/pybind11/pull/2047#issuecomment-724256796 and https://github.com/pybind/pybind11/pull/2047#issuecomment-791916065 for more details.

rhaschke avatar Dec 30 '19 23:12 rhaschke

Keeping the copy-only cast operator for copyable_holder_caster, I was able to solve most unit test issues. However, I had to slightly modify the ConstructorStats, because moving is preferred over copying now. Please check the validity of these changes before merging.

rhaschke avatar Dec 31 '19 03:12 rhaschke

Dear Robert,

just a quick heads-up: I'm super busy with a conference deadline and don't have many free clock cycles in the next month. A quick bit of feedback is that this PR (and the sub-component #2048) seem very reasonable, but I'm scared by #2046 (both in terms of the heavy modifications and all of the things that could possibly go wrong). Moving a holder from Python to C++ and making it disappear in Python-land seems like a really counter-intuitive and even dangerous operation because the language provides no semantics for anything like it. I've also added @jagerman who authored some of the parts you are modifying.

Best, Wenzel

wjakob avatar Jan 01 '20 21:01 wjakob

Dear Wenzel,

thanks for your feedback. I will be busy to the end of the month as well, I just used the holidays to look into pybind11 and migrate a project of mine from boost::python. I can fully understand your concerns regarding #2046. Moving ownership from Python to C++ is somewhat risky. However, I think, if tackled correctly, it can be made safe: The python object instance doesn't disappear, but the value pointer and its holder are invalidated. All that is needed, is to carefully validate them before accessing them. Lacking support for move semantics is a serious shortcoming for a C++ to python wrapper library and it is an actual blocker for my migration process. It would be great if we could jointly come up with a safe and clean solution for this issue. #2046 is only a very preliminary approach, exploring potential solutions. It would be great if we could discuss open issues in a skype conference or the like.

I have to admit that I am new to template meta programming and I don't yet fully understand the architecture of pybind11. Particularly, I have the following questions currently:

  • Why does cast_op<T> choose the target type based on the corresponding caster (make_caster<T>::cast_op_type) instead of the actual function argument type, which should be T? I understand that this allows the caster to steer the casting process. However, it also means that move-semantics is preferred over copy-semantics, as soon as movable_cast_op_type<T> is used instead of cast_op_type<T>. For example, this results in the move constructor being called for ExampleMandA::add1 with this PR's main modification, which is probably not desired. In my point of view, the caster should suppress undesired semantics by not providing corresponding implementations, but the casting process itself should exactly follow the function signature.
  • What is the purpose of registering instances?
  • In general, is there some documentation about the architectural design of pybind11?

Best, Robert

rhaschke avatar Jan 01 '20 23:01 rhaschke

Hi, I have the same problem with creating an C++ object in python, then moving it into some other C++ object, still in python. I don't care so much about what the python object is supposed to mean after it has been moved from, what is important is that the C++ underlying object is fine. What is the status of this PR and #2046?In particular, as asked by @rhaschke, is it something pybind11 is willing to support, or discard as not well-defined? From @rhaschke, it seems that boost does it?

BerengerBerthoul avatar Oct 07 '20 10:10 BerengerBerthoul

It would be great if pybind11 would officially support this scenario which is common when wrapping modern C++ libs. boost::python doesn't support move semantics as well, but I was able to find a workaround using deprecated auto_ptrs. My proposed solution for pybind11 works for me, but I'm - of course - not aware of all the bells and whistles.

rhaschke avatar Oct 07 '20 14:10 rhaschke

The problem is that "moving" or "transferring ownership" is a concept that's not really known to Python (and might not result in the best Python API?). So I guess that's the reason there's a lot of hesitation, here as well as in other projects such as Boost.Python.

However, it should (in principle) already be possible to do so, no? You could create a wrapper object (maybe even a class template), have the class/unique_ptr you want to move as a member, and write a custom type_caster? The custom type caster could be reasonably simple, basically just wrapping and unwrapping, I believe.

YannickJadoul avatar Oct 07 '20 21:10 YannickJadoul

However, it should (in principle) already be possible to do so, no?

Apologies. Apparently that's not possible, right now, and pybind11 already chokes when a function expects a rvalue reference.

YannickJadoul avatar Oct 07 '20 22:10 YannickJadoul

This does work for me, though, because you cán move from non-const lvalue references:

#include <pybind11/pybind11.h>
#include <iostream>

namespace py = pybind11;


class X {
public:
        X(int x) : m_x(x) { std::cout << "X::X(" << m_x << ")" << std::endl; }
        X(const X&) = delete;
        X(X &&other) : m_x(std::move(other.m_x)) { other.m_x = -1; std::cout << "X::X(X&&) (m_x = " << m_x << ")" << std::endl; }
        X &operator=(const X&) = delete;
        X &operator=(X&&) = default;
        ~X() { std::cout << "X::~X() (m_x = " << m_x << ")" << std::endl; }

private:
        int m_x;
};

void sink(X &&x) { X(std::move(x)); };

PYBIND11_MODULE(example, m) {
        py::class_<X>(m, "X");

        m.def("source", [](int x) { return X(x); });
        m.def("sink", [](X& x) { sink(std::move(x)); });
}
>>> import example
>>> x = example.source(42)
X::X(42)
X::X(X&&) (m_x = 42)
X::~X() (m_x = -1)
>>> example.sink(x)
X::X(X&&) (m_x = 42)
X::~X() (m_x = 42)
>>> x = None
X::~X() (m_x = -1)

I assume you could write a custom type caster checking if the X object is valid or not (i.e., in this case, whether it contains -1), and thus whether you can still call methods on it or pass it to C++ from Python.

YannickJadoul avatar Oct 07 '20 22:10 YannickJadoul

The problem is that "moving" or "transferring ownership" is a concept that's not really known to Python Honestly, I don't really get the problem. Ok, python programmers should not need to know the intricacies of move semantics. But python functions modifying their arguments do exist and are not especially non-pythonic. So here we are in presence of a function modifying its arguments so that the argument is not usable anymore because it has been consumed. I am sure that this pattern arises as soon as you need to transfer a ressouce. To sum it up : I don't think it is hard to understand that an object is not valid anymore because it has been transfered somewhere else.

BerengerBerthoul avatar Oct 07 '20 22:10 BerengerBerthoul

Maybe on a more controversial note: there is always many wrong uses of a feature. But if they are good use cases (and I think there are, if you guys are not conviced I am willing to give more details), the library should support them. If not, then the user is left to code his or her own half-baked, non-standard workaround.

BerengerBerthoul avatar Oct 07 '20 22:10 BerengerBerthoul

But python functions modifying their arguments do exist and are not especially non-pythonic.

True, yes, but there's a difference between changing an object and invalidating it? That's not exactly common in Python, no, an object that entirely invalid/untouchable?

Another issue is that Python's variables work differently to C++. Every Python variable is basically already a shared pointer (with reference counting). So by "invalidating" one variable, you're also invalidating all other variables pointing to this same object. Yes, not insurmountable to explain, especially if it's can be checked and a nice error can be shown, but there's really a few places where C++'s memory management just doesn't fit with Python and vice versa :-/

Maybe on a more controversial note: there is always many wrong uses of a feature. But if they are good use cases (and I think there are, if you guys are not conviced I am willing to give more details), the library should support them. If not, then the user is left to code his or her own half-baked, non-standard workaround.

I might agree here, though encouraging good design by default is still a good plan and will take away questions ;-) I'm not saying you're wrong, btw, or that this can never be part of pybind11; merely explaining why it's been a tricky thing to have in pybind11 (and one where a custom solution might be advisable, actually, since it'll often depend on the actual application on how to handle this?).

But I'm curious about your use-case, yes. I found that often, there's a nice workaround that results in a more "pythonic" API.

Btw, you're definitely not the first ones to come up with this problem. I know that for example @EricCousineau-TRI has played around with this before (see e.g., #1146, I believe? Lots of further discussion there, almost ancient history by now, for pybind11's lifetime).

YannickJadoul avatar Oct 07 '20 23:10 YannickJadoul

[...] I know that for example @EricCousineau-TRI has played around with this before (see e.g., #1146, I believe? Lots of further discussion there, almost ancient history by now, for pybind11's lifetime).

FWIW We're still using a flavor of #1146 in our fork: https://github.com/RobotLocomotion/pybind11/blob/58a368ea8e89638e01a7385cad9ce4345d70003d/README_DRAKE.md

I can't say it was the worst decision to allow moving unique_ptr<T> from Python to C++, but I can't say it's the best either. If you care to see some of the nuances, see: https://github.com/RobotLocomotion/drake/issues/14016 https://github.com/RobotLocomotion/drake/issues/13058 Here's a concrete indicator of invalidation (due to discrepancy btw C++ and Python memory management, as Yannick alluded to) https://github.com/RobotLocomotion/drake/blob/387c0f9a52e6bc7ceafe067d6e7f1c42fba4e650/bindings/pydrake/common/value_pybind.h#L73-L77

[...] Ok, python programmers should not need to know the intricacies of move semantics. [...]

For the case of unique_ptr, at least our implementation does - we have to (kind of poorly) tell users that "hey, Python doesn't own this, so Python can't hand it over to C++".

EricCousineau-TRI avatar Oct 07 '20 23:10 EricCousineau-TRI

To the point of this PR, though, it seems pretty straightforward at first blush.

The net gain I see here is that users don't have to write odd wrappers to bind functions with rvalues; the fallout of move semantics does seem like it'd be the downstream dev's responsibility, which IMO are foot-guns that are still available in nominal pybind? (e.g. .def("delete", [](T* x) { delete x; })).

I'd be happy to review (albeit slowly), and see if we can try to create simulated edge cases that we could encode in unittests (e.g. if there ever is a point where a Python program is affected by move semantics).

(I do think that we should de-scope supporting anything like unique_ptr in this PR, though)

EricCousineau-TRI avatar Oct 07 '20 23:10 EricCousineau-TRI

Oh, upon reading #2040, it looks like the goal is to handle unique_ptr. On that point, yeah, I'd say this may be a more tad complex. Will comment in that issue to try and see some more deets.

EricCousineau-TRI avatar Oct 07 '20 23:10 EricCousineau-TRI

First, I am very happy to see that people are responding quickly to the comments, it's good to have an active community!

I actually missed this line from @YannickJadoul m.def("sink", [](X& x) { sink(std::move(x)); }); I think it solves the problem in my case. @rhaschke isn't it enought for your use case?

True, yes, but there's a difference between changing an object and invalidating it? That's not exactly common in Python, no, an object that entirely invalid/untouchable?

Well actually, a moved-from object is not invalid but in a "valid but unspecified state". So you can call any method that does not requires an invariant. E.g. you can call size() on a moved-from std::vector: the result is unspecified but this is not undefined behavior. On the other hand, it seems practical to just think of the object as invalid. Just to say that the line between invalid and modified is not that well defined.

you're also invalidating all other variables pointing to this same object

Yes, but same thing if you are resizing an array to 0, this may be surprising to the other variables. I think this is conceptually the same usual kind of behavior

But I'm curious about your use-case, yes. I found that often, there's a nice workaround that results in a more "pythonic" API.

My use case is the following : I have a CFD problem where I am building the configuration step by step in Python. The classes describing the configuration itself are in C++. So let's say I want to build the initial solution fields inside a FlowSolution object, and then transfer the FlowSolution to the Configuration object. Then I want to use a move since the object is heavy, and I don't plan to use it anymore (if it were to be the case, I could call a getter on Configuration)

BerengerBerthoul avatar Oct 09 '20 14:10 BerengerBerthoul

@BerengerBerthoul Since Yannick's workaround works for you, it seems like this PR would valuable to you (you're not expecting unique_ptr to work in this instance). In that light, I'd be happy for this PR to move forward as-is (no unique_ptr stuff).

@rhaschke Would you be up for moving this forward w/o anything explicitly for unique_ptr? If you don't have time, I'm happy to move it forward.

EricCousineau-TRI avatar Oct 12 '20 17:10 EricCousineau-TRI

So I guess that's the reason there's a lot of hesitation, here as well as in other projects such as Boost.Python.

Move semantics didn't exist when Boost.Python was written (2000-2003 timeframe).

rwgk avatar Oct 15 '20 02:10 rwgk

Move semantics didn't exist when Boost.Python was written (2000-2003 timeframe).

Right, of course! Thanks for that reminder. So yes, maybe this can be a project towards 2.7.0? Maybe we can make this safer with some opt-in flag, or so?

YannickJadoul avatar Oct 15 '20 09:10 YannickJadoul

@EricCousineau-TRI, I would be happy to move this forward, although, eventually, I would need support for moving unique_ptrs. I think the key question is how to tell the user that an instance became invalid... @YannickJadoul: What do you think is missing to progress with this PR?

rhaschke avatar Oct 19 '20 13:10 rhaschke

@rhaschke For me on this PR, I think if you edit the overview to note that this is an independent, strict improvement towards #2040 (and thus narrow the scope of this specific PR), and rebase, I'd be happy to review it.

Will let Yannick chime in as well.

EricCousineau-TRI avatar Oct 19 '20 16:10 EricCousineau-TRI

I think the key question is how to tell the user that an instance became invalid... [...]

My point about scoping is that this is irrelevant to pybind11, other than just adding some minor caution tape saying "If you rely on move semantics in your publicly exposed Python-C++ API, you should take care in your own API that invalidated objects have clear error messages for users (e.g. do not cause crazy segfaults)."

EricCousineau-TRI avatar Oct 19 '20 16:10 EricCousineau-TRI

@YannickJadoul: What do you think is missing to progress with this PR?

For me, to decently review this, I'd need a few hours (after 2.6.0 is out, when I have some time) to fully get all the details of the type casters again. E.g., this first change of detail::cast_op_type<T>; into detail::movable_cast_op_type<T>; feels ... well, I don't feel confident to say a lot about this without studying and reminding myself about the whole casting mechanism again. There are a few other problems with the casters (e.g., #2245/#2303/#2527, or #2336), so casters might need a redesign once we start aiming at 3.0, anyway.

Apart from that, stealing objects from Python is rather tricky (given the way Python treats objects), so once there is something, I would like some input from @wjakob on this. Of course, once you allow objects to be stolen from pybind11, maybe there isn't a big difference anymore with stealing the holders. At the same time, stealing objects is already possible, since you can get a reference to the object? (not sure if that's an argument for or against this PR :-) )

I can see this can all be useful for power users and a bit of a gap between C++ and Python, but at the same time, I still don't believe it's a great Python API, so making this opt-in and slightly harder than necessary might push users towards creating more Python-like APIs. Any thoughts on that?

YannickJadoul avatar Oct 19 '20 17:10 YannickJadoul

I can see this can all be useful for power users and a bit of a gap between C++ and Python

Hi @YannickJadoul , I think the motivation behind this PR, and #2046, goes much deeper. See also my question #2583.

so making this opt-in and slightly harder than necessary might push users towards creating more Python-like APIs. Any thoughts on that?

I actually started out wanting to eliminate the unique_ptr passing feature from PyCLIF. But when I ran into the concrete use case from which #2583 is distilled, I developed serious doubts. Short term, it would probably take me at least a week or two to change the existing Google code that needs the unique_ptr passing, and I expect unhappy customers immediately, because the Python wrappers force them to deviate from modern C++ best practices. Long-term, I expect a steady trickle of new unhappy customers, especially if adding Python bindings is an afterthought, but forces them to rethink their original C++ design.

About disadvantages, I can see two: 1. an extra if to check if the held object was invalidated (runtime hit, although my guess is it's in the noise). 2. it could be difficult to debug where in a process the held object was invalidated, in case that's unexpected.

If I balance those disadvantages with the advantage of enabling wrapping of modern C++ code, I come out strongly on the "it's worth it" side, but I also think your "harder than necessary" idea is a good one. Maybe it could even be leveraged to address the 2. disadvantage above, via something like

function_expecting_unique_ptr(obj)  # easy, but ...
function_expecting_unique_ptr(obj.disown())  # harder than necessary, but flags code that gives up ownership

In the C++ bindings we'd need some mechanism to add the special .disown() method, and the casters would have to reject a plain obj but accept obj.disown().

rwgk avatar Oct 19 '20 19:10 rwgk

I rebased onto recent master. I agree with @YannickJadoul's doubts expressed in https://github.com/pybind/pybind11/pull/2047#issuecomment-712325915: Changing detail::cast_op_type<T> into detail::movable_cast_op_type<T> wasn't a good idea as this makes moving preferred over copying as indicated in https://github.com/pybind/pybind11/pull/2047#issuecomment-569857805 already. I added a new unit test illustrating the problem: https://github.com/pybind/pybind11/pull/2047/commits/60ef62c211a42f6dd947e0f8e1fb32477c4b09af

I think the casters should be reworked to consider the actual argument type (pointer, lvalue reference, rvalue reference, constness?). As the argument casters only know about the intrinsic type, this detailed information is already lost, although the casting behavior should differ for these type flavors. Playing a little with the casting operators here, I couldn't find a working solution.

rhaschke avatar Oct 20 '20 22:10 rhaschke

@rhaschke, yeah, it seems to be one of the things in pybind11 that grew a bit out of control :-( There hás to be some way to fix it, though? Worst case, it'll will involve some compile-time macro opting-in to the new approach.

For the record: I'm not per se against adding/enabling these moves; it would make pybind11 more complete, in some way. But I'm just voicing the reasons why this is currently not really supported and why we should be really careful here. I do very much get @wjakob's carefulness/hesitance, when it comes to this.

YannickJadoul avatar Oct 20 '20 22:10 YannickJadoul

I completely reworked this PR to address the remaining issue that moving was always preferred over copying regardless of the actual function signature.

  1. The main reason is that cast_op(caster&&) is enforcing move semantics by always adding an rvalue reference as soon as the caster itself is an rvalue reference: https://github.com/pybind/pybind11/blob/b72cebeb228e839caa13fa0269bd15676e130def/include/pybind11/cast.h#L953-L957 The alternative cast_op(caster&) implementation that is selecting the move/copy semantics based on the actual function signature, was essentially disabled in #851 by enforcing rvalue references for all argument casters. This PR reenables the lvalue-reference cast_op() for calling C++ functions, thus allowing to chose between move and copy semantics based on the function argument type:

  2. Further, it was important to actually enable moving if move semantics is chosen. The original implementation returning an rvalue reference to the internal value of the caster wasn't actually performing the move as can be seen from the failing unit tests before 0412b945cfe5c9b88b6696a0e32dae5525c17144. Instead of casting to an rvalue reference, I'm now casting to T and explicitly performing the move. The same technique was already used to implement move semantics for tuple_caster: https://github.com/pybind/pybind11/blob/b72cebeb228e839caa13fa0269bd15676e130def/include/pybind11/cast.h#L1438-L1446

As before, due to preferring copy over move semantics for plain T arguments, I had to adapt some ConstructorStats in the unit tests. There are some other adaptions to the unit tests as well, which I am not yet sure about. Furthermore, there are some failing CI tests... So there is still some work to do.

rhaschke avatar Nov 09 '20 20:11 rhaschke

I shortly looked into a few failing tests. One issue seems to be a discrepancy between gcc and clang: While my local clang was succeeding, gcc is failing on the itype() cast for abstract types:

pybind11/cast.h:925:5: error: invalid abstract return type ‘test_submodule_callbacks(pybind11::module_&)::AbstractBase’
  925 |     operator itype() { if (!value) throw reference_cast_error(); return itype(std::move(*((itype *) value))); }
      |     ^~~~~~~~
pybind11/tests/test_callbacks.cpp:120:11: note:   because the following virtual functions are pure within ‘test_submodule_callbacks(pybind11::module_&)::AbstractBase’:
  120 |     class AbstractBase {
      |           ^~~~~~~~~~~~
pybind11/tests/test_callbacks.cpp:123:30: note: 	‘virtual unsigned int test_submodule_callbacks(pybind11::module_&)::AbstractBase::func()’
  123 |         virtual unsigned int func() = 0;

I have no idea how to handle this. Any help is highly appreciated.

rhaschke avatar Nov 09 '20 21:11 rhaschke

@rhaschke Can you perhaps enable_if this part with std::is_abstract?

(Note: I haven't actually looked at the code; so I don't know why it worked before and is failing now. Important to check, once this gets reviewed.)

YannickJadoul avatar Nov 09 '20 21:11 YannickJadoul

@rhaschke Can you perhaps enable_if this part with std::is_abstract?

Yes, I tried. Didn't work for me. But I'm not an expert on the SFINAE syntax. Would be great if somebody more experienced could have a look.

Note: I haven't actually looked at the code; so I don't know why it worked before and is failing now.

The reason is that I changed the cast operator from operator T&&() to operator T() to enable actual moving as explained in 2. in https://github.com/pybind/pybind11/pull/2047#issuecomment-724256796.

rhaschke avatar Nov 09 '20 21:11 rhaschke

@YannickJadoul: I think I fixed the SFINAE syntax and https://github.com/pybind/pybind11/pull/2047/commits/2cb13e6db54f3b982f214bc145da4510b74a6e59 compiles on gcc. However, it fails on clang. Removing the protection works on clang, but fails on gcc (https://github.com/pybind/pybind11/pull/2047/commits/cadc0c1cf1bfca364472abb66cc250200612afdc). Do you have any advice how to proceed?

The good news: otherwise the PR seems to work as expected. I still need to validate the adaptions to the ConstructorStats in the unittests (https://github.com/pybind/pybind11/pull/2047/commits/2cb13e6db54f3b982f214bc145da4510b74a6e59).

rhaschke avatar Nov 12 '20 00:11 rhaschke