cython icon indicating copy to clipboard operation
cython copied to clipboard

[ENH] Add warning/error for using std::move on Python object

Open dgrunwald opened this issue 4 years ago • 5 comments

In modern C++, it's quite normal to use std::move on strings to avoid unnecessary copies. It's possible that a developer accidentally calls move on a Python string object before conversion to a C++ string:


from libcpp.string cimport string
from libcpp.utility cimport move
from libcpp.vector cimport vector

def test(s):
    cdef vector[string] v
    v.push_back(move(s))

In C++, redundant moves (e.g. when moving a char*) are harmless. But in Cython, the snippet above causes a reference counting error: Cython assumes that move returns an owned reference and calls Py_DECREF on it after the conversion to a C++ string. This ends up causing subtle corruption of the Python heap -- in our case, we ended up observing non-deterministic crashes during garbage collection.

Describe the solution you'd like I'm hoping it's possible to add a compiler error for libcpp.utility.move[PyObject*] -- calling this function is always undefined behavior as it doesn't perform the Py_INCREF expected for functions of its signature.

Additional context I realize that Cython works "as designed" here. But it's too easy to mess this up when C++ strings should use move and Python strings must not use move. Given how common std::move is in modern C++ and how difficult it is to debug the crashes resulting from this mistake, I think it's worth adding a special case to the compiler.

dgrunwald avatar Nov 05 '21 13:11 dgrunwald

It's probably a reasonable thing to add (especially since we already define a small wrapper function around std::move). Maybe just = delete the PyObject* overload?

da-woods avatar Nov 08 '21 22:11 da-woods

This is the C++ code that I get on my side:

  /* "cpp_move.pyx":7
 * def test(s):
 *     cdef vector[string] v
 *     v.push_back(move(s))             # <<<<<<<<<<<<<<
 */
  __pyx_t_1 = cython_std::move<PyObject *>(__pyx_v_s); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_2 = __pyx_convert_string_from_py_std__in_string(__pyx_t_1); if (unlikely(PyErr_Occurred())) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  try {
    __pyx_v_v.push_back(__pyx_t_2);
  } catch(...) {
    __Pyx_CppExn2PyErr();
    __PYX_ERR(0, 7, __pyx_L1_error)
  }

Apparently, Cython assumes that the return type of move(object) is a Python object. That is correct, given the templated signature of the function. As you noted, however, it is in fact incorrect to call move() on a Python object in the first place. It's ok to do that with a PyObject*, or any C/C++ type, really. It is not ok to pass an owned object reference through move().

move() is probably amongst the most special in that regard, but are there really use cases for passing owned references into C++ (standard) functions? As opposed to plain PyObject* ? I'm wondering if we should just always cast to PyObject* automatically, at least for undefined template typed arguments. Is it really reasonable to define a C++ function that can work with an arbitrary template type (and returns it), that works equally well with regular C++ types and owned Python references? Would you not define an overloaded function in that case, with a dedicated object->object signature?

scoder avatar Nov 09 '21 08:11 scoder

but are there really use cases for passing owned references into C++ (standard) functions? As opposed to plain PyObject* ? I'm wondering if we should just always cast to PyObject* automatically, at least for undefined template typed arguments

We use object arguments all over the place in our wrapping of the Python C API so we don't want to break owned references entirely. However, I agree that it's dodgy for the compiler to deduce an "owned reference" template argument since it's pretty unlikely to automatically be right. I'd be fine with allowing move[object](o) (i.e. the letting the user manually specify the template argument but not deducing it automatically).

Relatedly I was proposing to ban object (and other reference counted types) as a container template argument (https://github.com/cython/cython/pull/4337) because there were too many people saying "I created a vector of memorviews and now everything is segfaulting".

So yes - something similar to that seems reasonable to me.

da-woods avatar Nov 09 '21 18:11 da-woods

C++ template argument deduction involves decay (array-to-pointer / function-to-pointer / dropping cv-qualifiers); I think it makes sense if Cython template argument deduction adds object-to-PyObject* to that list.

dgrunwald avatar Nov 09 '21 18:11 dgrunwald

So is the correct thing to do in this case when passing a python object into std::move is to specify that the template parameter is PyObject*, i.e. std::move[PyObject*](o)?

jymchng avatar Dec 03 '23 10:12 jymchng