
[QUESTION] Numpy Array to C++ - take ownership of data

Open MartinPerry opened this issue 4 years ago • 10 comments

Is it possible to pass data from NumPy to C++ and take ownership of the memory, so it's no longer managed by Python? I have a large NumPy matrix and I don't want to copy the memory. I can use py::buffer_info to get a pointer to the data, but the pointer is no longer valid once Python is shut down. Another reason is that I want to release the data from the C++ side once I no longer need it.

MartinPerry avatar Jul 16 '21 14:07 MartinPerry

If you don't want to copy use PYBIND11_MAKE_OPAQUE(). You can read about it here: https://pybind11.readthedocs.io/en/stable/advanced/cast/stl.html#making-opaque-types

jiwaszki avatar Jul 18 '21 14:07 jiwaszki

Is it possible to allocate the memory on the C++ side and pass it back to Python?

You can return a buffer from C++ to python like this:

return pybind11::buffer_info(...)

On the Python side, this return value can be used directly as a numpy array.

petrochemical avatar Jul 19 '21 02:07 petrochemical

> If you don't want to copy use PYBIND11_MAKE_OPAQUE(). You can read about it here: https://pybind11.readthedocs.io/en/stable/advanced/cast/stl.html#making-opaque-types

I am not quite sure, how to use this with numpy. Do you have some example?

In python I have a very simple example:

import numpy as np
arr = np.array([1, 2, 3, 4, 5])

and I want to pass arr to C++ so that C++ "owns" the data and the data won't be freed when the Python interpreter is finalized. Also, I don't want to create a copy of the data.


> Is it possible to allocate the memory on the C++ side and pass it back to Python?
>
> You can return a buffer from C++ to python like this:
>
> return pybind11::buffer_info(...)
>
> On the Python side, this return value can be used directly as a numpy array.

I cannot allocate the memory in C++ via buffer_info, because the NumPy array is the output of another library.

MartinPerry avatar Jul 19 '21 10:07 MartinPerry

@MartinPerry I have the same exact question, did you manage to solve your issue?

PierreMarchand20 avatar Aug 23 '21 17:08 PierreMarchand20

@PierreMarchand20 Unfortunately, no

MartinPerry avatar Aug 24 '21 05:08 MartinPerry

@MartinPerry @PierreMarchand20 I have the same exact question, did you manage to solve your issue?

ptbxzrt avatar Mar 17 '22 08:03 ptbxzrt

I may have figured out a solution to this! It looks like the array pointer remains valid as long as the py::buffer_info object returned by buffer.request() exists. I've written a simple wrapper for transferring a 1D NumPy array into C++ (without copying) by storing the buffer_info object in an instance variable:

template<typename T>
struct PyArray {

    py::buffer_info info;
    T *data;
    size_t size;

    PyArray(py::array_t<T> arr) :
        info { arr.request() },
        data { static_cast<T*>(info.ptr) },
        size { static_cast<size_t>(info.shape[0]) } {}

    PyArray(const PyArray &) = delete;
    PyArray &operator=(const PyArray &) = delete;
    PyArray(PyArray&&) = default;

    //...

Note that py::buffer_info is not copyable, so I had to delete the copy constructors and define the move constructor in PyArray. This does limit how PyArray can be used, but it should work as long as you always pass by reference.

I've tested this by creating a NumPy array in Python, using it to initialize a PyArray, deleting the original NumPy array, and then confirming that the PyArray still works. The same test fails if info is local to the constructor. I'm no expert in Python memory management, so I'm not 100% sure this will work in all circumstances (e.g. when Python is "shut down"), but I hope it helps!

skovaka avatar May 09 '22 16:05 skovaka

I had a similar problem. We typically use Python in our C++ code base in the following way:

py::scoped_interpreter guard{};
py::dict locals;

py::exec(R"(python code here)", py::globals(), locals);

To move the data of a NumPy ndarray computed in the Python snippet into C++, I wrote the following function:

template <typename T>
arma::Col<T> MoveFromNumpyArray(pybind11::object obj) {
  // Cannot use dynamic cast here because there are no virtual functions in
  // pybind interface
  auto np_array = static_cast<pybind11::array>(obj);
  // In order to correctly extract data from numpy array its data type should be
  // the same as T
  assert(np_array.dtype() == pybind11::dtype::of<T>());
  auto* data_ptr = static_cast<T*>(np_array.mutable_data());
  assert(np_array.size() >= 0);
  auto size = static_cast<arma::uword>(np_array.size());
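  // release() gives up this handle without decrementing the reference count,
  // so the array is intentionally leaked and Python never frees its buffer.
  np_array.release();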
  return {data_ptr, size, /*copy_aux_memory=*/false,
          /*strict=*/false};
}

In my case, typical usage of the function is:

py::scoped_interpreter guard{};
py::dict locals;

py::exec(R"(
import numpy as np
# np.int32 matches the 32-bit C++ int used below; dtype=int is 64-bit on most platforms
x = np.array((1,2,3,4), dtype=np.int32)
)", py::globals(), locals);

auto data = MoveFromNumpyArray<int>(locals["x"]);

Therefore I need to cast from pybind11::object to pybind11::array and hope that the user passes a NumPy array as the argument. It is also very important that the user instantiates the template with the type that corresponds to the array's numpy.ndarray.dtype.

arma::Col<T> in this example is basically a std::vector<T> that has a constructor for building itself on top of a given pointer without any copy.

The solution is not ideal, because it relies on the user for two crucial things, but it works, at least in my tests.

bogdan-lab avatar Sep 02 '22 13:09 bogdan-lab

@skovaka Hello, thanks for the method. I tried it, but when I create a new NumPy array after deleting the old one, the contents of PyArray::data are replaced by the new array's data. Have you tried this?

songh11 avatar Sep 08 '22 07:09 songh11

I have the same exact question. How can I malloc a buffer in C++ and use it on the Python side?

songh11 avatar Sep 08 '22 07:09 songh11