
Run model with a cupy array on CUDA

Open james77777778 opened this issue 3 years ago • 4 comments

Similar to #10217

Can we run an onnxruntime model with a cupy array (with some conversions)? I tried the dlpack approach mentioned in #4162, but I got a module not found error at the statement C.OrtValue.from_dlpack()

Have I missed something, or is a different installation than usual needed to support from onnxruntime.capi import _pybind_state as C?

I installed onnxruntime as follows:

# Ubuntu 18.04 with cuda 11.2
pip install onnxruntime-gpu==1.9.0

Thanks!

james77777778 avatar Jan 11 '22 05:01 james77777778

Hi, I am facing the same issue. Have you figured out how to use from_dlpack() and to_dlpack() with ONNX?

ManuelAngel99 avatar Jan 14 '22 04:01 ManuelAngel99

After some research I found a plausible solution. Take a look at #10286, as you may need to build onnxruntime from source. Once you have everything working, you will find the functions you were looking for in onnxruntime.training.ortmodule._utils:

from onnxruntime.training.ortmodule._utils import from_dlpack, to_dlpack

ManuelAngel99 avatar Jan 14 '22 21:01 ManuelAngel99

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

stale[bot] avatar Apr 16 '22 07:04 stale[bot]

I also came across this task and found that the solution is quite simple: you have to use IO bindings and just pass the cupy data pointer. I've also made sure that the array is contiguous, because the PyTorch example shows this as well. There is also a nice cupy interoperability guide, which I used.
Here is an example where a cupy array is passed in and a numpy array is returned:

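import cupy as cp

# onnx_sess is an onnxruntime.InferenceSession created with the CUDA provider;
# image is the input array on the host.
image_gpu = cp.array(image, dtype=cp.float32)
image_gpu = cp.ascontiguousarray(image_gpu)  # IO binding reads a flat buffer

binding = onnx_sess.io_binding()
# Bind the input directly to the cupy device pointer (no host copy)
binding.bind_input(name=onnx_sess.get_inputs()[0].name, device_type='cuda', device_id=0, element_type=cp.float32,
                   shape=tuple(image_gpu.shape), buffer_ptr=image_gpu.data.ptr)
# No device given, so the output is bound to CPU memory
binding.bind_output(name=onnx_sess.get_outputs()[0].name)
onnx_sess.run_with_iobinding(binding)
results = binding.copy_outputs_to_cpu()[0]  # numpy array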


Here is an example where only cupy arrays are used:

import cupy as cp
import numpy as np

image_gpu = cp.array(image, dtype=cp.float32)
image_gpu = cp.ascontiguousarray(image_gpu)

binding = onnx_sess.io_binding()
binding.bind_input(name=onnx_sess.get_inputs()[0].name, device_type='cuda', device_id=0, element_type=cp.float32,
                   shape=tuple(image_gpu.shape), buffer_ptr=image_gpu.data.ptr)
# Keep the output on the GPU; onnxruntime allocates the buffer
binding.bind_output(name=onnx_sess.get_outputs()[0].name, device_type='cuda', device_id=0)

onnx_sess.run_with_iobinding(binding)

ort_output = binding.get_outputs()[0]  # returns OrtValue with memory pointer
if ort_output.data_ptr():  # only wrap a non-zero pointer
  # UnownedMemory expects the size in bytes, not the number of elements
  nbytes = int(np.prod(ort_output.shape())) * np.dtype(np.float32).itemsize
  mem = cp.cuda.UnownedMemory(ort_output.data_ptr(), nbytes, owner=ort_output)
  mem_ptr = cp.cuda.MemoryPointer(mem, 0)
  results = cp.ndarray(ort_output.shape(), dtype=cp.float32, memptr=mem_ptr)

I guess the feature request #15963 can thus also be closed. Adding this to the docs in the examples would be nice: cupy is awesome for preprocessing images on the GPU.

The code above was tested using onnxruntime 1.8.0, cupy 9.6.0 and cuda 11.0.


edit on 26th February: always check if the ort-value pointer is non-zero!
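
The pointer check from the edit note, together with the size argument of cp.cuda.UnownedMemory (a byte count, not an element count), could be wrapped in a small helper. This is only a sketch: output_nbytes and ortvalue_to_cupy are hypothetical names, not part of onnxruntime or cupy, and the float32 default is an assumption that has to match the model's actual output type.

```python
import math


def output_nbytes(shape, itemsize=4):
    """Byte size of a dense tensor with the given shape and element size."""
    return math.prod(shape) * itemsize


def ortvalue_to_cupy(ort_output, dtype_name="float32"):
    """Wrap a CUDA-resident OrtValue as a cupy array without copying.

    Hypothetical helper; returns None when the OrtValue has a null pointer.
    """
    ptr = ort_output.data_ptr()
    if not ptr:
        return None  # nothing to wrap

    # Imported lazily so the size helper above works without a GPU stack
    import cupy as cp
    import numpy as np

    shape = tuple(ort_output.shape())
    dtype = np.dtype(dtype_name)  # assumption: must match the model output
    # owner=ort_output keeps the OrtValue alive as long as the memory object
    mem = cp.cuda.UnownedMemory(ptr, output_nbytes(shape, dtype.itemsize), owner=ort_output)
    return cp.ndarray(shape, dtype=dtype, memptr=cp.cuda.MemoryPointer(mem, 0))
```

With this, results = ortvalue_to_cupy(binding.get_outputs()[0]) would replace the last block of the cupy-only example.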

monzelr avatar Feb 24 '24 14:02 monzelr

I have been getting corrupted results on some calls with the solution above when calling the model repeatedly. After a day of trying to solve the issue, I found by chance that one needs to call binding.synchronize_inputs() before calling onnx_sess.run_with_iobinding(binding) to avoid that. This does not seem to be documented anywhere :(. I am not sure whether one also needs to call binding.synchronize_outputs(). It also seems one may need to use results = cp.ndarray(ort_output.shape(), dtype=cp.float32, memptr=mem_ptr).copy() in the example above, so that the data is not freed if/when ort_output goes out of scope.
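
The call order described here can be sketched as a small wrapper. run_synchronized is a hypothetical name, and calling synchronize_outputs() after the run is a cautious assumption on my part, since it is not clear from this thread whether it is required.

```python
def run_synchronized(onnx_sess, binding):
    """Run a bound session with explicit synchronization around the call.

    Hypothetical helper: binding is the object returned by
    onnx_sess.io_binding() with inputs and outputs already bound.
    """
    # Make sure the bound input buffers are fully written before inference
    binding.synchronize_inputs()
    onnx_sess.run_with_iobinding(binding)
    # Assumption: also wait for the outputs before anything reads them
    binding.synchronize_outputs()
    return binding.get_outputs()  # list of OrtValue objects
```

If the outputs are then wrapped as cupy arrays, taking a .copy() of each wrapped array gives cupy its own buffer, so the data stays valid after the OrtValue goes out of scope.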

martinResearch avatar Apr 26 '24 18:04 martinResearch