Error when converting a large Taichi ndarray to NumPy with to_numpy()
I have a Taichi ndarray and need to convert it to NumPy using the to_numpy() method. An error occurs when the ndarray is very large:
RuntimeError: [taichi/rhi/cuda/cuda_driver.h:taichi::lang::CUDADriverFunction<void *>::operator ()@92] CUDA Error CUDA_ERROR_ASSERT: device-side assert triggered while calling stream_synchronize (cuStreamSynchronize)
It seems that to_numpy() needs additional memory of about the same size as the ndarray itself. A sample test that reproduces the issue:
import taichi as ti

ti.init(arch=ti.gpu, device_memory_GB=0.9)
# img will occupy 0.5 GB of memory (512^3 * 4 bytes)
n = 512
img = ti.ndarray(ti.f32, shape=(n, n, n))
a = img.to_numpy() # Error
If I set device_memory_GB=1.1 instead, there is no error, so I guess to_numpy() needs roughly double the memory of the ndarray in total. However, my real data is 4 GB and my GPU has only 6 GB of memory, so to_numpy() cannot run. Is there any way to work around this?
Additional information:
- Python version: 3.10.10
- Taichi version: 1.6.0
- CUDA version: 12.0