The same code works on Nvidia CUDA but doesn't work on AMD Vulkan
Describe the bug
When I run a Taichi kernel on Nvidia CUDA, it always works. When I run the same kernel on AMD Vulkan, it works for small inputs but fails for large ones.
To Reproduce
Here is sample code:
import time

import numpy as np
import taichi as ti

ti.init(arch=ti.gpu)

@ti.kernel  # Test kernel
def test_kernel(a: ti.types.ndarray(ndim=3), b: ti.types.ndarray(ndim=3)):
    nslices, nrow, ncols = a.shape
    for sli, row, col in b:
        # Sum a over its first axis, then divide by the slice count (mean over slices).
        for n in range(nslices):
            b[sli, row, col] += a[n, row, col]
        b[sli, row, col] /= nslices

m = 64  # Test data size: works on both backends
m = 512  # Overrides the line above; comment this out to test m = 64

a = np.random.random((m, m, m)).astype('float32')
b = np.zeros_like(a)

t1 = time.time()
test_kernel(a, b)
t2 = time.time()
print(f'Time: {t2-t1}s')
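To check that both backends produce the same numbers when the kernel does run, a backend-independent reference can help. The sketch below is not part of the original report: `reference_mean_over_slices` is a hypothetical helper name, and it assumes `b` starts as all zeros and has the same shape as `a`, as in the sample code. It uses plain Python loops, so it is only practical for small sizes such as m = 64 or less.

```python
def reference_mean_over_slices(a):
    """Pure-Python reference for what test_kernel writes into b.

    `a` is any 3D nested sequence (nested lists or a NumPy array).
    Assumes b is zero-initialized and has the same shape as a.
    """
    nslices = len(a)
    nrows = len(a[0])
    ncols = len(a[0][0])
    # The kernel's result does not depend on `sli`, so compute one 2D plane
    # holding the mean of a over its first axis...
    plane = [
        [sum(a[n][row][col] for n in range(nslices)) / nslices
         for col in range(ncols)]
        for row in range(nrows)
    ]
    # ...and repeat that plane once per slice of b.
    return [[list(r) for r in plane] for _ in range(nslices)]


# Tiny example with shape (2, 1, 2).
example = [[[1.0, 2.0]], [[3.0, 6.0]]]
print(reference_mean_over_slices(example))  # [[[2.0, 4.0]], [[2.0, 4.0]]]
```

With NumPy inputs, the result can be compared against the kernel output with a tolerance (for example via `np.allclose`), since the kernel accumulates in float32 while this reference accumulates in Python floats.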
Log/Screenshots
For Nvidia CUDA, here is the result:
When m = 64,
[Taichi] version 1.7.0, llvm 15.0.1, commit 2fd24490, win, python 3.10.10
[Taichi] Starting on arch=cuda
Time: 0.049997806549072266s
When m = 512,
[Taichi] version 1.7.0, llvm 15.0.1, commit 2fd24490, win, python 3.10.10
[Taichi] Starting on arch=cuda
Time: 1.6580820083618164s
For AMD Vulkan, here is the result:
When m = 64, it works.
[Taichi] version 1.7.1, llvm 15.0.1, commit 0f143b2f, win, python 3.10.6
[Taichi] Starting on arch=vulkan
Time: 0.040009260177612305s
However, when m = 512, it fails.
'C:\Users\Mango\Desktop\WFH\test.py'
[Taichi] version 1.7.1, llvm 15.0.1, comm
[W 07/27/24 19:56:04.137 17488] [cuda_driver.cpp:taichi::lang::CUDADriverBase::load_lib@36] nvcuda.dll lib not found.
RHI Error: (0) Vulkan device might be lost (vkQueueSubmit failed)
Assertion failed: false && "Error without return code", file C:\Users\buildbot\actions-runner\_work\taichi\taichi\taichi\rhi\vulkan\vulkan_device.cpp, line 2038
If I try it again, my screen goes black. I have to reboot my computer.
Additional comments
Nvidia GPU: GeForce RTX 2080 SUPER 8 GB
AMD GPU: Radeon RX 7700 XT 12 GB
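For a sense of scale, the two test sizes differ far more than the values of m suggest. The sketch below only does arithmetic on the sizes from the sample code; `workload` is a hypothetical helper, and the 4 bytes per element follows from the float32 arrays. It does not identify the cause of the failure.

```python
def workload(m, bytes_per_element=4):
    """Size of the test problem for an (m, m, m) float32 array."""
    elements = m ** 3  # entries in a (and in b)
    return {
        "elements": elements,
        "bytes_per_array": elements * bytes_per_element,
        # The kernel runs an inner loop of m additions per element of b.
        "inner_iterations": elements * m,
    }


for m in (64, 512):
    w = workload(m)
    print(
        f"m={m}: {w['elements']:,} elements, "
        f"{w['bytes_per_array'] / 2**20:.0f} MiB per array, "
        f"{w['inner_iterations']:,} inner-loop additions"
    )
```

Going from m = 64 to m = 512 raises each array from 1 MiB to 512 MiB and multiplies the inner-loop additions by 4096, from 2^24 to 2^36. Both arrays together take about 1 GiB, which is well below the memory size listed for either card.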
Can you try the ROCm backend on your AMD GPU if you are using Linux?
Sorry, I don't have Linux. I only use Windows.