taichi
RhiResult(-3) error when running NGP example on Windows with Vulkan
Describe the bug
I get a "Dispatch error : RhiResult(-3)" when trying to run taichi_ngp.py on Windows using Vulkan.
To Reproduce
I tried running the script taichi_ngp.py on Windows 11 with Python 3.10.9 and Taichi 1.5.0.
CPU: Ryzen 9 7900X
GPU: AMD 7900XT
Log/Screenshots
(taichi) C:\Users\armin\projects\external\taichi>C:/Users/armin/Programs/miniconda3/envs/taichi/python.exe c:/Users/armin/projects/external/taichi/python/taichi/examples/rendering/taichi_ngp.py
[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9
[W 03/28/23 14:37:12.283 28600] [cuda_driver.cpp:taichi::lang::CUDADriverBase::load_lib@36] nvcuda.dll lib not found.
[Taichi] Starting on arch=vulkan
Loading model from ./npy_models/lego.npy
[E 03/28/23 14:37:12.541 28600] [runtime.cpp:taichi::lang::gfx::GfxRuntime::launch_kernel@576] Dispatch error : RhiResult(-3)
Traceback (most recent call last):
File "c:\Users\armin\projects\external\taichi\python\taichi\examples\rendering\taichi_ngp.py", line 1111, in <module>
main(cmd_args)
File "c:\Users\armin\projects\external\taichi\python\taichi\examples\rendering\taichi_ngp.py", line 1081, in main
ngp.load_model(npy_file)
File "c:\Users\armin\projects\external\taichi\python\taichi\examples\rendering\taichi_ngp.py", line 332, in load_model
self.hash_embedding.from_numpy(
File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\util.py", line 311, in wrapped
return func(*args, **kwargs)
File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\field.py", line 357, in from_numpy
self._from_external_arr(arr)
File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\util.py", line 311, in wrapped
return func(*args, **kwargs)
File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\field.py", line 347, in _from_external_arr
ext_arr_to_tensor(arr, self)
File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\kernel_impl.py", line 1023, in wrapped
return primal(*args, **kwargs)
File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\kernel_impl.py", line 950, in __call__
return self.runtime.compiled_functions[key](*args)
File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\kernel_impl.py", line 853, in func__
raise e from None
File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\kernel_impl.py", line 850, in func__
t_kernel(launch_ctx)
RuntimeError: [runtime.cpp:taichi::lang::gfx::GfxRuntime::launch_kernel@576] Dispatch error : RhiResult(-3)
Additional comments
(taichi) C:\Users\armin\projects\external\taichi>ti diagnose
[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9
*******************************************
** Taichi Programming Language **
*******************************************
Docs: https://docs.taichi-lang.org/
GitHub: https://github.com/taichi-dev/taichi/
Forum: https://forum.taichi.graphics/
Taichi system diagnose:
python: 3.10.9 | packaged by conda-forge | (main, Jan 11 2023, 15:15:40) [MSC v.1916 64 bit (AMD64)]
system: win32
executable: C:\Users\armin\Programs\miniconda3\envs\taichi\python.exe
platform: Windows-10-10.0.22621-SP0
architecture: 64bit WindowsPE
uname: uname_result(system='Windows', node='Armin-PC', release='10', version='10.0.22621', machine='AMD64')
locale: en_AT.cp1252
PATH: C:\Users\armin\Programs\miniconda3\envs\taichi;C:\Users\armin\Programs\miniconda3\envs\taichi\Library\mingw-w64\bin;C:\Users\armin\Programs\miniconda3\envs\taichi\Library\usr\bin;C:\Users\armin\Programs\miniconda3\envs\taichi\Library\bin;C:\Users\armin\Programs\miniconda3\envs\taichi\Scripts;C:\Users\armin\Programs\miniconda3\envs\taichi\bin;C:\Users\armin\Programs\miniconda3\condabin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Windows\System32\OpenSSH;C:\Program Files\Git\cmd;C:\Users\armin\AppData\Local\Microsoft\WindowsApps;C:\Users\armin\AppData\Local\gitkraken\bin;C:\Users\armin\AppData\Local\Programs\Microsoft VS Code\bin;C:\Users\armin\Programs\miniconda3;C:\Users\armin\Programs\miniconda3\Scripts;C:\Users\armin\Programs\miniconda3\Library\bin;.;C:\Users\armin\AppData\Local\Programs\Julia-1.8.5\bin;C:\Users\armin\Programs\miniconda3\envs\taichi\Lib\site-packages\taichi\_lib
PYTHONPATH: ['C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi\\Scripts\\ti.exe', 'C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi\\python310.zip', 'C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi\\DLLs', 'C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi\\lib', 'C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi', 'C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi\\lib\\site-packages']
`lsb_release` not available: [WinError 2] The system cannot find the file specified
import: <module 'taichi' from 'C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi\\lib\\site-packages\\taichi\\__init__.py'>
cc: False
cpu: True
metal: False
opengl: True
[W 03/28/23 14:28:12.986 29280] [cuda_driver.cpp:taichi::lang::CUDADriverBase::load_lib@36] nvcuda.dll lib not found.
cuda: False
vulkan: True
`glewinfo` not available: [WinError 2] The system cannot find the file specified
`nvidia-smi` not available: [WinError 2] The system cannot find the file specified
[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9
[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9
[Taichi] Starting on arch=x64
[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9
[Taichi] Starting on arch=opengl
[W 03/28/23 14:28:15.777 32208] [cuda_driver.cpp:taichi::lang::CUDADriverBase::load_lib@36] nvcuda.dll lib not found.
[W 03/28/23 14:28:15.778 32208] [misc.py:adaptive_arch_select@772] Arch=[<Arch.cuda: 5>] is not supported, falling back to CPU
[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9
[Taichi] Starting on arch=x64
[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9
*******************************************
** Taichi Programming Language **
*******************************************
Docs: https://docs.taichi-lang.org/
GitHub: https://github.com/taichi-dev/taichi/
Forum: https://forum.taichi.graphics/
TAICHI EXAMPLES
+-----------------------------------------------------------------------------+
| 0: ad_gravity | 25: laplace | 50: physarum |
| 1: circle_packing_image | 26: laplace_equation | 51: |
| | | poisson_disk_sampling |
| 2: comet | 27: mandelbrot_zoom | 52: print_offset |
| 3: cornell_box | 28: marching_squares | 53: rasterizer |
| 4: diff_sph | 29: mass_spring_3d_ggui | 54: regression |
| 5: euler | 30: mass_spring_game | 55: sdf_renderer |
| 6: explicit_activation | 31: | 56: simple_derivative |
| | mass_spring_game_ggui | |
| 7: export_mesh | 32: mciso_advanced | 57: simple_texture |
| 8: export_ply | 33: mgpcg | 58: simple_uv |
| 9: export_videos | 34: mgpcg_advanced | 59: snow_phaseField |
| 10: fem128 | 35: minimal | 60: stable_fluid |
| 11: fem128_ggui | 36: minimization | 61: stable_fluid_ggui |
| 12: fem99 | 37: mpm128 | 62: stable_fluid_graph |
| 13: fractal | 38: mpm128_ggui | 63: taichi_bitmasked |
| 14: fractal3d_ggui | 39: mpm3d | 64: taichi_dynamic |
| 15: fullscreen | 40: mpm3d_ggui | 65: taichi_logo |
| 16: game_of_life | 41: mpm88 | 66: taichi_ngp |
| 17: gui_image_io | 42: mpm88_graph | 67: taichi_sparse |
| 18: gui_widgets | 43: mpm99 | 68: texture_graph |
| 19: implicit_fem | 44: | 69: tutorial |
| | mpm_lagrangian_forces | |
| 20: | 45: nbody | 70: |
| implicit_mass_spring | | two_stream_instability |
| 21: | 46: odop_solar | 71: vortex_rings |
| initial_value_problem | | |
| 22: jacobian | 47: oit_renderer | 72: waterwave |
| 23: | 48: patterns | |
| karman_vortex_street | | |
| 24: keyboard | 49: pbf2d | |
+-----------------------------------------------------------------------------+
42
Running example minimal ...
[Taichi] Starting on arch=x64
42.0
>>> Running time: 0.22s
Consider attaching this log when maintainers ask about system information.
>>> Running time: 6.75s
RhiResult(-3) means "not supported", so I think we accidentally used an unsupported or non-activated feature.
We will look into it.
We have located the bug: it appears to be an issue with the maxComputeWorkGroupCount values on AMD GPUs. We should be able to fix this quickly.
Any updates on this issue? It seems to affect Intel cards as well.
Here is a minimal script to reproduce the issue:
```python
import numpy as np
import taichi as ti

ti.init(arch=ti.gpu)  # without CUDA, ti.gpu typically ends up on Vulkan

np_arr = np.ones((100000000,), dtype=np.float32)
ti_field = ti.field(dtype=ti.f32, shape=(np_arr.shape[0],))


@ti.kernel
def run(dst: ti.template(), src: ti.types.ndarray()):
    # Loop over every element of the field and copy from the ndarray.
    for I in dst:
        dst[I] = src[I]


run(ti_field, np_arr)
```
Once the length of np_arr exceeds a certain threshold, we get RhiResult(-3) on the Vulkan backend.
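A rough way to see why there is a threshold: a 1D compute dispatch that launches one thread per element needs ceil(n / block_dim) workgroups, and Vulkan rejects dispatches whose workgroup count exceeds the device's maxComputeWorkGroupCount. The sketch below assumes a block_dim of 128 (not necessarily what Taichi picks) and uses 65535, the minimum value the Vulkan spec requires for each maxComputeWorkGroupCount component; real devices may report a larger limit. The helper names are hypothetical.

```python
# Vulkan's required minimum for each maxComputeWorkGroupCount component.
# Actual devices may report larger values in VkPhysicalDeviceLimits.
VULKAN_MIN_MAX_WORKGROUP_COUNT = 65535


def num_workgroups(total_threads: int, block_dim: int) -> int:
    """Workgroups needed for a 1D dispatch with one thread per element."""
    return (total_threads + block_dim - 1) // block_dim  # integer ceil


def dispatch_fits(total_threads: int, block_dim: int,
                  max_workgroup_count: int = VULKAN_MIN_MAX_WORKGROUP_COUNT) -> bool:
    """True if the dispatch stays within the workgroup-count limit."""
    return num_workgroups(total_threads, block_dim) <= max_workgroup_count


if __name__ == "__main__":
    block_dim = 128      # assumed block size
    n = 100_000_000      # array length from the repro script above
    print(num_workgroups(n, block_dim))                 # 781250
    print(dispatch_fits(n, block_dim))                  # False
    print(VULKAN_MIN_MAX_WORKGROUP_COUNT * block_dim)   # 8388480
```

Under these assumptions, any array longer than 8,388,480 elements would need more workgroups than a minimum-spec device guarantees, which matches the "works for small arrays, fails for large ones" behaviour.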
If this issue is low priority for you, could you advise me on how I might contribute a PR for it? @bobcao3
Thanks!
No PR, sorry, but if somebody wants to pick this up (e.g. as part of PR #7333), the following patch fixes this for me (Linux, Mesa RADV Vulkan, AMD). The problem is in the const-range case of the SPIR-V range-for codegen, which can currently ask for an unbounded number of workgroups. This patch simply applies a fixed cap that matches the dynamic case. I'm assuming the performance implications don't matter, but you may know better.
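The idea behind the cap can be simulated on the CPU: request at most a fixed number of threads, and have each thread walk the range with a grid stride (start at its own id, advance by the total thread count) so every element is still visited exactly once. This is a sketch, not Taichi's codegen; CAP is a hypothetical stand-in for kMaxNumThreadsGridStrideLoop (its real value is not assumed here), and BLOCK_DIM is an assumed block size.

```python
from collections import Counter

CAP = 4096       # hypothetical stand-in for kMaxNumThreadsGridStrideLoop
BLOCK_DIM = 128  # assumed block size


def advisory_total_threads(num_elems: int, cap: int = CAP) -> int:
    """Threads requested for a const-range for, capped as in the patch."""
    return min(cap, num_elems)


def num_workgroups(total_threads: int, block_dim: int = BLOCK_DIM) -> int:
    """Workgroups needed to launch total_threads threads."""
    return (total_threads + block_dim - 1) // block_dim


def grid_stride_cover(num_elems: int, total_threads: int) -> Counter:
    """Count how often each element index is visited by a grid-stride loop."""
    visits = Counter()
    for tid in range(total_threads):
        # Thread tid handles tid, tid + total_threads, tid + 2 * total_threads, ...
        for i in range(tid, num_elems, total_threads):
            visits[i] += 1
    return visits


if __name__ == "__main__":
    n = 100_000  # small enough to simulate quickly
    threads = advisory_total_threads(n)
    print(threads, num_workgroups(threads))  # 4096 32
    visits = grid_stride_cover(n, threads)
    print(len(visits) == n and set(visits.values()) == {1})  # True
```

With the cap in place, the workgroup count is bounded by ceil(CAP / BLOCK_DIM) no matter how large the range is; the trade-off is that each thread runs more loop iterations, which is the performance question raised above.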
diff --git a/taichi/codegen/spirv/spirv_codegen.cpp b/taichi/codegen/spirv/spirv_codegen.cpp
index e1e1124fd..b9167f6e0 100644
--- a/taichi/codegen/spirv/spirv_codegen.cpp
+++ b/taichi/codegen/spirv/spirv_codegen.cpp
@@ -2000,7 +2000,17 @@ class TaskCodegen : public IRVisitor {
ir_->i32_type(), stmt->begin_value, false); // Named Constant
total_elems = ir_->int_immediate_number(ir_->i32_type(), num_elems,
false); // Named Constant
- task_attribs_.advisory_total_num_threads = num_elems;
+ // To avoid exceeding device limits, we must cap total_num_threads so
+ // that the eventual num_workgroups = total_num_threads/block_dim is in
+ // range. Use the same kMaxNumThreadsGridStrideLoop cap as the dynamic
+ // case; that's probably a bit conservative for typical (num_elems,
+ // block_dim) combinations, but there's little to be gained by adapting
+ // to the actual limit even if that were readily available.
+ task_attribs_.advisory_total_num_threads = std::min(
+ kMaxNumThreadsGridStrideLoop, num_elems);
+ TI_DEBUG("num_elems={} block_dim={} -> advisory_total_num_threads={}",
+ num_elems, stmt->block_dim,
+ task_attribs_.advisory_total_num_threads);
} else {
spirv::Value end_expr_value;
if (stmt->end_stmt) {