RhiResult(-3) error when running NGP example on Windows with Vulkan

Open arrrmin opened this issue 1 year ago • 5 comments

Describe the bug

I get a Dispatch error : RhiResult(-3) when trying to run taichi_ngp.py on Windows using Vulkan.

To Reproduce

I tried running the script taichi_ngp.py on Windows 11 with Python 3.10.9 and Taichi 1.5.0.

CPU: Ryzen 9 7900X
GPU: AMD 7900XT

Log/Screenshots

(taichi) C:\Users\armin\projects\external\taichi>C:/Users/armin/Programs/miniconda3/envs/taichi/python.exe c:/Users/armin/projects/external/taichi/python/taichi/examples/rendering/taichi_ngp.py
[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9
[W 03/28/23 14:37:12.283 28600] [cuda_driver.cpp:taichi::lang::CUDADriverBase::load_lib@36] nvcuda.dll lib not found.
[Taichi] Starting on arch=vulkan
Loading model from ./npy_models/lego.npy
[E 03/28/23 14:37:12.541 28600] [runtime.cpp:taichi::lang::gfx::GfxRuntime::launch_kernel@576] Dispatch error : RhiResult(-3)


Traceback (most recent call last):
  File "c:\Users\armin\projects\external\taichi\python\taichi\examples\rendering\taichi_ngp.py", line 1111, in <module>
    main(cmd_args)
  File "c:\Users\armin\projects\external\taichi\python\taichi\examples\rendering\taichi_ngp.py", line 1081, in main
    ngp.load_model(npy_file)
  File "c:\Users\armin\projects\external\taichi\python\taichi\examples\rendering\taichi_ngp.py", line 332, in load_model
    self.hash_embedding.from_numpy(
  File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\util.py", line 311, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\field.py", line 357, in from_numpy
    self._from_external_arr(arr)
  File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\util.py", line 311, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\field.py", line 347, in _from_external_arr
    ext_arr_to_tensor(arr, self)
  File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\kernel_impl.py", line 1023, in wrapped
    return primal(*args, **kwargs)
  File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\kernel_impl.py", line 950, in __call__
    return self.runtime.compiled_functions[key](*args)
  File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\kernel_impl.py", line 853, in func__
    raise e from None
  File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\kernel_impl.py", line 850, in func__
    t_kernel(launch_ctx)
RuntimeError: [runtime.cpp:taichi::lang::gfx::GfxRuntime::launch_kernel@576] Dispatch error : RhiResult(-3)

Additional comments

(taichi) C:\Users\armin\projects\external\taichi>ti diagnose
[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9

*******************************************
**      Taichi Programming Language      **
*******************************************

Docs:   https://docs.taichi-lang.org/
GitHub: https://github.com/taichi-dev/taichi/
Forum:  https://forum.taichi.graphics/

Taichi system diagnose:

python: 3.10.9 | packaged by conda-forge | (main, Jan 11 2023, 15:15:40) [MSC v.1916 64 bit (AMD64)]
system: win32
executable: C:\Users\armin\Programs\miniconda3\envs\taichi\python.exe
platform: Windows-10-10.0.22621-SP0
architecture: 64bit WindowsPE
uname: uname_result(system='Windows', node='Armin-PC', release='10', version='10.0.22621', machine='AMD64')
locale: en_AT.cp1252
PATH: C:\Users\armin\Programs\miniconda3\envs\taichi;C:\Users\armin\Programs\miniconda3\envs\taichi\Library\mingw-w64\bin;C:\Users\armin\Programs\miniconda3\envs\taichi\Library\usr\bin;C:\Users\armin\Programs\miniconda3\envs\taichi\Library\bin;C:\Users\armin\Programs\miniconda3\envs\taichi\Scripts;C:\Users\armin\Programs\miniconda3\envs\taichi\bin;C:\Users\armin\Programs\miniconda3\condabin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Windows\System32\OpenSSH;C:\Program Files\Git\cmd;C:\Users\armin\AppData\Local\Microsoft\WindowsApps;C:\Users\armin\AppData\Local\gitkraken\bin;C:\Users\armin\AppData\Local\Programs\Microsoft VS Code\bin;C:\Users\armin\Programs\miniconda3;C:\Users\armin\Programs\miniconda3\Scripts;C:\Users\armin\Programs\miniconda3\Library\bin;.;C:\Users\armin\AppData\Local\Programs\Julia-1.8.5\bin;C:\Users\armin\Programs\miniconda3\envs\taichi\Lib\site-packages\taichi\_lib
PYTHONPATH: ['C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi\\Scripts\\ti.exe', 'C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi\\python310.zip', 'C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi\\DLLs', 'C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi\\lib', 'C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi', 'C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi\\lib\\site-packages']

`lsb_release` not available: [WinError 2] The system cannot find the file specified


import: <module 'taichi' from 'C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi\\lib\\site-packages\\taichi\\__init__.py'>

cc: False
cpu: True
metal: False
opengl: True
[W 03/28/23 14:28:12.986 29280] [cuda_driver.cpp:taichi::lang::CUDADriverBase::load_lib@36] nvcuda.dll lib not found.
cuda: False
vulkan: True

`glewinfo` not available: [WinError 2] The system cannot find the file specified

`nvidia-smi` not available: [WinError 2] The system cannot find the file specified
[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9

[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9
[Taichi] Starting on arch=x64

[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9
[Taichi] Starting on arch=opengl

[W 03/28/23 14:28:15.777 32208] [cuda_driver.cpp:taichi::lang::CUDADriverBase::load_lib@36] nvcuda.dll lib not found.
[W 03/28/23 14:28:15.778 32208] [misc.py:adaptive_arch_select@772] Arch=[<Arch.cuda: 5>] is not supported, falling back to CPU
[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9
[Taichi] Starting on arch=x64

[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9

*******************************************
**      Taichi Programming Language      **
*******************************************

Docs:   https://docs.taichi-lang.org/
GitHub: https://github.com/taichi-dev/taichi/
Forum:  https://forum.taichi.graphics/

                                TAICHI EXAMPLES
+-----------------------------------------------------------------------------+
| 0: ad_gravity           | 25: laplace             | 50: physarum            |
| 1: circle_packing_image | 26: laplace_equation    | 51:                     |
|                         |                         | poisson_disk_sampling   |
| 2: comet                | 27: mandelbrot_zoom     | 52: print_offset        |
| 3: cornell_box          | 28: marching_squares    | 53: rasterizer          |
| 4: diff_sph             | 29: mass_spring_3d_ggui | 54: regression          |
| 5: euler                | 30: mass_spring_game    | 55: sdf_renderer        |
| 6: explicit_activation  | 31:                     | 56: simple_derivative   |
|                         | mass_spring_game_ggui   |                         |
| 7: export_mesh          | 32: mciso_advanced      | 57: simple_texture      |
| 8: export_ply           | 33: mgpcg               | 58: simple_uv           |
| 9: export_videos        | 34: mgpcg_advanced      | 59: snow_phaseField     |
| 10: fem128              | 35: minimal             | 60: stable_fluid        |
| 11: fem128_ggui         | 36: minimization        | 61: stable_fluid_ggui   |
| 12: fem99               | 37: mpm128              | 62: stable_fluid_graph  |
| 13: fractal             | 38: mpm128_ggui         | 63: taichi_bitmasked    |
| 14: fractal3d_ggui      | 39: mpm3d               | 64: taichi_dynamic      |
| 15: fullscreen          | 40: mpm3d_ggui          | 65: taichi_logo         |
| 16: game_of_life        | 41: mpm88               | 66: taichi_ngp          |
| 17: gui_image_io        | 42: mpm88_graph         | 67: taichi_sparse       |
| 18: gui_widgets         | 43: mpm99               | 68: texture_graph       |
| 19: implicit_fem        | 44:                     | 69: tutorial            |
|                         | mpm_lagrangian_forces   |                         |
| 20:                     | 45: nbody               | 70:                     |
| implicit_mass_spring    |                         | two_stream_instability  |
| 21:                     | 46: odop_solar          | 71: vortex_rings        |
| initial_value_problem   |                         |                         |
| 22: jacobian            | 47: oit_renderer        | 72: waterwave           |
| 23:                     | 48: patterns            |                         |
| karman_vortex_street    |                         |                         |
| 24: keyboard            | 49: pbf2d               |                         |
+-----------------------------------------------------------------------------+
42
Running example minimal ...
[Taichi] Starting on arch=x64
42.0
>>> Running time: 0.22s

Consider attaching this log when maintainers ask about system information.
>>> Running time: 6.75s

arrrmin avatar Mar 28 '23 12:03 arrrmin

-3 means the error is "not supported", so I think we accidentally used an unsupported or non-activated feature...

We will look into it

bobcao3 avatar Mar 28 '23 16:03 bobcao3

We have located the bug: it appears to be an issue with the maxComputeWorkGroupCount values on AMD GPUs. We should be able to fix this quickly.

bobcao3 avatar Mar 30 '23 03:03 bobcao3

Any updates for this issue? This seems to have affected Intel cards as well.

chenzhekl avatar Sep 10 '23 14:09 chenzhekl

Here is a minimal script to reproduce the issue:

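import numpy as np
import taichi as ti

# Pick a GPU backend; the failure is reported on Vulkan.
ti.init(arch=ti.gpu)

# 100 million float32 elements (about 400 MB).
np_arr = np.ones((100000000,), dtype=np.float32)
ti_field = ti.field(dtype=ti.f32, shape=(np_arr.shape[0],))


@ti.kernel
def run(dst: ti.template(), src: ti.types.ndarray()):
    # Element-wise copy from the NumPy array into the Taichi field.
    for I in dst:
        dst[I] = src[I]


run(ti_field, np_arr)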

Once the length of np_arr exceeds a certain point, we get RhiResult(-3) on the Vulkan backend.
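
As a rough illustration of where that point might lie, the sketch below estimates how many workgroups a one-thread-per-element dispatch would need and compares it with a device limit. Both constants are assumptions: 65535 is the minimum that the Vulkan specification guarantees for maxComputeWorkGroupCount in the x dimension (real devices may report more), and a block size of 128 is picked only for illustration. The helper names are hypothetical and are not Taichi APIs.

```python
# Assumption: minimum value the Vulkan spec guarantees for
# maxComputeWorkGroupCount[0]; an actual device may report a larger limit.
MAX_WORKGROUP_COUNT_X = 65535
# Assumption: threads per workgroup, chosen for illustration only.
BLOCK_DIM = 128


def workgroups_needed(num_elems, block_dim=BLOCK_DIM):
    """Workgroups required when every element gets its own thread."""
    return (num_elems + block_dim - 1) // block_dim  # integer ceiling


def exceeds_limit(num_elems, block_dim=BLOCK_DIM, limit=MAX_WORKGROUP_COUNT_X):
    """True if the dispatch would ask for more workgroups than the limit."""
    return workgroups_needed(num_elems, block_dim) > limit


# Largest array that still fits under the assumed limit and block size.
largest_ok = MAX_WORKGROUP_COUNT_X * BLOCK_DIM

print(workgroups_needed(100_000_000))  # 781250
print(exceeds_limit(100_000_000))      # True
print(exceeds_limit(largest_ok))       # False
```

Under these assumptions, the 100,000,000-element array from the script would need roughly twelve times more workgroups than the limit allows, while anything up to 8,388,480 elements would fit. The real threshold depends on the limit the driver reports and on the block size Taichi actually uses.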

If this issue is low priority on your list, could you please advise me how I may contribute a PR for the issue? @bobcao3

Thanks

chenzhekl avatar Sep 15 '23 13:09 chenzhekl

No PR, sorry, but if somebody wants to pick this up (e.g. as part of PR #7333), the following patch fixes this for me (Linux, Mesa RADV Vulkan, AMD). The problem is in the const-range case of the SPIR-V range-for codegen, which can currently ask for an unbounded number of workgroups. This patch just applies a fixed cap that matches the dynamic case. I'm assuming the performance implications don't matter, but you may know better.

diff --git a/taichi/codegen/spirv/spirv_codegen.cpp b/taichi/codegen/spirv/spirv_codegen.cpp
index e1e1124fd..b9167f6e0 100644
--- a/taichi/codegen/spirv/spirv_codegen.cpp
+++ b/taichi/codegen/spirv/spirv_codegen.cpp
@@ -2000,7 +2000,17 @@ class TaskCodegen : public IRVisitor {
           ir_->i32_type(), stmt->begin_value, false);  // Named Constant
       total_elems = ir_->int_immediate_number(ir_->i32_type(), num_elems,
                                               false);  // Named Constant
-      task_attribs_.advisory_total_num_threads = num_elems;
+      // To avoid exceeding device limits, we must cap total_num_threads so
+      // that the eventual num_workgroups = total_num_threads/block_dim is in
+      // range.  Use the same kMaxNumThreadsGridStrideLoop cap as the dynamic
+      // case; that's probably a bit conservative for typical (num_elems,
+      // block_dim) combinations, but there's little to be gained by adapting
+      // to the actual limit even if that were readily available.
+      task_attribs_.advisory_total_num_threads = std::min(
+          kMaxNumThreadsGridStrideLoop, num_elems);
+      TI_DEBUG("num_elems={} block_dim={} -> advisory_total_num_threads={}",
+               num_elems, stmt->block_dim,
+               task_attribs_.advisory_total_num_threads);
     } else {
       spirv::Value end_expr_value;
       if (stmt->end_stmt) {
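
To see why capping the thread count should not drop any elements, here is a minimal pure-Python sketch of a grid-stride loop, assuming the generated kernel iterates that way (as the constant name kMaxNumThreadsGridStrideLoop suggests). The function names are hypothetical, and the cap value of 131072 is an assumed stand-in for the real constant in the Taichi sources.

```python
# Assumption: illustrative stand-in for kMaxNumThreadsGridStrideLoop.
K_MAX_NUM_THREADS_GRID_STRIDE_LOOP = 131072


def advisory_total_num_threads(num_elems, cap=K_MAX_NUM_THREADS_GRID_STRIDE_LOOP):
    """Mirror of the patched line: never request more threads than the cap."""
    return min(cap, num_elems)


def grid_stride_indices(thread_id, total_threads, num_elems):
    """Indices one thread visits: its own id, then steps of total_threads."""
    return range(thread_id, num_elems, total_threads)


def simulate_copy(src, cap=K_MAX_NUM_THREADS_GRID_STRIDE_LOOP):
    """Run every simulated thread sequentially and copy src into dst."""
    total = advisory_total_num_threads(len(src), cap)
    dst = [None] * len(src)
    for tid in range(total):
        for i in grid_stride_indices(tid, total, len(src)):
            dst[i] = src[i]
    return dst, total


src = list(range(1000))
dst, total = simulate_copy(src, cap=64)
print(total)       # 64
print(dst == src)  # True
```

With a cap of 64 threads and 1000 elements, each simulated thread handles 15 or 16 elements, and together they cover every index exactly once. The number of workgroups then depends on the cap rather than on the array length, which is what keeps the dispatch within device limits.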

dme49 avatar Jun 20 '24 10:06 dme49