
RuntimeError: Failed to find native CUDA module

Open scutcsq opened this issue 1 year ago • 10 comments

RuntimeError: Failed to find native CUDA module, make sure that you compiled the code with K2_WITH_CUDA.

scutcsq avatar Jan 11 '24 10:01 scutcsq

Could you describe how you installed fast_rnnt?

csukuangfj avatar Jan 11 '24 13:01 csukuangfj

Could you describe how you installed fast_rnnt?

I used pip to install fast_rnnt. I have now installed k2, and the problem is solved by using the equivalent function from k2.

scutcsq avatar Jan 11 '24 14:01 scutcsq
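The workaround scutcsq describes can be sketched as follows. This is a minimal sketch assuming a recent k2 whose Python API ships the same pruned RNN-T loss functions that fast_rnnt provides (e.g. `k2.rnnt_loss`, `k2.mutual_information_recursion`); verify the exact names against your installed k2 version.

```python
# Hedged sketch: prefer k2's copy of the RNN-T loss when fast_rnnt's
# native module was built without CUDA support.  `k2.rnnt_loss` is
# assumed to have the same signature as `fast_rnnt.rnnt_loss`.
try:
    import k2
    rnnt_loss = k2.rnnt_loss
    backend = "k2"
except ImportError:
    backend = "unavailable (pip install k2)"
print(f"RNN-T loss backend: {backend}")
```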

Hi, we hit the same error after successfully building fast_rnnt for AMD with ROCm 5.4, with PyTorch 2.0.1 and torchaudio 0.15.2 correctly installed:

File "/home/ubnt/anaconda3/lib/python3.8/site-packages/fast_rnnt/rnnt_loss.py", line 533, in rnnt_loss
    scores_and_grads = mutual_information_recursion(
  File "/home/ubnt/anaconda3/lib/python3.8/site-packages/fast_rnnt/mutual_information.py", line 294, in mutual_information_recursion
    scores = MutualInformationRecursionFunction.apply(
  File "/home/ubnt/anaconda3/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/ubnt/anaconda3/lib/python3.8/site-packages/fast_rnnt/mutual_information.py", line 157, in forward
    ans = _fast_rnnt.mutual_information_forward(px, py, boundary, p)
RuntimeError: Failed to find native CUDA module, make sure that you compiled the code with K2_WITH_CUDA.

We want to use only fast_rnnt, without k2. We installed it by building from source:

git clone https://github.com/danpovey/fast_rnnt.git
cd fast_rnnt
export FT_MAKE_ARGS="-j32"
pip install --verbose fast_rnnt

bene-ges avatar Jan 13 '24 18:01 bene-ges

It seems that ROCm isn't supported by the build; the configure log says: -- No NVCC detected. Disable CUDA support

bene-ges avatar Jan 13 '24 19:01 bene-ges
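The "No NVCC detected" line comes from the configure step, which probes only for NVIDIA's nvcc; without it the extension is built CPU-only, which explains the runtime error above. A quick sanity check of which GPU compilers are visible can be sketched like this (`/opt/rocm` is only the common ROCm install prefix; adjust for your system):

```shell
# Report which GPU compilers the build system could find.
if command -v nvcc >/dev/null 2>&1; then
  nvcc_status="found: $(command -v nvcc)"
else
  nvcc_status="not found (CUDA support will be disabled)"
fi
if command -v hipcc >/dev/null 2>&1 || [ -x /opt/rocm/bin/hipcc ]; then
  hipcc_status="found"
else
  hipcc_status="not found (is the ROCm bin directory on PATH?)"
fi
echo "nvcc:  $nvcc_status"
echo "hipcc: $hipcc_status"
```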

@bene-ges Basically if pytorch can run on Rocm, fast_rnnt can also run on it. Will have a look at this issue. Thanks!

pkufool avatar Jan 19 '24 10:01 pkufool

But the core of fast_rnnt is the CUDA code, no? And I believe ROCm does not use CUDA? So it would require a rewrite to support that?

danpovey avatar Jan 20 '24 11:01 danpovey

@danpovey, ROCm can compile CUDA code into an AMD binary. Most projects just add the ROCm compile steps, as PyTorch does, so the PyTorch build system can serve as an example of the right solution (Docs).

Example of converting CUDA code to ROCm code and compiling it on Ubuntu (matrix-cuda is just example CUDA code):

git clone https://github.com/lzhengchun/matrix-cuda
cd matrix-cuda
/opt/rocm-5.3.0/bin/hipify-clang matrix_cuda.cu

After this, a file matrix_cuda.cu.hip will appear, which is the source code for ROCm. It can then be compiled with hipcc:

/opt/rocm-5.3.0/bin/hipcc matrix_cuda.cu.hip

After this, a file a.out will appear.

bene-ges avatar Jan 20 '24 13:01 bene-ges

Another useful link on porting CUDA to HIP (the notation is almost identical): https://www.lumi-supercomputer.eu/preparing-codes-for-lumi-converting-cuda-applications-to-hip/

bene-ges avatar Jan 20 '24 14:01 bene-ges
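To illustrate how close the two APIs are, here is a hypothetical minimal kernel (not from fast_rnnt) with the HIP equivalent of each CUDA runtime call noted in comments; this 1:1 renaming is exactly what hipify-clang automates:

```cuda
#include <cuda_runtime.h>   // HIP: #include <hip/hip_runtime.h>

// Kernel syntax is identical in CUDA and HIP.
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1024;
    float *d_x = nullptr;
    cudaMalloc(&d_x, n * sizeof(float));            // HIP: hipMalloc
    scale<<<(n + 255) / 256, 256>>>(d_x, 2.0f, n);  // HIP: same <<<...>>> launch syntax
    cudaDeviceSynchronize();                        // HIP: hipDeviceSynchronize
    cudaFree(d_x);                                  // HIP: hipFree
    return 0;
}
```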

I can help with testing on amd if needed

bene-ges avatar Feb 16 '24 15:02 bene-ges

OK that's interesting. If it's possible for you to add support for ROCM into our build system (which is I think not entirely trivial), then I think we'd appreciate that very much. This kind of thing will no doubt be used more frequently in the future. (Also: apologies for the very late response.)

danpovey avatar Feb 18 '24 06:02 danpovey