GPTQ-for-LLaMa
"CUDA Error: No kernel image is available"
My configuration is as follows:
- Arch Linux, fully up to date; NVIDIA drivers and CUDA installed and configured correctly, the works
- Podman image built using a customized version of this script: https://github.com/RedTopper/Text-Generation-Webui-Podman (Containerfile edited to use the latest commit on the CUDA branch, with TORCH_CUDA_ARCH_LIST="All" set when compiling)
- GPU: 1080 Ti
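For reference, the build step in question looks roughly like this (a sketch, using the repo's setup_cuda.py; I used "All", but the list can also be pinned to the 1080 Ti's compute capability, 6.1):

```sh
# Compile the quant-cuda extension only for sm_61 (GTX 1080 Ti)
# instead of every architecture.
TORCH_CUDA_ARCH_LIST="6.1" python setup_cuda.py install
```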
I get the following output when running the benchmark:
Benchmarking LLaMa-7B FC2 matvec ...
FP16: 0.0007498373985290527
2bit: 2.6212453842163085e-05
Traceback (most recent call last):
File "/app/repositories/GPTQ-for-LLaMa/test_kernel.py", line 51, in <module>
mat = torch.randint(-1000000000, 1000000000, (M // 32 * 3, N), device=DEV, dtype=torch.int)
RuntimeError: CUDA error: no kernel image is available for execution on the device
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
I get a similar output (obviously with a different stack trace) when trying to run inference on the model. Everything loads correctly; the error only happens when something is evaluated.
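A quick way to check whether the GPU is even covered by the kernels in the installed Torch build (standard torch.cuda introspection; the output will differ per install):

```python
import torch

# Compute capability of the GPU, e.g. (6, 1) on a 1080 Ti, (8, 0) on an A100.
print(torch.cuda.get_device_capability(0))

# Architectures this PyTorch build ships kernels for, e.g. ['sm_70', 'sm_75', ...].
# If the capability above isn't covered here (or by the arch list the
# extension was compiled with), you get exactly this "no kernel image" error.
print(torch.cuda.get_arch_list())
```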
I got the same thing today with an NVIDIA A100. Did you ever figure it out?
Ah, I totally forgot I had opened an issue.
For my situation, I figured out that Torch 2.0 has problems specifically with the 1080Ti, and modifying the Containerfile to use Torch < 2.0 solved my issues.
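Concretely, the pin amounts to something like this (a sketch; the exact wheel index depends on your CUDA version, cu117 shown as an example):

```sh
pip install "torch<2.0" --extra-index-url https://download.pytorch.org/whl/cu117
```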
With an A100 though, I'm not sure what could be causing it. I doubt Torch would have issues with pretty much the most popular ML card. If you're using the same Containerfile, I guess make sure the correct CUDA architecture is listed in the defines?
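For an A100 that would be compute capability 8.0, so something like this when rebuilding the extension (again a sketch, assuming the same setup_cuda.py build step):

```sh
# Compile the quant-cuda extension for sm_80 (A100).
TORCH_CUDA_ARCH_LIST="8.0" python setup_cuda.py install
```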
I'm also using Podman (on Fedora 38) and running into this as well; I have an issue filed at https://github.com/oobabooga/text-generation-webui/issues/2002
Thanks for the link to RedTopper/Text-Generation-Webui-Podman; I hadn't been using that.
The version of GPTQ-for-LLaMa used in the text-generation-webui is a fork of this repo at https://github.com/oobabooga/GPTQ-for-LLaMa, but I came here looking for a fix since the error comes from this repo's code.
In my case, I ran into the error here, with:
File "/app/repositories/GPTQ-for-LLaMa/quant.py", line 431, in forward
y = y.to(output_dtype)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(Full stack trace in linked issue)
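Per the hint in that message, re-running with synchronous kernel launches makes the reported stack trace point at the call that actually failed, e.g. against this repo's benchmark script:

```sh
CUDA_LAUNCH_BLOCKING=1 python test_kernel.py
```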