
CUDA error: no kernel image is available for execution on the device CUDA kernel errors

Open · narikm opened this issue on May 17, 2025

This is for bugs only

Did you already ask in the discord?

No discord

You verified that this is a bug and not a feature request or question by asking in the discord?

No discord

Describe the bug

The software doesn't seem to work with Blackwell cards. Is there a fix?

#############################################

Running job: my_first_lora_v1

#############################################

Running 1 process
Loading Flux model
Loading transformer
Loading checkpoint shards: 100%|###########################################################| 2/2 [00:00<00:00, 75.81it/s]
Quantizing transformer
Failed to quantize time_text_embed.timestep_embedder.linear_1: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Error running job: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

========================================
Result:

  • 0 completed jobs
  • 1 failure

========================================
Traceback (most recent call last):
  File "G:\SD\ai-toolkit\run.py", line 119, in <module>
    main()
  File "G:\SD\ai-toolkit\run.py", line 107, in main
    raise e
  File "G:\SD\ai-toolkit\run.py", line 95, in main
    job.run()
  File "G:\SD\ai-toolkit\jobs\ExtensionJob.py", line 22, in run
    process.run()
  File "G:\SD\ai-toolkit\jobs\process\BaseSDTrainProcess.py", line 1471, in run
    self.sd.load_model()
  File "G:\SD\ai-toolkit\toolkit\stable_diffusion_model.py", line 749, in load_model
    quantize(transformer, weights=quantization_type, **self.model_config.quantize_kwargs)
  File "G:\SD\ai-toolkit\toolkit\util\quantize.py", line 98, in quantize
    raise e
  File "G:\SD\ai-toolkit\toolkit\util\quantize.py", line 94, in quantize
    quantize_submodule(model, name, m, weights=weights,
  File "G:\SD\ai-toolkit\venv\lib\site-packages\optimum\quanto\quantize.py", line 45, in quantize_submodule
    qmodule = quantize_module(module, weights=weights, activations=activations, optimizer=optimizer)
  File "G:\SD\ai-toolkit\venv\lib\site-packages\optimum\quanto\nn\qmodule.py", line 86, in quantize_module
    return qcls.from_module(module, weights=weights, activations=activations, optimizer=optimizer)
  File "G:\SD\ai-toolkit\venv\lib\site-packages\optimum\quanto\nn\qmodule.py", line 202, in from_module
    qmodule = cls.qcreate(module, weights, activations, optimizer)
  File "G:\SD\ai-toolkit\venv\lib\site-packages\optimum\quanto\nn\qlinear.py", line 32, in qcreate
    return cls(
  File "G:\SD\ai-toolkit\venv\lib\site-packages\optimum\quanto\nn\qmodule.py", line 109, in __init__
    super().__init__(*args, **kwargs)
  File "G:\SD\ai-toolkit\venv\lib\site-packages\torch\nn\modules\linear.py", line 112, in __init__
    self.reset_parameters()
  File "G:\SD\ai-toolkit\venv\lib\site-packages\torch\nn\modules\linear.py", line 118, in reset_parameters
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
  File "G:\SD\ai-toolkit\venv\lib\site-packages\torch\nn\init.py", line 518, in kaiming_uniform_
    return tensor.uniform_(-bound, bound, generator=generator)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

narikm · May 17, 2025
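The "no kernel image is available" message means the installed PyTorch wheel was not compiled with kernels for this GPU's compute capability; Blackwell (RTX 50-series) cards report sm_120, which the stable cu126 wheels do not include. A minimal diagnostic sketch (not from the original report, and assuming PyTorch can still be imported) to confirm the mismatch:

```python
# Diagnostic sketch: compare the GPU's compute capability against the kernel
# architectures the installed PyTorch wheel was compiled for.
import torch

print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)

cap = torch.cuda.get_device_capability(0)   # e.g. (12, 0) on a Blackwell card
archs = torch.cuda.get_arch_list()          # e.g. ['sm_80', 'sm_86', ..., 'sm_90']
print("device capability:", cap)
print("wheel kernel archs:", archs)

if f"sm_{cap[0]}{cap[1]}" not in archs:
    print("Installed wheel has no kernels for this GPU -> 'no kernel image' errors.")
```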

Replace the line

pip install --no-cache-dir torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu126

with

pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128

narikm · May 22, 2025


Which file contains the line that we replace with the other line? Thanks!

kilerb · Jun 04, 2025

> Which file contains the line that we replace with the other line? Thanks!

Not a file; it's part of the installation process:

git clone https://github.com/ostris/ai-toolkit.git
cd ai-toolkit
python -m venv venv
.\venv\Scripts\activate
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
pip install -r requirements.txt
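After the reinstall, a quick confirmation step (a suggested check, not part of the original instructions) is to verify that the nightly cu128 build actually targets the card:

```python
# Run inside the activated venv. A cu128 nightly with Blackwell support should
# report a CUDA 12.8 build and include "sm_120" in the kernel architecture list.
import torch

print(torch.__version__, torch.version.cuda)   # nightly version string and CUDA 12.8
print(torch.cuda.get_device_name(0))           # the detected GPU
print(torch.cuda.get_arch_list())              # should contain "sm_120"
```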

narikm · Jun 12, 2025

I'm having the same problem on WSL: RuntimeError: CUDA error: the launch timed out and was terminated

CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

RTX 3090. Should I install this: pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128 ?

davizca · Jul 23, 2025