CUDA error: no kernel image is available for execution on the device
This is for bugs only
Did you already ask in the discord?
No discord
You verified that this is a bug and not a feature request or question by asking in the discord?
No discord
Describe the bug
The software doesn't seem to work with Blackwell cards. Is there a fix?
#############################################
Running job: my_first_lora_v1
#############################################
Running 1 process
Loading Flux model
Loading transformer
Loading checkpoint shards: 100%|###########################################################| 2/2 [00:00<00:00, 75.81it/s]
Quantizing transformer
Failed to quantize time_text_embed.timestep_embedder.linear_1: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Error running job: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
========================================
Result:
- 0 completed jobs
- 1 failure
========================================
Traceback (most recent call last):
  File "G:\SD\ai-toolkit\run.py", line 119, in <module>
    main()
  File "G:\SD\ai-toolkit\run.py", line 107, in main
    raise e
  File "G:\SD\ai-toolkit\run.py", line 95, in main
    job.run()
  File "G:\SD\ai-toolkit\jobs\ExtensionJob.py", line 22, in run
    process.run()
  File "G:\SD\ai-toolkit\jobs\process\BaseSDTrainProcess.py", line 1471, in run
    self.sd.load_model()
  File "G:\SD\ai-toolkit\toolkit\stable_diffusion_model.py", line 749, in load_model
    quantize(transformer, weights=quantization_type, **self.model_config.quantize_kwargs)
  File "G:\SD\ai-toolkit\toolkit\util\quantize.py", line 98, in quantize
    raise e
  File "G:\SD\ai-toolkit\toolkit\util\quantize.py", line 94, in quantize
    quantize_submodule(model, name, m, weights=weights,
  File "G:\SD\ai-toolkit\venv\lib\site-packages\optimum\quanto\quantize.py", line 45, in quantize_submodule
    qmodule = quantize_module(module, weights=weights, activations=activations, optimizer=optimizer)
  File "G:\SD\ai-toolkit\venv\lib\site-packages\optimum\quanto\nn\qmodule.py", line 86, in quantize_module
    return qcls.from_module(module, weights=weights, activations=activations, optimizer=optimizer)
  File "G:\SD\ai-toolkit\venv\lib\site-packages\optimum\quanto\nn\qmodule.py", line 202, in from_module
    qmodule = cls.qcreate(module, weights, activations, optimizer)
  File "G:\SD\ai-toolkit\venv\lib\site-packages\optimum\quanto\nn\qlinear.py", line 32, in qcreate
    return cls(
  File "G:\SD\ai-toolkit\venv\lib\site-packages\optimum\quanto\nn\qmodule.py", line 109, in __init__
    super().__init__(*args, **kwargs)
  File "G:\SD\ai-toolkit\venv\lib\site-packages\torch\nn\modules\linear.py", line 112, in __init__
    self.reset_parameters()
  File "G:\SD\ai-toolkit\venv\lib\site-packages\torch\nn\modules\linear.py", line 118, in reset_parameters
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
  File "G:\SD\ai-toolkit\venv\lib\site-packages\torch\nn\init.py", line 518, in kaiming_uniform_
    return tensor.uniform_(-bound, bound, generator=generator)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Replace the line
pip install --no-cache-dir torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu126
with
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
Which file contains the line that we replace with the other line? Thanks!
It's not in a file; it's a step in the installation process:
git clone https://github.com/ostris/ai-toolkit.git
cd ai-toolkit
python -m venv venv
.\venv\Scripts\activate
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
pip install -r requirements.txt
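To confirm whether an installed PyTorch build actually ships kernels for your card, you can compare the GPU's compute capability with the architectures the build was compiled for. The snippet below is a rough sketch, not part of ai-toolkit: the helper name arch_supported is made up for illustration, and it assumes the usual CUDA compatibility rules (an sm_XY binary runs on devices with the same major version and an equal or higher minor version, and compute_XY PTX can be JIT-compiled for any equal or newer device). It only uses torch.cuda.is_available, torch.cuda.get_device_name, torch.cuda.get_device_capability and torch.cuda.get_arch_list.

```python
def arch_supported(capability, arch_list):
    """Hypothetical helper: True if a build compiled for `arch_list`
    should contain code usable on a GPU with the given (major, minor)
    compute capability."""
    major, minor = capability
    for arch in arch_list:
        kind, _, digits = arch.partition("_")
        if not digits.isdigit():
            # Skip architecture-specific entries such as "sm_90a".
            continue
        a_major, a_minor = int(digits[:-1]), int(digits[-1])
        # Compiled binaries: same major version, minor not newer than the GPU.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX: can be JIT-compiled for any equal or newer GPU.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is None or not torch.cuda.is_available():
        print("PyTorch with a working CUDA device was not found.")
    else:
        capability = torch.cuda.get_device_capability(0)
        arch_list = torch.cuda.get_arch_list()
        print("GPU:", torch.cuda.get_device_name(0), "capability:", capability)
        print("Build architectures:", arch_list)
        print("Supported by this build:", arch_supported(capability, arch_list))
```

Run it inside the activated venv. If the last line prints False, the "no kernel image is available" error is expected with that build, and installing a build whose architecture list covers the card is the likely fix.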
I'm having the same problem on WSL:
RuntimeError: CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
I have an RTX 3090. Should I install this?
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128