Kandinsky5TimeEmbeddings hardcodes 'cuda' in @torch.autocast decorator, causing warning on non-CUDA systems
Body:
Describe the bug
When importing diffusers on a non-CUDA system (e.g., Apple Silicon Mac with MPS), a warning is
emitted:
/torch/amp/autocast_mode.py:270: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
  warnings.warn(
This occurs because the Kandinsky5TimeEmbeddings class has a hardcoded "cuda" device type in its
@torch.autocast decorator.
Location
File: diffusers/models/transformers/transformer_kandinsky.py
Line: 168
@torch.autocast(device_type="cuda", dtype=torch.float32)
def forward(self, timestep):
Root Cause
The torch.autocast(...) instance used by the decorator is constructed at import time (when the class body is
evaluated), not at runtime, and its constructor checks device availability. On systems without CUDA (like Apple
Silicon Macs using MPS), this triggers the warning even though the Kandinsky model may never be used.
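As a minimal, self-contained illustration (the Demo class below is a made-up stand-in, not the diffusers code), defining any class whose method carries this decorator is enough to trigger the warning on a CUDA-less machine:

import torch

class Demo(torch.nn.Module):
    # torch.autocast(...) is instantiated while this class body executes,
    # i.e. when the defining module is imported; its constructor checks CUDA
    # availability, so the UserWarning fires on non-CUDA machines even if
    # forward() is never called.
    @torch.autocast(device_type="cuda", dtype=torch.float32)
    def forward(self, x):
        return x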
Reproduction
# On a Mac with Apple Silicon (no CUDA)
from diffusers import ZImagePipeline # or any pipeline
# Warning appears immediately on import
Expected behavior
No warning should appear when importing diffusers on non-CUDA systems.
Environment
- OS: macOS (Apple Silicon M-series)
- Python: 3.13
- PyTorch: 2.x (MPS backend)
- Diffusers: latest
I was able to reproduce the above UserWarning on a non-CUDA setup.
In addition to Kandinsky5TimeEmbeddings, I noticed the Kandinsky5Modulation class uses the same decorator pattern with a hardcoded "cuda" device type, which contributes to the issue.
Looking at the comment here by @leffff, the intent was to force these operations to run in float32 to prevent precision loss and NaNs during mixed-precision training.
We can remove the @torch.autocast decorator from both classes and explicitly cast the inputs to float32 inside their forward methods (e.g., time = time.to(dtype=torch.float32)). I verified locally that this removes the warning.
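A rough sketch of what that could look like (the class below is a simplified stand-in for Kandinsky5TimeEmbeddings; the layer sizes and frequency computation are invented for illustration, not the actual diffusers code):

import torch
import torch.nn as nn

class TimeEmbeddingSketch(nn.Module):
    # Simplified stand-in for Kandinsky5TimeEmbeddings, only meant to show
    # the casting pattern that replaces the autocast decorator.
    def __init__(self, dim: int = 256):
        super().__init__()
        self.in_layer = nn.Linear(dim, dim)
        self.activation = nn.SiLU()
        self.out_layer = nn.Linear(dim, dim)

    # No @torch.autocast(device_type="cuda", dtype=torch.float32) here: the
    # input is upcast explicitly instead, so nothing device-specific happens
    # at import time.
    def forward(self, timestep: torch.Tensor) -> torch.Tensor:
        time = timestep.to(dtype=torch.float32)
        half = self.in_layer.in_features // 2
        freqs = torch.exp(-torch.arange(half, dtype=torch.float32, device=time.device) / half)
        args = time[:, None] * freqs[None, :]
        time_embed = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)
        return self.out_layer(self.activation(self.in_layer(time_embed)))

With fp32 weights this behaves the same as the decorator did; the bf16/fp16 checkpoint case is discussed below.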
Does this sound good? @leffff @yiyixuxu
btw, this is not just a warning; it causes a failure down the road on torch-xpu:
D:\sdnext\venv\Lib\site-packages\diffusers\models\transformers\transformer_kandinsky.py:172 in forward

  171     time_embed = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)
❱ 172     time_embed = self.out_layer(self.activation(self.in_layer(time_embed)))
  173     return time_embed

RuntimeError: mat1 and mat2 must have the same dtype, but got Float and BFloat16
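For context, this is the generic PyTorch dtype mismatch you get whenever fp32 activations reach bf16 Linear weights once the "cuda" autocast context is disabled; a minimal, model-independent repro along these lines:

import torch

# fp32 input into a bf16 Linear: the matmul refuses the mixed dtypes, which
# is what happens in transformer_kandinsky.py when autocast is disabled on a
# non-CUDA backend and the model was loaded in bfloat16.
linear = torch.nn.Linear(4, 4).to(torch.bfloat16)
x = torch.randn(2, 4)  # float32
linear(x)  # RuntimeError: mat1 and mat2 must have the same dtype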
Hi! I see the problem. If you have fixes you want to propose, please create a pull request. However, does Flex Attention work fine on non-CUDA systems?
I have gone through the PyTorch source. According to it, Flex Attention is supported on CPU only if the processor has AVX2 instruction set support and the platform is not macOS.
Thanks for pointing that out @vladmandic.
To fix this, we should upcast the input to fp32, compute the embeddings, and downcast the result of the math back to weight.dtype before passing it to the Linear layer.
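Roughly like this (a standalone sketch with simplified names and invented frequency math, mirroring the earlier one but adding the downcast before the Linear layers; the helper name is mine, not the diffusers code):

import torch
import torch.nn as nn

def time_embed_fp32(timestep: torch.Tensor, in_layer: nn.Linear,
                    activation: nn.Module, out_layer: nn.Linear) -> torch.Tensor:
    # Do the sinusoidal math in fp32 for numerical stability, then cast back
    # to the weights' dtype so bf16/fp16 checkpoints work on any backend
    # without needing autocast.
    time = timestep.to(torch.float32)
    half = in_layer.in_features // 2
    freqs = torch.exp(-torch.arange(half, dtype=torch.float32, device=time.device) / half)
    args = time[:, None] * freqs[None, :]
    time_embed = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)
    time_embed = time_embed.to(in_layer.weight.dtype)  # downcast before the matmul
    return out_layer(activation(in_layer(time_embed)))

This keeps the trig math in fp32 (the part the original autocast was protecting) while leaving the Linear layers in whatever dtype the checkpoint was loaded in.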
I'll open a PR for this.
When you open the PR, please tag me and provide before/after generations from the same noise, so we can make sure the results are stable.