
Sage Attention sm90 causes confetti/noisy output

Open dylanprins opened this issue 3 weeks ago • 0 comments

Describe the bug

When using _sage_qk_int8_pv_fp8_cuda_sm90 as the attention backend with Wan2.2 I2V, the output is broken (confetti-like noise):

[Image: broken output]

It works fine with _flash_3_hub and _sage_qk_int8_pv_fp16_cuda.

Does it matter that we use the original fp16 model? Do we need the fp8 model, or should this work out of the box?

Can somebody explain the type of artifacts we are seeing? Do they point to a specific issue?

Reproduction

import torch
from diffusers import WanPipeline, AutoencoderKLWan
from diffusers.utils import export_to_video

dtype = torch.bfloat16
device = "cuda"
vae = AutoencoderKLWan.from_pretrained("Wan-AI/Wan2.2-T2V-A14B-Diffusers", subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained("Wan-AI/Wan2.2-T2V-A14B-Diffusers", vae=vae, torch_dtype=dtype)
pipe.transformer.set_attention_backend("_sage_qk_int8_pv_fp8_cuda_sm90")

pipe.to(device)

height = 720
width = 1280

prompt = "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
negative_prompt = "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走"
output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=height,
    width=width,
    num_frames=81,
    guidance_scale=4.0,
    guidance_scale_2=3.0,
    num_inference_steps=40,
).frames[0]
export_to_video(output, "t2v_out.mp4", fps=16)
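
To put a number on how far one backend's output drifts from another's, a small helper along these lines could be used. This is only a sketch: `compare_frames` is a hypothetical name, and it assumes the frames are float arrays in [0, 1] with shape (frames, height, width, channels), generated with the same seed for each backend.

```python
import numpy as np


def compare_frames(reference, candidate):
    """Return (mean_abs_diff, psnr_db) for two videos with values in [0, 1].

    Hypothetical helper, not part of diffusers. Both inputs must have the
    same shape, e.g. (frames, height, width, channels).
    """
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)
    if ref.shape != cand.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {cand.shape}")

    diff = ref - cand
    mean_abs_diff = float(np.abs(diff).mean())
    mse = float((diff ** 2).mean())
    # PSNR with a peak value of 1.0; identical inputs give infinity.
    psnr_db = float("inf") if mse == 0.0 else float(10.0 * np.log10(1.0 / mse))
    return mean_abs_diff, psnr_db


# Assumed usage with the pipeline above (not run here):
#   1. For each backend, call pipe.transformer.set_attention_backend(name)
#      and run the pipeline with the same seeded torch.Generator.
#   2. Pass the two resulting frame arrays to compare_frames().
```

A working backend pair would be expected to give a small mean absolute difference, while the confetti-like output should show up as a much lower PSNR against the _flash_3_hub result.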

Logs


System Info

diffusers = 0.36.0.dev0
python = 3.11
cuda = 12.8
nvidia driver = 570.195.03

Using H100 80GB HBM3

Who can help?

@DN6 @yiyixuxu

dylanprins, Dec 03 '25 15:12