Joe Gaffney
It would be great to have more ways to reduce memory in video generation aside from the usual quantization and offloading (which do help a ton). WAN in particular uses a ton...
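For reference, the usual levers look roughly like this. A minimal sketch, assuming the Wan pipeline in diffusers and a VAE that supports tiling (the model id is just an example, and whether `enable_tiling` is available depends on your diffusers version):

```python
import torch
from diffusers import WanPipeline

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16
)

pipe.enable_model_cpu_offload()  # keep idle components on the CPU between steps
pipe.vae.enable_tiling()         # decode latents in tiles, if the VAE supports it
```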
I think it used to work with
```python
from diffusers import export_to_video
```
But at some point my IDE linter started complaining, so I switched to importing from `diffusers.utils`.
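That is, the import that keeps the linter happy:

```python
from diffusers.utils import export_to_video
```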
I noticed this issue with the 16:9 ratio. Glad you found what it was.
Hey, I got some pretty good results doing:
```python
def image_edit_call(context: ImageContext):
    # see https://github.com/huggingface/diffusers/pull/12453/files
    import diffusers.pipelines.qwenimage.pipeline_qwenimage_edit_plus as qwen_edit_module
    qwen_edit_module.VAE_IMAGE_SIZE = context.width * context.height
    # gather all possible reference...
```
Hey, it would be interesting to know the rough before-and-after metrics. It would be great if this does reduce memory, as WAN's memory usage really shoots up as resolution and duration increase.
> The time it takes to load the model into RAM and VRAM varies, sometimes it is 12 seconds + 22 seconds, sometimes it is 21 seconds + 49 seconds....
I think you need to fuse the LoRAs, then save. Possibly you only need to save the two transformers as well.
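A minimal sketch of what I mean, assuming a Wan 2.2-style pipeline that exposes its two transformers as `transformer` and `transformer_2` (the model id, LoRA path, and output paths are placeholders, so verify the attribute names for your pipeline version):

```python
import torch
from diffusers import WanPipeline

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.2-T2V-A14B-Diffusers", torch_dtype=torch.bfloat16
)
pipe.load_lora_weights("path/to/lora")  # hypothetical LoRA location
pipe.fuse_lora()            # bake the LoRA weights into the base weights
pipe.unload_lora_weights()  # drop the now-redundant adapter modules

# Possibly only the two transformers need saving:
pipe.transformer.save_pretrained("fused/transformer")
pipe.transformer_2.save_pretrained("fused/transformer_2")
```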
> When they release their Edit version, the internet will go down in flames. This model is seriously the definition of a game-changer.

Possibly the edit model would mostly negate...
Maybe the flow shift:
```python
from diffusers import UniPCMultistepScheduler

pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=3.0)
```
I know this fused Lightning model works when quantized and offloaded, but I only quantize the text encoder to...
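For the text-encoder-only quantization, a sketch along these lines, assuming diffusers' pipeline-level quantization config with the bitsandbytes backend (both need a recent diffusers and a bitsandbytes install; the model id and kwargs are assumptions, not a recipe):

```python
import torch
from diffusers import WanPipeline
from diffusers.quantizers import PipelineQuantizationConfig

# Quantize only the text encoder to 4-bit, leave the transformer in bf16.
quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={"load_in_4bit": True, "bnb_4bit_compute_dtype": torch.bfloat16},
    components_to_quantize=["text_encoder"],
)
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.2-T2V-A14B-Diffusers",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
```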
> This is also happening on image segmentation, extra VRAM (about 2 GB in my case) not released even after unloading the model and finishing processing.

Interesting, I didn't test the image one to...
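In case it helps, this is the generic best-effort cleanup I'd try in PyTorch, nothing specific to the segmentation model:

```python
import gc
import torch

def release_model(model):
    """Best-effort VRAM release once you're done with a model."""
    model.to("cpu")           # move the weights off the GPU first
    del model                 # drop the Python reference
    gc.collect()              # free the wrapper objects
    torch.cuda.empty_cache()  # return cached allocator blocks to the driver
    torch.cuda.ipc_collect()  # reclaim memory held by dead IPC handles
```

Note that the CUDA context itself keeps some memory resident for the life of the process, so a chunk of "unreleased" VRAM in nvidia-smi is expected even after this.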