
Will a model release be made for FLUX.2?

Open Code-dogcreatior opened this issue 1 month ago • 10 comments

Checklist

  • [ ] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/mit-han-lab/nunchaku/discussions/new/choose. Otherwise, it will be closed.
  • [ ] 2. I will do my best to describe the issue in English.

Motivation

As the title says.

Related resources

No response

Code-dogcreatior avatar Nov 26 '25 01:11 Code-dogcreatior

+1

tonera avatar Nov 26 '25 15:11 tonera

+1, waiting for flux2 and z-image support

keven1024 avatar Nov 27 '25 09:11 keven1024

Z-image is much better than Flux2, so it should be prioritized for adaptation; then we could enjoy sub-second generation.

liuguicen avatar Nov 28 '25 10:11 liuguicen

1

oyjh1026-prog avatar Nov 28 '25 13:11 oyjh1026-prog

Z-image is much better than Flux2, so it should be prioritized for adaptation; then we could enjoy sub-second generation.

I don't believe one is better than the other, but Flux2 has a much better understanding of subjects and different ways people and objects can look. People made in z-image look too perfect. Both look great, but getting flux2 smaller makes more sense as z-image is already small and fast.

CeciliaXCIX avatar Nov 28 '25 22:11 CeciliaXCIX

I agree. Also, focusing on a turbo-distilled model with relatively small parameters seems odd; it's already incredibly fast and runs on midrange consumer cards.

Flux 2 FP4/INT4 would theoretically allow for zero/minimal offloading on 24GB cards like the 3090/4090.

Zorgonatis avatar Nov 29 '25 18:11 Zorgonatis

Z-image is much better than Flux2, so it should be prioritized for adaptation; then we could enjoy sub-second generation.

I don't believe one is better than the other, but Flux2 has a much better understanding of subjects and different ways people and objects can look. People made in z-image look too perfect. Both look great, but getting flux2 smaller makes more sense as z-image is already small and fast.

Can't agree more.

cvtower avatar Nov 30 '25 12:11 cvtower

I agree. Also, focusing on a turbo-distilled model with relatively small parameters seems odd; it's already incredibly fast and runs on midrange consumer cards.

Flux 2 FP4/INT4 would theoretically allow for zero/minimal offloading on 24GB cards like the 3090/4090.

You still need to offload even at 4-bit with both the text encoder and the transformer; from my testing it takes over 32 GB, so you still have to call pipe.enable_model_cpu_offload(). This is just with bitsandbytes, though, which is not as good as Nunchaku.

You could compute the prompt embeds separately, though, since the transformer alone comes in at around 16 GB at 4-bit.

IMO the text encoder is a bit ridiculous for FLUX.2-dev in terms of the results it gives.
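(Editor's note: a back-of-the-envelope sketch of the memory figures above. The parameter counts are assumptions for illustration, roughly ~32B for the FLUX.2-dev transformer and ~24B for its Mistral-based text encoder; they are not stated anywhere in this thread.)

```python
def model_size_gb(n_params: float, bits: int) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return n_params * bits / 8 / 1e9

# Assumed parameter counts (hypothetical round numbers for illustration).
TRANSFORMER_PARAMS = 32e9   # FLUX.2-dev transformer (assumed)
TEXT_ENCODER_PARAMS = 24e9  # Mistral-based text encoder (assumed)

transformer_4bit = model_size_gb(TRANSFORMER_PARAMS, 4)    # ~16 GB, matching the figure above
text_encoder_4bit = model_size_gb(TEXT_ENCODER_PARAMS, 4)  # ~12 GB

total = transformer_4bit + text_encoder_4bit
print(f"transformer: {transformer_4bit:.0f} GB, "
      f"text encoder: {text_encoder_4bit:.0f} GB, total: {total:.0f} GB")
# Even before activations and the VAE, ~28 GB of weights already exceeds a
# 24 GB card. Computing prompt embeds in a separate pass, so only one model
# is resident at a time, keeps the peak near the ~16 GB transformer alone.
```

This is why precomputing embeddings helps: the two models never need to share VRAM.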

JoeGaffney avatar Nov 30 '25 12:11 JoeGaffney

I agree. Also, focusing on a turbo-distilled model with relatively small parameters seems odd; it's already incredibly fast and runs on midrange consumer cards. Flux 2 FP4/INT4 would theoretically allow for zero/minimal offloading on 24GB cards like the 3090/4090.

You still need to offload even at 4-bit with both the text encoder and the transformer; from my testing it takes over 32 GB, so you still have to call pipe.enable_model_cpu_offload(). This is just with bitsandbytes, though, which is not as good as Nunchaku.

You could compute the prompt embeds separately, though, since the transformer alone comes in at around 16 GB at 4-bit.

IMO the text encoder is a bit ridiculous for FLUX.2-dev in terms of the results it gives.

Is this with using the fp8 or fp16 text encoder?

CeciliaXCIX avatar Nov 30 '25 18:11 CeciliaXCIX

I agree. Also, focusing on a turbo-distilled model with relatively small parameters seems odd; it's already incredibly fast and runs on midrange consumer cards. Flux 2 FP4/INT4 would theoretically allow for zero/minimal offloading on 24GB cards like the 3090/4090.

You still need to offload even at 4-bit with both the text encoder and the transformer; from my testing it takes over 32 GB, so you still have to call pipe.enable_model_cpu_offload(). This is just with bitsandbytes, though, which is not as good as Nunchaku. You could compute the prompt embeds separately, though, since the transformer alone comes in at around 16 GB at 4-bit. IMO the text encoder is a bit ridiculous for FLUX.2-dev in terms of the results it gives.

Is this with using the fp8 or fp16 text encoder?

I've tried 4-bit and 8-bit on the GPU and bfloat16 on the CPU. It's more that it's comparable to the Qwen 2.5 model used in Qwen-Image, just much more massive without much noticeable difference. And as people are mentioning, Z-image also uses Qwen 3 for its text encoder, a much smaller model, to great effect. So I'm not sure why they are using such a heavy Mistral LLM.

I'm not an expert on all these details, just going by how things seem from my minimal testing.

JoeGaffney avatar Nov 30 '25 19:11 JoeGaffney

Z-image is much better than Flux2, so it should be prioritized for adaptation; then we could enjoy sub-second generation.

But Z-image is already pretty fast and pretty easy to run even on mid-range hardware. I've seen people on laptops run it well. In my opinion, Nunchaku for Flux.2 makes much more sense since it's almost impossible to run it on mid-range hardware without very solid quantization.

alex-mitov avatar Dec 02 '25 02:12 alex-mitov

Considering that the purpose of the Nunchaku project is to enable models that are difficult to run on consumer GPUs, and that Z-image can already run on most consumer GPUs, Flux.2 support seems more urgent.

wkdtjs avatar Dec 02 '25 23:12 wkdtjs

I'll go for Z-image accelerated via nunchaku. It would be awesome

iamwavecut avatar Dec 03 '25 00:12 iamwavecut

+1 for Z-Image! I've made a feature request: #814

atgctg avatar Dec 03 '25 19:12 atgctg

Considering that the purpose of the Nunchaku project is to enable models that are difficult to run on consumer GPUs, and that Z-image can already run on most consumer GPUs, Flux.2 support seems more urgent.

That's not the only goal. You can quantize to 4-bit with many other methods (and get a similar size), but Nunchaku's method preserves a look very close to 16-bit and gives roughly 3x faster inference!
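(Editor's note: the quality claim comes from SVDQuant, the technique behind Nunchaku, which absorbs weight outliers into a small low-rank branch before 4-bit quantizing the residual. A toy numpy sketch of the core idea; the real kernels fuse this and use per-group scales, while this simplification uses symmetric per-tensor INT4.)

```python
import numpy as np

def quantize_int4(w: np.ndarray) -> np.ndarray:
    """Symmetric per-tensor 4-bit fake-quantization (simplified)."""
    scale = np.abs(w).max() / 7  # map the max magnitude to the INT4 level 7
    q = np.clip(np.round(w / scale), -8, 7)
    return q * scale  # dequantized values

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
w[0, 0] = 25.0  # inject an outlier, which inflates the per-tensor scale

# Direct 4-bit quantization: the outlier forces a coarse quantization step.
err_direct = np.linalg.norm(w - quantize_int4(w))

# SVDQuant idea: pull out a rank-r component (kept in 16-bit in practice)
# so the outlier lives in the low-rank branch, then 4-bit the residual.
r = 4
u, s, vt = np.linalg.svd(w, full_matrices=False)
low_rank = (u[:, :r] * s[:r]) @ vt[:r]
residual = w - low_rank
w_hat = low_rank + quantize_int4(residual)
err_svd = np.linalg.norm(w - w_hat)

# The residual has a much smaller dynamic range, so its 4-bit error is far
# lower than quantizing the outlier-laden matrix directly.
assert err_svd < err_direct
print(f"direct INT4 error: {err_direct:.2f}, SVD+INT4 error: {err_svd:.2f}")
```

The speedup then comes from running the 4-bit matmul and the small low-rank matmul in fused low-precision kernels rather than from the decomposition itself.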

JoeGaffney avatar Dec 03 '25 20:12 JoeGaffney

We are waiting for flux2 nunchaku!

jarkevithwlad avatar Dec 05 '25 13:12 jarkevithwlad