Support for new diffuser: flux1.schnell
@sayakpaul
Hello,
I am looking for support for saving and loading the flux1.schnell model from Black Forest Labs.
I am following your code from the "Bonus" section here:
Saving
from diffusers import PixArtTransformer2DModel
from optimum.quanto import QuantizedPixArtTransformer2DModel, qfloat8
model = PixArtTransformer2DModel.from_pretrained("PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", subfolder="transformer")
qmodel = QuantizedPixArtTransformer2DModel.quantize(model, weights=qfloat8)
qmodel.save_pretrained("pixart-sigma-fp8")
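For context, my understanding from the quanto source is that QuantizedPixArtTransformer2DModel is only a thin wrapper that pins the underlying diffusers class, roughly like the sketch below (the attribute name is my reading of the code, so please correct me if I am wrong):
from diffusers import PixArtTransformer2DModel
from optimum.quanto import QuantizedDiffusersModel

# My reading: the generic save/load machinery lives in QuantizedDiffusersModel,
# and the subclass only fixes which diffusers class it wraps.
class QuantizedPixArtTransformer2DModel(QuantizedDiffusersModel):
    base_class = PixArtTransformer2DModel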
Loading
from optimum.quanto import QuantizedPixArtTransformer2DModel
import torch
transformer = QuantizedPixArtTransformer2DModel.from_pretrained("pixart-sigma-fp8")
transformer.to(device="cuda", dtype=torch.float16)
I am looking for a similar option to save and load the two quantized models in this repo here (see lines 38, 45, and 46):
transformer = FluxTransformer2DModel.from_pretrained(bfl_repo, subfolder="transformer", torch_dtype=dtype, revision=revision)
quantize(transformer, weights=qfloat8)
freeze(transformer)
and lines 35, 48, and 49:
text_encoder_2 = T5EncoderModel.from_pretrained(bfl_repo, subfolder="text_encoder_2", torch_dtype=dtype, revision=revision)
quantize(text_encoder_2, weights=qfloat8)
freeze(text_encoder_2)
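For reference, my understanding is that the script then attaches the two quantized modules to the pipeline, roughly like this (a sketch from memory, the exact lines may differ):
from diffusers import FluxPipeline

# Build the pipeline without the two heavy modules, then attach the
# quantized versions afterwards.
pipe = FluxPipeline.from_pretrained(bfl_repo, transformer=None, text_encoder_2=None, torch_dtype=dtype, revision=revision)
pipe.transformer = transformer
pipe.text_encoder_2 = text_encoder_2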
If I'm understanding this correctly, there would need to be something like
from optimum.quanto import QuantizedFluxTransformer2DModel
for the transformer
and
from optimum.quanto import QuantizedT5EncoderModel
for the text_encoder_2
to be able to save and load the quantized models for the transformer and the encoder. Is that correct? Or is there another possibility that avoids importing the quantized version of a model from optimum.quanto?
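Concretely, I imagine something along these lines, assuming the subclassing pattern behind QuantizedPixArtTransformer2DModel generalizes (the class names, and whether a concrete class like T5EncoderModel can be used there, are my guesses):
from diffusers import FluxTransformer2DModel
from transformers import T5EncoderModel
from optimum.quanto import QuantizedDiffusersModel, QuantizedTransformersModel, qfloat8

# Hypothetical wrapper for the Flux transformer, mirroring the PixArt example.
class QuantizedFluxTransformer2DModel(QuantizedDiffusersModel):
    base_class = FluxTransformer2DModel

# Hypothetical wrapper for the text encoder; I am not sure whether
# QuantizedTransformersModel accepts a plain model class here.
class QuantizedT5EncoderModel(QuantizedTransformersModel):
    auto_class = T5EncoderModel

# Saving would then hopefully mirror the PixArt example, starting from the
# freshly loaded modules instead of calling quantize()/freeze() directly:
qtransformer = QuantizedFluxTransformer2DModel.quantize(transformer, weights=qfloat8)
qtransformer.save_pretrained("flux1-schnell-transformer-fp8")
qtext_encoder_2 = QuantizedT5EncoderModel.quantize(text_encoder_2, weights=qfloat8)
qtext_encoder_2.save_pretrained("flux1-schnell-text-encoder-2-fp8")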
Thank you!
@sayakpaul