quanto
A PyTorch Quantization Toolkit
When performing quantized matrix multiplication with `int8` weights on an AMD CPU, the results are different from those obtained when running the same operation on CUDA or on an Intel...
https://github.com/huggingface/optimum-quanto/blob/b0cce2435f0b72d8d8a6f0dc6b18dc409160b394/optimum/quanto/nn/qlayernorm.py#L44

LayerNorm with `None` weights will raise here.

```
Flux(
  (pe_embedder): EmbedND()
  (img_in): Linear(in_features=64, out_features=3072, bias=True)
  (time_in): MLPEmbedder(
    (in_layer): Linear(in_features=256, out_features=3072, bias=True)
    (silu): SiLU()
    (out_layer): Linear(in_features=3072, out_features=3072, bias=True)
  )
  (vector_in):...
```
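For readers unfamiliar with how a LayerNorm ends up with `None` weights, here is a minimal sketch in plain PyTorch. It does not reproduce the statement at the linked line; it only shows that `elementwise_affine=False` leaves `weight` and `bias` as `None`, and how a guard could look. The helper `optional_to` is hypothetical and is not part of optimum-quanto.

```python
import torch
from torch import nn
from torch.nn import functional as F

# A LayerNorm built without affine parameters registers weight and bias as None.
norm_plain = nn.LayerNorm(64, elementwise_affine=False)
norm_affine = nn.LayerNorm(64)


def optional_to(param, dtype):
    """Hypothetical helper: cast a parameter only when it exists."""
    return None if param is None else param.detach().to(dtype)


# Calling .to() directly on norm_plain.weight would raise AttributeError,
# because None has no such method; the guard returns None instead.
plain_weight = optional_to(norm_plain.weight, torch.float16)
affine_weight = optional_to(norm_affine.weight, torch.float16)

# F.layer_norm accepts None for weight and bias, so the forward pass still works.
x = torch.randn(2, 64)
y = F.layer_norm(x, (64,), plain_weight, None, norm_plain.eps)
```

Any code path that touches `weight` or `bias` unconditionally, for example to cast or copy them, would need a similar check before it can handle such modules.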
I am no expert on Linux, Ubuntu, Python, pip, Flux.1-dev or any of this technology so forgive me if this is something obvious and something I did wrong somewhere. As...
hello @dacorvo thanks for all the prompt feedback so far. I might be doing something suboptimally or just incorrectly, but we're running into this a lot where `.view()` is being...
transformers 4.41.2 optimum-quanto 0.2.1 torch 2.3.1 Python 3.10.14 I performed this on a recent google GCP VM with Nvidia driver setup and basic torch sanity test passing. I tried to...
There is a potential issue in the `_load_from_state_dict` method where reloading frozen weights into a frozen module might cause a gradient-related error. The FIXME comment in the code points out...
hello, just making a note of these MPS kernels: https://github.com/pytorch/ao/pull/954
@kechan reported compilation failures when using quanto in Google Colab, both on CPU and GPU.
I can see that optimum-quanto provides several external (weight-only) quantization algorithms such as smoothquant and awq [here](https://github.com/huggingface/optimum-quanto/tree/main/external). It looks like smoothquant only supports OPT models, and awq is still...
When I used qfloat8 to quantize the UNet model of Kolors-diffusers, it worked well, but it failed with qint4. # use qint4/(qfloat8) class KolorsUNet2DConditionModel(QuantizedDiffusersModel): base_class = UNet2DConditionModel model = UNet2DConditionModel.from_pretrained("./Kolors-diffusers", variant="fp16",...