quanto issues

Accuracy issue when using torch._int_mm on AMD CPUs

2

When performing quantized matrix multiplication between `int8` weights on an AMD CPU, the results differ from those obtained when running the same operation on CUDA or on an Intel...

dacorvo
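One way to decide which backend is off in a report like this is to compare each result against an exact integer reference. The sketch below is a plain-Python illustration and is not taken from the issue or from quanto: the helper names `int8_matmul_reference` and `max_abs_diff` are hypothetical, and matrices are assumed to be nested lists of Python integers. Since Python integers do not overflow, the reference is exact for any `int8` inputs.

```python
INT8_MIN, INT8_MAX = -128, 127


def int8_matmul_reference(a, b):
    """Exact matmul of two int8 matrices given as nested lists (hypothetical helper)."""
    if not a or not b or any(len(row) != len(b) for row in a):
        raise ValueError("inner dimensions must match")
    # Reject values that could not have come from an int8 tensor.
    for row in a + b:
        for value in row:
            if not INT8_MIN <= value <= INT8_MAX:
                raise ValueError(f"{value} is outside the int8 range")
    cols = len(b[0])
    # Python integers are unbounded, so the accumulation cannot overflow.
    return [
        [sum(row[k] * b[k][j] for k in range(len(b))) for j in range(cols)]
        for row in a
    ]


def max_abs_diff(x, y):
    """Largest element-wise absolute difference between two same-shape matrices."""
    return max(abs(p - q) for rx, ry in zip(x, y) for p, q in zip(rx, ry))
```

A backend result converted to nested lists can then be checked with `max_abs_diff(backend_result, int8_matmul_reference(a, b))`; any non-zero value points to an inexact integer kernel rather than to rounding noise.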

LayerNorm with None weight throws exception

2

https://github.com/huggingface/optimum-quanto/blob/b0cce2435f0b72d8d8a6f0dc6b18dc409160b394/optimum/quanto/nn/qlayernorm.py#L44 LayerNorm with `None` weights will raise here. ``` Flux( (pe_embedder): EmbedND() (img_in): Linear(in_features=64, out_features=3072, bias=True) (time_in): MLPEmbedder( (in_layer): Linear(in_features=256, out_features=3072, bias=True) (silu): SiLU() (out_layer): Linear(in_features=3072, out_features=3072, bias=True) ) (vector_in):...

doctorpangloss
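For context, a layer norm without learnable parameters (in PyTorch, `torch.nn.LayerNorm(..., elementwise_affine=False)`) has `weight` and `bias` set to `None`, so any code that touches them needs a guard. The following is a minimal plain-Python sketch of that guard, not the quanto implementation: the function `layer_norm` and its list-based signature are assumptions made for illustration.

```python
import math


def layer_norm(x, weight=None, bias=None, eps=1e-5):
    """Normalize a 1-D list of floats; weight and bias are optional (hypothetical helper)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)  # biased variance
    inv_std = 1.0 / math.sqrt(var + eps)
    out = [(v - mean) * inv_std for v in x]
    # Only apply the affine parameters that actually exist.
    if weight is not None:
        out = [o * w for o, w in zip(out, weight)]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out
```

Calling `layer_norm([1.0, 2.0, 3.0])` works without any parameters, while passing `weight` or `bias` scales or shifts the normalized values as usual.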

optimum-quanto 0.25 requires ninja but 'pip check flux' reports 'ninja-1.11.1.1 is not supported on this platform'

I am no expert on Linux, Ubuntu, Python, pip, Flux.1-dev or any of this technology so forgive me if this is something obvious and something I did wrong somewhere. As...

Davros666

issues with non-contiguous Tensor

1

hello @dacorvo thanks for all the prompt feedback so far. I might be doing something suboptimally or just incorrectly, but we're running into this a lot where `.view()` is being...

bghira
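As background on why `.view()` can fail: a view reinterprets existing memory, so it needs a compatible stride layout, and a transposed tensor no longer has the row-major strides of its shape. In PyTorch the usual workarounds are `.reshape()` or `.contiguous()` before `.view()`. The sketch below is a simplified plain-Python model of the contiguity check, with hypothetical helper names and strides counted in elements; it is not PyTorch's actual implementation and ignores empty tensors.

```python
def contiguous_strides(shape):
    """Row-major (C-order) strides, in elements, for the given shape."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def is_contiguous(shape, strides):
    """True if the strides match the row-major layout (size-1 dims are ignored)."""
    expected = contiguous_strides(shape)
    return all(
        size == 1 or actual == wanted
        for size, actual, wanted in zip(shape, strides, expected)
    )
```

A `(2, 3)` tensor with strides `(3, 1)` passes the check, whereas its transpose, shape `(3, 2)` with strides `(1, 3)`, does not, which is the situation in which a flattening view is rejected.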

Inference from a reload quantized open clip model (by .load_state_dict) resulted in IndexError

7

transformers 4.41.2, optimum-quanto 0.2.1, torch 2.3.1, Python 3.10.14. I performed this on a recent Google GCP VM with Nvidia driver setup and basic torch sanity test passing. I tried to...

kechan

Potential Gradient Error when Reloading Frozen Weights in `qmodule.py` `_load_from_state_dict`

2

There is a potential issue in the `_load_from_state_dict` method where reloading frozen weights into a frozen module might cause a gradient-related error. The FIXME comment in the code points out...

cjfghk5697

mps low-bit kernels from torchao

hello, just making a note of these MPS kernels: https://github.com/pytorch/ao/pull/954

bghira

Verify extension behaviour in google Colab

7

@kechan reported compilation failures when using quanto in Google Colab, both on CPU and GPU.

dacorvo

Does AWQ is officially supported now?

3

I can see that optimum-quanto provides several external (weight-only) quantization algorithms, such as smoothquant and awq, [here](https://github.com/huggingface/optimum-quanto/tree/main/external). It looks like smoothquant only supports OPT models, and awq is still...

lifelongeeek

qint4 failed for diffusers: QBitsTensor cannot be changed

When I used qfloat8 to quantize the UNet model of Kolors-diffusers, it worked well, but it failed with qint4. # use qint4/(qfloat8) class KolorsUNet2DConditionModel(QuantizedDiffusersModel): base_class = UNet2DConditionModel model = UNet2DConditionModel.from_pretrained("./Kolors-diffusers", variant="fp16",...

liyihao1230
