quanto issues

Saving and loading quantized models doesn't work?

17

I'm interested in profiling how well various architectures do after quantizing to various WxAx, and I'm using `lm-eval` to do so. `lm-eval` needs a path where a model is saved,...

tanishqkumar

Fix serialization

3

# What does this PR do

Fix a couple of issues related to safetensors and to loading with modules on the `meta` device. Draft for now.

SunMarc

Write a helper to reload a quantized state_dict

11

Quantized weights, scales and metadata can be serialized into a state_dict that can later be reloaded and applied to a quantized model. The process is a bit convoluted, as it...

dacorvo

help wanted

good first issue

Stale

Safetensor serialization throws "conv_in.weight.qtype is invalid expected torch.Tensor but received string"

3

To test:

tinypipe = StableDiffusionPipeline.from_pretrained("hf-internal-testing/tiny-stable-diffusion-torch")
tinypipe.save_pretrained("tinypipe-full", safe_serialization=True)
quantize(tinypipe.unet, weights=qint2)
tinypipe.save_pretrained("tinypipe-qint2", safe_serialization=True)

```
ValueError Traceback (most recent call last)
Cell In[40], line 7
      2 tinypipe.save_pretrained("tinypipe-full", safe_serialization=True)
      3 quantize(tinypipe.unet, weights=qint2)
----> 4...
```

lsb

Stale
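The error quoted in the issue above is consistent with how safetensors files are laid out: the tensor dictionary accepts only tensors, while free-form strings belong in a separate string-to-string metadata header. A minimal sketch of that separation follows, assuming a state_dict in which some entries (such as a `qtype` name) are plain strings. `split_for_safetensors` and the `_data` / `_scale` key names are hypothetical and used for illustration only; they are not part of quanto or safetensors.

```python
def split_for_safetensors(state_dict):
    """Separate string entries from tensor-like entries.

    Hypothetical helper for illustration only. String values are routed to a
    metadata dict (safetensors metadata maps str -> str); everything else is
    kept in the tensor dict.
    """
    tensors, metadata = {}, {}
    for key, value in state_dict.items():
        if isinstance(value, str):
            metadata[key] = value
        else:
            tensors[key] = value
    return tensors, metadata


# Stand-in values: a real state_dict would hold torch.Tensor objects.
# The "_data" and "_scale" key names are assumptions made for this example.
example = {
    "conv_in.weight._data": [1, -2, 3],
    "conv_in.weight._scale": [0.5],
    "conv_in.weight.qtype": "qint2",
}
tensors, metadata = split_for_safetensors(example)
```

With real tensors, the two dictionaries could then be passed to `safetensors.torch.save_file(tensors, path, metadata=metadata)`. Whether this matches the fix eventually adopted in quanto is not stated in the issue.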

Feature Request/Int4 Cuda Kernels

1

### Feature request

There is a GitHub repo with the necessary kernels and code (and a great paper) to train transformer-based models using int4. The authors use...

NicolasMejiaPetit

enhancement

Stale

Quanto scale values seem unpopulated in quantized model

2

When loading a Mistral model, I noticed that the `output_scale` and `input_scale` values associated with the quantized tensors were just tensors with the value 1, i.e. `tensor(1., device='cuda:0')`. This seems...

raunaks13

Stale

Question: any plan to formally support smooth quantization and make it more general

2

Awesome work! I noticed there is a smooth quant implementation under [external](https://github.com/huggingface/quanto/tree/main/external/smoothquant). Currently, its implementation seems to be model-specific: we can only apply smoothing to specific `Linear` layers. However, in general, the...

yiliu30

question

Stale

Why the quantized net is slower?

1

batch_size: 1, torch_dtype: fp32, unet_dtype: int8 in 3.754 seconds. Memory: 5.240GB.
batch_size: 1, torch_dtype: fp32, unet_dtype: None in 3.378 seconds. Memory: 6.073GB.

I'm using the example code for stable diffusion,...

theguardsgod

Force a recompilation of the extensions when upgrading pytorch

Whenever pytorch is upgraded, we should force a recompilation of the extensions because the pytorch ABI is not guaranteed to be compatible. For instance, the cpp extension compiled with pytorch...

dacorvo

bug
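One way to picture the recompilation check described in the issue above is a small version stamp stored next to the build artifacts: if the recorded PyTorch version differs from the running one, the extensions are rebuilt. This is only a sketch under stated assumptions. `needs_rebuild`, `write_stamp`, and the stamp file layout are hypothetical and are not taken from quanto.

```python
import json
import tempfile
from pathlib import Path


def needs_rebuild(stamp_file, current_version):
    """Return True when no stamp exists or it records a different version."""
    path = Path(stamp_file)
    if not path.exists():
        return True
    try:
        recorded = json.loads(path.read_text()).get("torch_version")
    except (json.JSONDecodeError, AttributeError):
        # A corrupt or unexpected stamp is treated as "rebuild to be safe".
        return True
    return recorded != current_version


def write_stamp(stamp_file, current_version):
    """Record the version the extensions were last built against."""
    path = Path(stamp_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"torch_version": current_version}))


# Demonstration with made-up version strings in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    stamp = Path(tmp) / "build" / "torch_version.json"
    first = needs_rebuild(stamp, "2.2.0")     # no stamp yet
    write_stamp(stamp, "2.2.0")
    same = needs_rebuild(stamp, "2.2.0")      # same version recorded
    upgraded = needs_rebuild(stamp, "2.3.0")  # version changed
```

In a real setup, `current_version` would come from `torch.__version__`, and a positive check would trigger the actual extension build before the stamp is rewritten.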

Unable to quantize a single linear layer: throws error: ValueError: Cannot quantize Tensor of shape torch.Size([1, 10]) along axis 0 of size 1

1

```python
from torch import nn


class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # One output feature, so the weight has shape [1, 10].
        self.layer = nn.Linear(10, 1)

    def forward(self, input):
        out = self.layer(input)
        return out
```

Above is the model I have defined. When I try to...

rajat-008

bug
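The shape in the error message above, `torch.Size([1, 10])`, is the weight of `nn.Linear(10, 1)`. With a single row, computing one scale per row along axis 0 yields exactly one scale, which is the same as a per-tensor scale. The sketch below illustrates that degenerate case with plain Python lists and symmetric absmax scaling. It is an assumption-laden illustration, not quanto's implementation and not an explanation of why the library raises; `per_axis_scales` and `per_tensor_scale` are hypothetical names.

```python
def per_axis_scales(weight, qmax=127):
    """Symmetric absmax scale for each row (axis 0) of a 2-D nested list."""
    scales = []
    for row in weight:
        absmax = max(abs(v) for v in row)
        # Fall back to 1.0 for an all-zero row to avoid a zero scale.
        scales.append(absmax / qmax if absmax > 0 else 1.0)
    return scales


def per_tensor_scale(weight, qmax=127):
    """Single symmetric absmax scale for the whole 2-D nested list."""
    absmax = max(abs(v) for row in weight for v in row)
    return absmax / qmax if absmax > 0 else 1.0


# Shape [1, 10], like the weight of nn.Linear(10, 1). Values are made up.
weight = [[0.5, -1.0, 0.25, 0.75, -0.1, 0.0, 0.3, -0.6, 0.9, -0.2]]
row_scales = per_axis_scales(weight)
tensor_scale = per_tensor_scale(weight)
```

Here `row_scales` holds a single entry equal to `tensor_scale`, so per-axis quantization carries no extra information for this layer.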
