quanto

A PyTorch Quantization Toolkit

Results 61 quanto issues

I'm interested in profiling how well various architectures do after quantizing to various WxAx, and I'm using `lm-eval` to do so. `lm-eval` needs a path where a model is saved,...

# What does this PR do

Fix a couple of issues related to safetensors and loading with the module on the `meta` device. Draft for now.

Quantized weights, scales and metadata can be serialized into a state_dict that can later be reloaded and applied to a quantized model. The process is a bit convoluted, as it...

help wanted
good first issue
Stale
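As a rough illustration of storing quantized weights, scales and metadata in a state_dict and reloading them, here is a minimal pure-Python sketch. It is not quanto's implementation: the key suffixes (`._data`, `._scale`, `._dtype`), the helper names and the symmetric per-tensor scheme are all assumptions made for this example.

```python
from typing import Dict, List, Union

StateDict = Dict[str, Union[List[int], float, str]]


def quantize_to_state_dict(name: str, weights: List[float], bits: int = 8) -> StateDict:
    """Symmetric per-tensor quantization, stored under hypothetical keys."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Fall back to a scale of 1.0 for an all-zero (or empty) tensor.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the representable range.
    data = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return {
        f"{name}._data": data,            # integer payload
        f"{name}._scale": scale,          # scale needed to dequantize
        f"{name}._dtype": f"qint{bits}",  # metadata describing the format
    }


def dequantize_from_state_dict(name: str, state: StateDict) -> List[float]:
    """Rebuild approximate float weights from the stored data and scale."""
    scale = state[f"{name}._scale"]
    return [q * scale for q in state[f"{name}._data"]]


original = [0.5, -1.0, 0.25, 0.0]
state = quantize_to_state_dict("layer.weight", original)
restored = dequantize_from_state_dict("layer.weight", state)
```

The restored values differ from the originals by at most about half a quantization step, which is why the scale has to travel with the integer data.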

To test:

```python
# Imports assumed; the original snippet does not show them.
from diffusers import StableDiffusionPipeline
from quanto import qint2, quantize

tinypipe = StableDiffusionPipeline.from_pretrained("hf-internal-testing/tiny-stable-diffusion-torch")
tinypipe.save_pretrained("tinypipe-full", safe_serialization=True)
quantize(tinypipe.unet, weights=qint2)
tinypipe.save_pretrained("tinypipe-qint2", safe_serialization=True)
```

```
ValueError Traceback (most recent call last)
Cell In[40], line 7
      2 tinypipe.save_pretrained("tinypipe-full", safe_serialization=True)
      3 quantize(tinypipe.unet, weights=qint2)
----> 4...
```

Stale

### Feature request

There is a GitHub repo with the necessary kernels and code (and a great paper) to train transformer-based models using int4. The authors use...

enhancement
Stale

When loading a Mistral model, I noticed that the `output_scale` and `input_scale` values associated with the quantized tensors were just tensors with the value 1, i.e. `tensor(1., device='cuda:0')`. This seems...

Stale

Awesome work! I noticed that SmoothQuant is implemented under [external](https://github.com/huggingface/quanto/tree/main/external/smoothquant). Currently, its implementation seems to be model-specific: smoothing can only be applied to particular `Linear` layers. However, in general, the...

question
Stale

batch_size: 1, torch_dtype: fp32, unet_dtype: int8 in 3.754 seconds. Memory: 5.240GB.
batch_size: 1, torch_dtype: fp32, unet_dtype: None in 3.378 seconds. Memory: 6.073GB.

I'm using the example code for stable diffusion,...

Whenever PyTorch is upgraded, we should force a recompilation of the extensions, because the PyTorch ABI is not guaranteed to be compatible across versions. For instance, the C++ extension compiled with PyTorch...

bug
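One possible way to detect that a rebuild is needed, sketched under assumptions: record the PyTorch version used for the last build in a stamp file and compare it with the current one. The helper names and the stamp file location are hypothetical and not part of quanto; in real use the version string would come from `torch.__version__`, and the versions below are example values only.

```python
import tempfile
from pathlib import Path


def needs_recompile(stamp_file: Path, torch_version: str) -> bool:
    """Return True if no build was recorded or it used another PyTorch version."""
    if not stamp_file.exists():
        return True
    return stamp_file.read_text().strip() != torch_version


def record_build(stamp_file: Path, torch_version: str) -> None:
    """Remember which PyTorch version the extension was compiled against."""
    stamp_file.parent.mkdir(parents=True, exist_ok=True)
    stamp_file.write_text(torch_version)


with tempfile.TemporaryDirectory() as tmp:
    stamp = Path(tmp) / "build" / "torch_version.txt"
    first_check = needs_recompile(stamp, "2.1.0")         # nothing built yet
    record_build(stamp, "2.1.0")
    same_version_check = needs_recompile(stamp, "2.1.0")  # same version as the build
    upgraded_check = needs_recompile(stamp, "2.2.0")      # version changed after an upgrade
```

Comparing the full version string is deliberately strict: any change triggers a rebuild, which costs compile time but avoids guessing which releases are ABI-compatible.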

```python
import torch.nn as nn


class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.layer = nn.Linear(10, 1)

    def forward(self, input):
        out = self.layer(input)
        return out
```

Above is the model I have defined. When I try to...

bug