quanto
A PyTorch Quantization Toolkit
I'm interested in profiling how well various architectures do after quantizing to various WxAx, and I'm using `lm-eval` to do so. `lm-eval` needs a path where a model is saved,...
# What does this PR do
Fixes a couple of issues related to safetensors and loading with a module on the `meta` device. Draft for now.
quantized weights, scales and metadata can be quantized into a state_dict that can later be reloaded and applied to a quantized model. The process is a bit convoluted, as it...
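As a rough illustration of the idea of storing quantized weights, scales and metadata in a state_dict and reloading them later, here is a minimal pure-Python sketch. It assumes symmetric per-tensor int8 quantization, and the key suffixes (`._data`, `._scale`, `._dtype`) and helper names are hypothetical; they are not taken from quanto's actual serialization format.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: returns (integers, scale)."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(v / scale) for v in values], scale


def to_state_dict(name, values):
    """Store integer data, scale and metadata under hypothetical keys."""
    data, scale = quantize_int8(values)
    return {
        f"{name}._data": data,
        f"{name}._scale": scale,
        f"{name}._dtype": "qint8",  # metadata describing the integer type
    }


def from_state_dict(name, state_dict):
    """Rebuild approximate float values from the stored integers and scale."""
    scale = state_dict[f"{name}._scale"]
    return [q * scale for q in state_dict[f"{name}._data"]]


weights = [0.5, -1.27, 0.0, 0.33]
state = to_state_dict("layer.weight", weights)
restored = from_state_dict("layer.weight", state)
```

Each restored value differs from the original by at most half a quantization step, which is the price paid for storing 8-bit integers instead of floats.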
To test:
```python
from diffusers import StableDiffusionPipeline
from quanto import qint2, quantize

tinypipe = StableDiffusionPipeline.from_pretrained("hf-internal-testing/tiny-stable-diffusion-torch")
tinypipe.save_pretrained("tinypipe-full", safe_serialization=True)
# Quantize only the UNet weights to 2-bit integers, then save again.
quantize(tinypipe.unet, weights=qint2)
tinypipe.save_pretrained("tinypipe-qint2", safe_serialization=True)
```
```
ValueError Traceback (most recent call last)
Cell In[40], line 7
      2 tinypipe.save_pretrained("tinypipe-full", safe_serialization=True)
      3 quantize(tinypipe.unet, weights=qint2)
----> 4...
```
### Feature request
There is a GitHub repo with the necessary kernels and code (and a great paper) to train transformer-based models using int4. The authors use...
When loading a Mistral model, I noticed that the `output_scale` and `input_scale` values associated with the quantized tensors were just tensors with the value 1, i.e. `tensor(1., device='cuda:0')`. This seems...
Awesome work! I noticed that SmoothQuant is implemented under [external](https://github.com/huggingface/quanto/tree/main/external/smoothquant). Currently, the implementation seems to be model-specific: smoothing can only be applied to specific `Linear` layers. However, in general, the...
batch_size: 1, torch_dtype: fp32, unet_dtype: int8 in 3.754 seconds. Memory: 5.240GB.
batch_size: 1, torch_dtype: fp32, unet_dtype: None in 3.378 seconds. Memory: 6.073GB.
I'm using the example code for stable diffusion,...
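To put the two benchmark lines above side by side, the relative differences can be worked out with a few lines of Python. This is only arithmetic on the reported figures; the variable and function names are made up for the illustration.

```python
# Figures copied from the two reported runs (int8 UNet vs. unquantized UNet).
int8_run = {"seconds": 3.754, "memory_gb": 5.240}
fp32_run = {"seconds": 3.378, "memory_gb": 6.073}


def relative_change(new, old):
    """Fractional change of `new` with respect to `old`."""
    return (new - old) / old


latency_change = relative_change(int8_run["seconds"], fp32_run["seconds"])
memory_change = relative_change(int8_run["memory_gb"], fp32_run["memory_gb"])

print(f"latency: {latency_change:+.1%}, memory: {memory_change:+.1%}")
# prints "latency: +11.1%, memory: -13.7%"
```

In other words, the int8 run in this report used about 0.83GB less memory but took roughly 11% longer.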
Whenever PyTorch is upgraded, we should force a recompilation of the extensions because the PyTorch ABI is not guaranteed to be compatible. For instance, the cpp extension compiled with pytorch...
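One possible way to get this behaviour, sketched here under assumptions, is to key the extension build directory on the PyTorch version so that an upgrade never reuses old binaries. The helper name `extension_build_dir` and the directory layout are hypothetical and not part of quanto.

```python
import re
from pathlib import Path


def extension_build_dir(root: str, torch_version: str) -> Path:
    """Return a build directory that is unique to one PyTorch version."""
    # Replace characters such as '+' that are awkward in directory names.
    tag = re.sub(r"[^0-9A-Za-z.]+", "_", torch_version)
    return Path(root) / f"torch-{tag}"


# With PyTorch installed, one would pass torch.__version__ here and hand the
# result to torch.utils.cpp_extension.load(..., build_directory=...).
old_dir = extension_build_dir("build", "2.1.0")
new_dir = extension_build_dir("build", "2.2.0+cu121")
```

Because the two versions map to different directories, artifacts compiled against the older version are simply not found after the upgrade, which triggers a fresh compilation.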
```python
from torch import nn


class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # A single linear layer mapping 10 input features to 1 output.
        self.layer = nn.Linear(10, 1)

    def forward(self, input):
        out = self.layer(input)
        return out
```
Above is the model I have defined. When I try to...