David Corvoysier
Quantized weights, scales and metadata can be serialized into a state_dict that can later be reloaded and applied to a quantized model. The process is a bit convoluted, as it...
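As a rough illustration of the idea, the sketch below flattens a quantized weight into plain state_dict entries and rebuilds approximate values from them. It is a toy in pure Python, not quanto's implementation: the helper names and the `._data`, `._scale` and `._bits` key suffixes are assumptions made for this example.

```python
def quantize(values, bits=8):
    """Symmetric per-tensor quantization of a list of floats (toy example)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    data = [round(v / scale) for v in values]
    return data, scale


def to_state_dict(name, data, scale, bits):
    """Flatten integer data, scale and metadata under hypothetical key suffixes."""
    return {
        f"{name}._data": list(data),
        f"{name}._scale": scale,
        f"{name}._bits": bits,
    }


def from_state_dict(name, state_dict):
    """Rebuild approximate float values from the serialized entries."""
    data = state_dict[f"{name}._data"]
    scale = state_dict[f"{name}._scale"]
    return [q * scale for q in data]


weights = [0.5, -1.0, 0.25]
data, scale = quantize(weights)
state = to_state_dict("linear.weight", data, scale, bits=8)
restored = from_state_dict("linear.weight", state)
```

Each restored value differs from the original by at most half a quantization step, which is the rounding error introduced by `quantize`.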
Whenever pytorch is upgraded, we should force a recompilation of the extensions because the pytorch ABI is not guaranteed to be compatible. For instance, the cpp extension compiled with pytorch...
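One possible way to force that recompilation is to key the build directory on the pytorch version, so that an upgrade lands in a fresh, empty directory. The sketch below assumes this approach; `extension_build_dir` and the directory layout are hypothetical and not taken from quanto.

```python
import hashlib
import os


def extension_build_dir(base_dir: str, ext_name: str, torch_version: str) -> str:
    """Return a build directory path that changes whenever the pytorch version changes."""
    # A short hash keeps the directory name compact and filesystem-safe,
    # even for local version strings such as "2.1.0+cu121".
    tag = hashlib.sha256(torch_version.encode("utf-8")).hexdigest()[:8]
    return os.path.join(base_dir, f"{ext_name}-{tag}")
```

In practice the version string would be `torch.__version__`, and the resulting path could be created with `os.makedirs(path, exist_ok=True)` and passed as the `build_directory` argument of `torch.utils.cpp_extension.load`.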
Vision Transformers should be supported out-of-the-box by `quanto`. The goal of this issue is to add some examples under `examples/vision`. At the very minimum, there should be a classification example,...
There are many kernels available to efficiently perform matrix multiplication using packed `int4` weights and `float16` inputs. The goal of this issue is to select some of them and add...
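For readers unfamiliar with the term, "packed" here means that two 4-bit values share one byte. The sketch below shows one simple layout (first value in the low nibble, second in the high nibble) in pure Python. Real kernels each define their own packing order, so this layout and the function names are assumptions chosen for illustration.

```python
def pack_int4(values):
    """Pack unsigned 4-bit integers (0..15) two per byte, low nibble first."""
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of values")
    packed = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        if not (0 <= lo <= 15 and 0 <= hi <= 15):
            raise ValueError("values must fit in 4 bits")
        packed.append(lo | (hi << 4))
    return bytes(packed)


def unpack_int4(packed):
    """Inverse of pack_int4: recover the list of 4-bit integers."""
    values = []
    for byte in packed:
        values.append(byte & 0x0F)  # low nibble holds the first value
        values.append(byte >> 4)    # high nibble holds the second value
    return values
```

With this layout, `pack_int4([1, 2, 15, 0])` yields the two bytes `0x21` and `0x0F`, and `unpack_int4` restores the original list.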
Ruff is faster than black and produces nearly identical results. The goal of this issue is to switch to ruff for the style check and formatting.
This corresponds to the following pytorch issue: https://github.com/pytorch/pytorch/issues/114389
I am using the `litellm` client to benchmark a HuggingFace TGI server. In `token_benchmark_ray.py`, `req_launcher.get_next_ready()` is called periodically to fetch pending **results**, with the `block` parameter set to False. However,...
When running the pixart sigma example on CUDA arch >= 80 with `int4` weights, the following error happens:

```shell
File "/home/ubuntu/dev/quanto/optimum/quanto/tensor/qtensor_func.py", line 152, in linear
    return QTensorLinear.apply(input, other, bias)
File...
```
When building a package locally, the `.h`, `.cpp` and `.cu` files are added to `MANIFEST.in` automatically by `setuptools_scm`. However, when building the package on the CI, or when installing it...
# What does this PR do?

This simply adds an option to `setup.py` to install quanto, and some basic instructions to quantize, save and reload a model.