David Corvoysier issues

Results: 16 issues of David Corvoysier

Write a helper to reload a quantized state_dict

Quantized weights, scales and metadata can be serialized into a state_dict that can later be reloaded and applied to a quantized model. The process is a bit convoluted, as it...

help wanted

good first issue

Stale

Force a recompilation of the extensions when upgrading pytorch

Whenever pytorch is upgraded, we should force a recompilation of the extensions because the pytorch ABI is not guaranteed to be compatible. For instance, the cpp extension compiled with pytorch...

bug
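One possible way to get this behaviour, sketched here under assumptions rather than taken from the project, is to key the extension build directory on the pytorch version string, so that an upgrade points the build at a fresh, empty directory. The helper name `extension_build_dir`, the `build` base directory, the `quanto_cpp` extension name and the version strings are all hypothetical.

```python
import hashlib
import os


def extension_build_dir(base_dir: str, ext_name: str, torch_version: str) -> str:
    """Return a build directory that is unique to one pytorch version.

    Hypothetical helper: a new version string maps to a new directory,
    so nothing compiled against the previous version is reused.
    """
    # Hash the full version string, including any local suffix such as "+cu121".
    tag = hashlib.sha256(torch_version.encode("utf-8")).hexdigest()[:8]
    return os.path.join(base_dir, f"{ext_name}_{tag}")


# Made-up version strings; in practice the value would come from torch.__version__.
old_dir = extension_build_dir("build", "quanto_cpp", "2.1.0+cu121")
new_dir = extension_build_dir("build", "quanto_cpp", "2.2.0+cu121")
```

A path built this way could then be passed as the `build_directory` argument of `torch.utils.cpp_extension.load`, assuming the extensions are compiled just-in-time.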

Add examples based on ViT

Vision Transformers should be supported out-of-the-box by `quanto`. The goal of this issue is to add some examples under `examples/vision`. At the very minimum, there should be a classification example,...

help wanted

good first issue

Add CUDA kernels for Wint4Afloat16

There are many kernels available to efficiently perform matrix multiplication using packed `int4` weights and `float16` inputs. The goal of this issue is to select some of them and add...
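For readers unfamiliar with the term, "packed `int4` weights" usually means that two 4-bit values share one byte. The pure-Python sketch below illustrates the idea only: the nibble order (low nibble first), the affine `scale` / `zero_point` scheme and all function names are assumptions, not the layout used by `quanto` or by any particular kernel.

```python
def pack_int4(values: list[int]) -> bytes:
    """Pack unsigned 4-bit integers (0..15) two per byte, low nibble first."""
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of values")
    if any(not 0 <= v <= 15 for v in values):
        raise ValueError("values must fit in 4 bits")
    return bytes(lo | (hi << 4) for lo, hi in zip(values[0::2], values[1::2]))


def unpack_int4(packed: bytes) -> list[int]:
    """Inverse of pack_int4: split each byte into its two 4-bit values."""
    out = []
    for byte in packed:
        out.append(byte & 0x0F)  # low nibble was stored first
        out.append(byte >> 4)    # high nibble second
    return out


def dequantize(packed: bytes, scale: float, zero_point: int) -> list[float]:
    """Map packed 4-bit values back to floats with an assumed affine scheme."""
    return [(q - zero_point) * scale for q in unpack_int4(packed)]
```

A GPU kernel would typically perform the unpacking and rescaling on the fly inside the matrix multiplication instead of materializing the full-precision weights as done here.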

Switch to ruff native formatter

Ruff is faster than black and produces nearly identical results. The goal of this issue is to switch to ruff for the style check and formatting.

help wanted

good first issue

QTensor cannot be created from inside a dynamo graph

This corresponds to the following pytorch issue: https://github.com/pytorch/pytorch/issues/114389

Blocking on pending requests despite block == false

I am using the `litellm` client to benchmark a HuggingFace TGI server. In `token_benchmark_ray.py`, `req_launcher.get_next_ready()` is called periodically to fetch pending **results**, with the `block` parameter set to False. However,...
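To make the expected semantics concrete, here is a minimal sketch of a non-blocking poll written with the standard `concurrent.futures` module. It is not the code of the launcher discussed in the issue: the function body, the `pending` set and the meaning given to `block=True` (wait for every pending request) are assumptions chosen for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


def get_next_ready(pending: set, block: bool = False) -> list:
    """Return the results of finished futures and drop them from `pending`.

    With block=False the call returns immediately, possibly with an empty
    list. With block=True it waits until every pending future is done.
    """
    timeout = None if block else 0
    done, _ = wait(pending, timeout=timeout)
    pending.difference_update(done)
    return [future.result() for future in done]


gate = threading.Event()
with ThreadPoolExecutor(max_workers=1) as pool:
    pending = {pool.submit(gate.wait)}               # task stays blocked until the gate opens
    first = get_next_ready(pending, block=False)     # returns at once, nothing is ready yet
    gate.set()                                       # let the task finish
    second = get_next_ready(pending, block=True)     # waits for the task, then returns its result
```

The point of the sketch is the first call: with `block=False`, the caller gets control back even though a request is still in flight.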

Pixart sigma example crash on CUDA arch >= 80 with int4 weights

When running the pixart sigma example on CUDA arch >= 80 with `int4` weights, the following error happens:

```shell
File "/home/ubuntu/dev/quanto/optimum/quanto/tensor/qtensor_func.py", line 152, in linear
    return QTensorLinear.apply(input, other, bias)
File...
```

Packages created on the CI are missing cpp and cuda extension files

When building a package locally, the `.h`, `.cpp` and `.cu` files are added to `MANIFEST.in` automatically by `setuptools_scm`. However, when building the package on the CI, or when installing it...

Add quanto install and instructions

What does this PR do? This simply adds an option to `setup.py` to install quanto, and some basic instructions to quantize, save and reload a model.