
A PyTorch quantization toolkit

Results: 61 quanto issues

# What does this PR do?
Fixes # ([issue](https://github.com/huggingface/optimum-quanto/issues/182)). GEMM and GEMV kernels can't be compiled for AMD HIP. This PR adds a check for HIP and uses the unpack kernel...
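For context, a device check along these lines is one way to express a HIP fallback at the Python level (a minimal sketch, not the PR's actual code; `unpack_weights` and `gemm_kernel` are hypothetical stand-ins):

```python
import torch

# torch.version.hip is set on ROCm builds of PyTorch and is None otherwise
is_hip = torch.version.hip is not None

def matmul_int4(inputs, packed_weights):
    if is_hip:
        # hypothetical fallback: unpack to a dense tensor and use a plain matmul
        weights = unpack_weights(packed_weights)
        return torch.matmul(inputs, weights.t())
    # hypothetical optimized path using the CUDA-only GEMM/GEMV kernels
    return gemm_kernel(inputs, packed_weights)
```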

# What does this PR do?
Implements support for saving models to and loading them from the Hub.
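A round trip with quanto's quantized model wrapper looks roughly like this (a sketch, assuming the `QuantizedModelForCausalLM` wrapper; the checkpoint id and paths are illustrative):

```python
from transformers import AutoModelForCausalLM
from optimum.quanto import QuantizedModelForCausalLM, qint4

# quantize a causal LM to int4 weights (checkpoint id is illustrative)
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
qmodel = QuantizedModelForCausalLM.quantize(model, weights=qint4)

# save locally; reloading works the same way from a local path,
# and with Hub support the argument can presumably be a Hub repo id
qmodel.save_pretrained("./opt-125m-qint4")
reloaded = QuantizedModelForCausalLM.from_pretrained("./opt-125m-qint4")
```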

Fix typo in file name: `s/READMD.md/README.md/`. Affected file:
* https://github.com/huggingface/optimum-quanto/blob/601dc193ce0ed381c479fde54a81ba546bdf64d1/examples/vision/StableDiffusion/READMD.md

CC: @dacorvo

Fixes https://github.com/huggingface/optimum-quanto/issues/238

# What does this PR do?
Following issue #169, this adds a demo for ASR with Whisper.
## Before submitting
- [x] Did you read the [contributor guideline](https://github.com/huggingface/optimum-quanto/blob/main/CONTRIBUTING.md#create-a-pull-request), Pull Request...
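The quantization step of such a demo might look like this (a sketch under assumptions: the checkpoint id is illustrative, and audio loading and decoding are omitted):

```python
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
from optimum.quanto import freeze, qint8, quantize

model_id = "openai/whisper-tiny"  # illustrative checkpoint
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

# quantize the Whisper weights to int8, then freeze them for inference
quantize(model, weights=qint8)
freeze(model)
```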

Hi, I'm currently investigating the addition of optimum-quanto to PEFT. This mostly works already, but I'm hitting a wall when it comes to loading the `state_dict`. When loading a PEFT...
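For reference, the reload workflow quanto documents for a quantized `state_dict` looks roughly like this (a sketch: `MyModel` is a hypothetical stand-in for the quantized architecture, and the file names are illustrative):

```python
import json
import torch
from safetensors.torch import load_file
from optimum.quanto import requantize

state_dict = load_file("model.safetensors")
with open("quantization_map.json") as f:
    qmap = json.load(f)

# build the architecture on the meta device (no real weight allocation)
with torch.device("meta"):
    model = MyModel()  # hypothetical model class

# re-create the quantized modules, then load the serialized weights
requantize(model, state_dict, qmap, device=torch.device("cuda"))
```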

This is discussed here: https://github.com/lllyasviel/stable-diffusion-webui-forge/discussions/981. According to the author:
> (i) NF4 is significantly faster than FP8. For GPUs with 6GB/8GB VRAM, the speed-up is about 1.3x to 2.5x...

Int8 matrix multiplication kernels are currently called on CUDA and CPU devices when activations and weights are quantized to int8. However, FP8 matmuls are not used when activations and weights...
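For reference, quantizing both weights and activations to int8 with quanto looks roughly like this (a minimal sketch; the toy model and data are illustrative):

```python
import torch
from optimum.quanto import Calibration, freeze, qint8, quantize

# toy model and data, for illustration only
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 8)
)
samples = torch.randn(16, 64)

# quantize weights and activations so the int8 matmul kernels can be selected
quantize(model, weights=qint8, activations=qint8)
with Calibration():  # record activation ranges on sample data
    model(samples)
freeze(model)  # materialize the quantized weights

out = model(samples)
```

Presumably the request here is for the analogous `qfloat8` configuration to dispatch to FP8 matmul kernels in the same way.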

@sayakpaul Hello, I am looking for support for saving and loading the flux1.schnell model from Black Forest Labs. Following your code from the "Bonus" section [here](https://github.com/huggingface/blog/blob/main/quanto-diffusers.md#bonus---saving-and-loading-diffusers-models-in-quanto):

Saving:
```
from diffusers import PixArtTransformer2DModel
from...
```
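The save side of that workflow, per the quanto docs, is roughly the following (a sketch; `model` is the already quantized and frozen module, and the file names are illustrative):

```python
import json
from safetensors.torch import save_file
from optimum.quanto import quantization_map

# persist the quantized weights plus the map needed to rebuild them later
save_file(model.state_dict(), "model.safetensors")
with open("quantization_map.json", "w") as f:
    json.dump(quantization_map(model), f)
```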

Install `diffusers` first, and then do:

```python
from diffusers import DiffusionPipeline
from optimum.quanto import quantize, freeze, qint4
import torch

ckpt_id = "ptx0/pixart-900m-1024-ft"
torch_dtype = torch.float16
pipe = DiffusionPipeline.from_pretrained(ckpt_id, torch_dtype=torch_dtype).to("cuda")
if...
```
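The snippet is cut off at the `if`; a plausible continuation (an assumption, not the original code) quantizes and freezes the transformer before running the pipeline:

```python
from optimum.quanto import freeze, qint4, quantize

# int4 weight quantization of the transformer, matching the imports above
quantize(pipe.transformer, weights=qint4)
freeze(pipe.transformer)

image = pipe("a corgi astronaut", num_inference_steps=20).images[0]  # illustrative prompt
```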