
A PyTorch quantization toolkit

Results: 61 quanto issues

# What does this PR do?
Fixes # ([issue](https://github.com/huggingface/optimum-quanto/issues/182)). GEMM and GEMV kernels can't be compiled for AMD HIP. This PR adds a check for HIP and uses the unpack kernel...
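For context, a device check along these lines is one way to express a HIP fallback at the Python level (a minimal sketch, not the PR's actual code; `unpack_weights` and `gemm_kernel` are hypothetical stand-ins):

```python
import torch

# torch.version.hip is set on ROCm builds of PyTorch and is None otherwise
is_hip = torch.version.hip is not None

def matmul_int4(inputs, packed_weights):
    if is_hip:
        # hypothetical fallback: unpack to a dense tensor and use a plain matmul
        weights = unpack_weights(packed_weights)
        return torch.matmul(inputs, weights.t())
    # hypothetical optimized path using the CUDA-only GEMM/GEMV kernels
    return gemm_kernel(inputs, packed_weights)
```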

# What does this PR do?
Implements support for saving models to and loading them from the Hub.
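A round trip with quanto's quantized model wrapper looks roughly like this (a sketch, assuming the `QuantizedModelForCausalLM` wrapper; the checkpoint id and paths are illustrative):

```python
from transformers import AutoModelForCausalLM
from optimum.quanto import QuantizedModelForCausalLM, qint4

# quantize a causal LM to int4 weights (checkpoint id is illustrative)
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
qmodel = QuantizedModelForCausalLM.quantize(model, weights=qint4)

# save locally; reloading works the same way from a local path,
# and with Hub support the argument can presumably be a Hub repo id
qmodel.save_pretrained("./opt-125m-qint4")
reloaded = QuantizedModelForCausalLM.from_pretrained("./opt-125m-qint4")
```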

Fix typo in file name: `s/READMD.md/README.md/`. Affected file:
* https://github.com/huggingface/optimum-quanto/blob/601dc193ce0ed381c479fde54a81ba546bdf64d1/examples/vision/StableDiffusion/READMD.md

CC: @dacorvo

Fixes https://github.com/huggingface/optimum-quanto/issues/238

# What does this PR do?
Following issue #169, this adds a demo for ASR with Whisper.
## Before submitting
- [x] Did you read the [contributor guideline](https://github.com/huggingface/optimum-quanto/blob/main/CONTRIBUTING.md#create-a-pull-request), Pull Request...
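The quantization step of such a demo might look like this (a sketch under assumptions: the checkpoint id is illustrative, and audio loading and decoding are omitted):

```python
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
from optimum.quanto import freeze, qint8, quantize

model_id = "openai/whisper-tiny"  # illustrative checkpoint
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

# quantize the Whisper weights to int8, then freeze them for inference
quantize(model, weights=qint8)
freeze(model)
```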

Hi, I'm currently investigating the addition of optimum-quanto to PEFT. This mostly works already, but I'm hitting a wall when it comes to loading the `state_dict`. When loading a PEFT...
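For reference, the reload workflow quanto documents for a quantized `state_dict` looks roughly like this (a sketch: `MyModel` is a hypothetical stand-in for the quantized architecture, and the file names are illustrative):

```python
import json
import torch
from safetensors.torch import load_file
from optimum.quanto import requantize

state_dict = load_file("model.safetensors")
with open("quantization_map.json") as f:
    qmap = json.load(f)

# build the architecture on the meta device (no real weight allocation)
with torch.device("meta"):
    model = MyModel()  # hypothetical model class

# re-create the quantized modules, then load the serialized weights
requantize(model, state_dict, qmap, device=torch.device("cuda"))
```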

This is discussed here: https://github.com/lllyasviel/stable-diffusion-webui-forge/discussions/981. According to the author:
> (i) NF4 is significantly faster than FP8. For GPUs with 6GB/8GB VRAM, the speed-up is about 1.3x to 2.5x...

Int8 matrix multiplication kernels are currently called on CUDA and CPU devices when activations and weights are quantized to int8. However, FP8 matmuls are not used when activations and weights...
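For reference, quantizing both weights and activations to int8 with quanto looks roughly like this (a minimal sketch; the toy model and data are illustrative):

```python
import torch
from optimum.quanto import Calibration, freeze, qint8, quantize

# toy model and data, for illustration only
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 8)
)
samples = torch.randn(16, 64)

# quantize weights and activations so the int8 matmul kernels can be selected
quantize(model, weights=qint8, activations=qint8)
with Calibration():  # record activation ranges on sample data
    model(samples)
freeze(model)  # materialize the quantized weights

out = model(samples)
```

Presumably the request here is for the analogous `qfloat8` configuration to dispatch to FP8 matmul kernels in the same way.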

@sayakpaul Hello, I am looking for support for saving and loading the flux1.schnell model from Black Forest Labs. Following your code from the "Bonus" section [here](https://github.com/huggingface/blog/blob/main/quanto-diffusers.md#bonus---saving-and-loading-diffusers-models-in-quanto):

Saving:
```
from diffusers import PixArtTransformer2DModel
from...
```
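The save side of that workflow, per the quanto docs, is roughly the following (a sketch; `model` is the already quantized and frozen module, and the file names are illustrative):

```python
import json
from safetensors.torch import save_file
from optimum.quanto import quantization_map

# persist the quantized weights plus the map needed to rebuild them later
save_file(model.state_dict(), "model.safetensors")
with open("quantization_map.json", "w") as f:
    json.dump(quantization_map(model), f)
```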

Install `diffusers` first, and then do:

```python
from diffusers import DiffusionPipeline
from optimum.quanto import quantize, freeze, qint4
import torch

ckpt_id = "ptx0/pixart-900m-1024-ft"
torch_dtype = torch.float16
pipe = DiffusionPipeline.from_pretrained(ckpt_id, torch_dtype=torch_dtype).to("cuda")
if...
```
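The snippet is cut off at the `if`; a plausible continuation (an assumption, not the original code) quantizes and freezes the transformer before running the pipeline:

```python
from optimum.quanto import freeze, qint4, quantize

# int4 weight quantization of the transformer, matching the imports above
quantize(pipe.transformer, weights=qint4)
freeze(pipe.transformer)

image = pipe("a corgi astronaut", num_inference_steps=20).images[0]  # illustrative prompt
```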