Andrei Panferov

Results 41 comments of Andrei Panferov

Disabling `'low_cpu_mem_usage': True` shouldn't be necessary once the new `accelerate` version is released (this [PR](https://github.com/huggingface/accelerate/pull/2376) fixed the error). Moreover, we're [working on integrating](https://github.com/huggingface/transformers/pull/28928) AQLM into the newly added [quantizers](https://huggingface.co/docs/transformers/main/en/hf_quantizer) interface...
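Until the fixed `accelerate` release ships, the interim workaround is to disable low-CPU-memory loading explicitly. A minimal sketch (the model name is omitted; only the kwarg is the point here):

```python
# Workaround sketch: pass low_cpu_mem_usage=False when loading an AQLM
# checkpoint, until the accelerate fix (PR #2376) is released.
load_kwargs = {
    "low_cpu_mem_usage": False,  # disable until the fixed accelerate is out
}
# model = AutoModelForCausalLM.from_pretrained("<aqlm-model>", **load_kwargs)
```

Once the new `accelerate` version is installed, the kwarg can simply be dropped again.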

Hi @oobabooga! I just wanted to let you know that we've updated our fine-tuning setup for AQLM, once again greatly improving the performance. The new results can be found in...

Hi @hiyouga! It's, sadly, not properly documented yet, but you should do:

```python
import aqlm

with aqlm.optimize_for_training():
    model = AutoModelForCausalLM.from_pretrained(...)
```

The thing is, there are a few ways to...

A bit more context: those are the speeds for a typical layer on an _RTX 3090 GPU_. We have a kernel for a single token pass (generation), which is slightly...

I've merged #39 and released `aqlm==1.1.0` where I got rid of the need to use `aqlm.optimize_for_training()`. Everything is determined automatically from here on.

The only reason `aqlm` requires `python>=3.10` is a single `match-case` statement in a non-critical place. I was able to run `aqlm` on `python 3.8` no problem otherwise. I can replace...
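The kind of change involved can be sketched as follows (hypothetical dispatch function; the actual statement in `aqlm` may differ): a `match-case` block, which requires Python >= 3.10, rewritten as a plain `if/elif` chain with identical behavior on Python 3.8.

```python
# Hypothetical example: a match-case dispatch (Python >= 3.10 only)
# rewritten as if/elif so the same logic runs on Python 3.8.
def kernel_for(device_type: str) -> str:
    # The Python >= 3.10 version would read:
    # match device_type:
    #     case "cuda":
    #         return "gpu_kernel"
    #     case "cpu":
    #         return "numba_kernel"
    #     case _:
    #         return "fallback"
    if device_type == "cuda":
        return "gpu_kernel"
    elif device_type == "cpu":
        return "numba_kernel"
    return "fallback"
```

Since the two forms are semantically equivalent here, the rewrite costs nothing but a few lines while widening the supported Python range.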

@SunMarc `aqlm` will support python `>=3.8` starting version `1.0.2`. I'm [1 PR](https://github.com/Vahe1994/AQLM/pull/26) away from releasing it.

@SunMarc `aqlm==1.0.2` is out. May I ask you to please update the docker images?

Are you sure you are using the right weights? The code was [refactored](https://github.com/huggingface/transformers/pull/21955#issuecomment-1459073934) so that previously converted weights are no longer valid, and from what I've seen, the model outputs NaNs on...
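A quick way to confirm the symptom is to scan the outputs for NaNs. A pure-Python sketch (with PyTorch tensors you would instead use `torch.isnan(logits).any()`):

```python
import math

def has_nan(values):
    """Return True if any float in the (possibly nested) sequence is NaN."""
    for v in values:
        if isinstance(v, (list, tuple)):
            if has_nan(v):
                return True
        elif isinstance(v, float) and math.isnan(v):
            return True
    return False
```

If this fires on the model outputs, re-converting the weights with the current code is the likely fix.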

Note to self: matmul doesn't work for more than one vector at a time for some reason