Marc Sun
No, the core design looks very good! It is similar to `transformers`, and `device_map` works well there.
> `model_split_percents = [0.5, 0.3, 0.4]` is the one that seems to work for both multi-GPU and single-GPU environments for the UNet under consideration. The size of the UNet is...
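For readers who want to try a similar split outside the test suite, here is a minimal sketch using `accelerate`'s `infer_auto_device_map` (the placeholder model and `max_memory` budgets below are illustrative assumptions, not the actual test configuration):

```python
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

# Build the model on the meta device so planning the split allocates no real memory.
config = AutoConfig.from_pretrained("gpt2")  # placeholder model for illustration
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

# Per-device budgets play the same role as model_split_percents: they cap how much
# of the model each device receives before spilling over to the next one.
device_map = infer_auto_device_map(
    model,
    max_memory={0: "300MB", 1: "200MB", "cpu": "1GB"},
)
print(device_map)
```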
> so it would be a better solution if the GPU that each model uses could be set manually, because then I could test every different GPU setting

You will be able...
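Until then, a manual per-module assignment is already possible by passing an explicit `device_map` dict at load time. A minimal sketch assuming two visible GPUs (module names are those of the example `gpt2` checkpoint; inspect `model.named_modules()` for your own model):

```python
from transformers import AutoModelForCausalLM

# Pin each top-level module to a chosen device (requires `accelerate` installed).
device_map = {
    "transformer.wte": 1,   # embeddings, tied with lm_head, so kept on the same GPU
    "transformer.wpe": 0,
    "transformer.drop": 0,
    "transformer.h": 0,     # all decoder blocks on GPU 0
    "transformer.ln_f": 1,
    "lm_head": 1,
}
model = AutoModelForCausalLM.from_pretrained("gpt2", device_map=device_map)
print(model.hf_device_map)
```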
Hi @JosephChotard, thanks for reporting. I was not able to reproduce the error in my [Colab notebook](https://colab.research.google.com/drive/13-B3SAYTfwzl48rF2YmN-KmYiKdgZQmD?usp=sharing). Please check that you have the latest version of bitsandbytes. This is...
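For anyone hitting the same error, a minimal sketch of upgrading bitsandbytes and loading a model in 4-bit (the model id and config values are examples, not the exact setup from the notebook):

```python
# pip install -U bitsandbytes accelerate transformers
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Commonly used 4-bit NF4 settings; adjust to match your own setup.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # example model id
    quantization_config=bnb_config,
    device_map="auto",
)
```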
Thanks for the interest! We already support most of the optimizations described here:
- [Torch.compile](https://huggingface.co/docs/transformers/perf_torch_compile) with PyTorch blog [here](https://huggingface.co/docs/transformers/perf_torch_compile)
- 4-bit quant with [GPTQ](https://huggingface.co/docs/transformers/main_classes/quantization) and recently AWQ which is...
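As a quick illustration of the first item, a minimal `torch.compile` sketch on a `transformers` model (the model id is just an example):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # example model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The first call triggers compilation; subsequent calls reuse the compiled graph.
model = torch.compile(model)

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.logits.shape)
```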
Yes, absolutely! cc @younesbelkada for visibility
fyi @fxmarty
Hey @keepdying, could you share a minimal reproducer of the error that you are facing in a separate issue? We can definitely switch to checking the offload attribute.
Thanks for the update @tsengalb99! Very excited for this new method 🔥 Would you mind explaining a bit more why CUDA graphs are needed? Also, in general, do...
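For context on the question, this is the standard PyTorch capture/replay pattern for CUDA graphs; a minimal sketch that is independent of this method's actual kernels:

```python
import torch

model = torch.nn.Linear(1024, 1024).to("cuda")
static_input = torch.randn(8, 1024, device="cuda")

# Warm up on a side stream so capture sees steady-state memory allocations.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        _ = model(static_input)
torch.cuda.current_stream().wait_stream(s)

# Capture a single forward pass into a graph.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_output = model(static_input)

# Replay: refill the static input buffer, then launch all captured kernels at once,
# removing the per-kernel CPU launch overhead.
static_input.copy_(torch.randn(8, 1024, device="cuda"))
g.replay()
print(static_output.sum().item())
```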
Hi @andeyeluguo, you can't quantize a model with the GPTQ quantization scheme without GPUs, since it is not supported and it would take way too much time. However, you can run...
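For context, loading a checkpoint that someone has already quantized skips the GPU-hungry quantization step entirely. A minimal sketch (the model id is just an example of a pre-quantized GPTQ checkpoint, and running the GPTQ kernels still expects a GPU):

```python
# pip install optimum auto-gptq
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example of a checkpoint that was already quantized with GPTQ elsewhere.
model_id = "TheBloke/Llama-2-7B-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```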