Marc Sun
No, the core design looks very good! It is similar to `transformers`, and `device_map` works well there.
> `model_split_percents = [0.5, 0.3, 0.4]` is the one that seems to work for both multi-GPU and single-GPU environments for the UNet under consideration. The size of the UNet is...
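For readers who want to try a similar split outside the test suite, here is a minimal sketch using `accelerate`'s `infer_auto_device_map` (the placeholder model and `max_memory` budgets below are illustrative assumptions, not the actual test configuration):

```python
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

# Build the model on the meta device so planning the split allocates no real memory.
config = AutoConfig.from_pretrained("gpt2")  # placeholder model for illustration
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

# Per-device budgets play the same role as model_split_percents: they cap how much
# of the model each device receives before spilling over to the next one.
device_map = infer_auto_device_map(
    model,
    max_memory={0: "300MB", 1: "200MB", "cpu": "1GB"},
)
print(device_map)
```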
> so it would be a better solution if the GPU that each model uses could be set manually, because then I could test every different GPU setting

You will be able...
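Until then, a manual per-module assignment is already possible by passing an explicit `device_map` dict at load time. A minimal sketch assuming two visible GPUs (module names are those of the example `gpt2` checkpoint; inspect `model.named_modules()` for your own model):

```python
from transformers import AutoModelForCausalLM

# Pin each top-level module to a chosen device (requires `accelerate` installed).
device_map = {
    "transformer.wte": 1,   # embeddings, tied with lm_head, so kept on the same GPU
    "transformer.wpe": 0,
    "transformer.drop": 0,
    "transformer.h": 0,     # all decoder blocks on GPU 0
    "transformer.ln_f": 1,
    "lm_head": 1,
}
model = AutoModelForCausalLM.from_pretrained("gpt2", device_map=device_map)
print(model.hf_device_map)
```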
Hi @JosephChotard, thanks for reporting. I was not able to reproduce the error in my [Colab notebook](https://colab.research.google.com/drive/13-B3SAYTfwzl48rF2YmN-KmYiKdgZQmD?usp=sharing). Please check that you have the latest version of bitsandbytes. This is...
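For anyone hitting the same error, a minimal sketch of upgrading bitsandbytes and loading a model in 4-bit (the model id and config values are examples, not the exact setup from the notebook):

```python
# pip install -U bitsandbytes accelerate transformers
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Commonly used 4-bit NF4 settings; adjust to match your own setup.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # example model id
    quantization_config=bnb_config,
    device_map="auto",
)
```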
Thanks for the interest! We already support most of the optimizations described here:
- [Torch.compile](https://huggingface.co/docs/transformers/perf_torch_compile) with PyTorch blog [here](https://huggingface.co/docs/transformers/perf_torch_compile)
- 4-bit quant with [GPTQ](https://huggingface.co/docs/transformers/main_classes/quantization) and recently AWQ which is...
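As a quick illustration of the first item, a minimal `torch.compile` sketch on a `transformers` model (the model id is just an example):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # example model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The first call triggers compilation; subsequent calls reuse the compiled graph.
model = torch.compile(model)

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.logits.shape)
```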
Yes, absolutely! cc @younesbelkada for visibility
fyi @fxmarty
Hey @keepdying, could you share a minimal reproducer of the error that you are facing in a separate issue? We can definitely switch to checking the offload attribute.
Thanks for the update @tsengalb99! Very excited for this new method 🔥 Would you mind explaining a bit more why CUDA graphs are needed? Also, in general, do...
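For context on the question, this is the standard PyTorch capture/replay pattern for CUDA graphs; a minimal sketch that is independent of this method's actual kernels:

```python
import torch

model = torch.nn.Linear(1024, 1024).to("cuda")
static_input = torch.randn(8, 1024, device="cuda")

# Warm up on a side stream so capture sees steady-state memory allocations.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        _ = model(static_input)
torch.cuda.current_stream().wait_stream(s)

# Capture a single forward pass into a graph.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_output = model(static_input)

# Replay: refill the static input buffer, then launch all captured kernels at once,
# removing the per-kernel CPU launch overhead.
static_input.copy_(torch.randn(8, 1024, device="cuda"))
g.replay()
print(static_output.sum().item())
```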
Hi @andeyeluguo, you can't quantize a model with the GPTQ quantization scheme without GPUs, since it is not supported and it would take way too much time. However, you can run...
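For context, loading a checkpoint that someone has already quantized skips the GPU-hungry quantization step entirely. A minimal sketch (the model id is just an example of a pre-quantized GPTQ checkpoint, and running the GPTQ kernels still expects a GPU):

```python
# pip install optimum auto-gptq
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example of a checkpoint that was already quantized with GPTQ elsewhere.
model_id = "TheBloke/Llama-2-7B-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```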