Sylvain Gugger
With latest Accelerate as well? You can check if `model._hf_hook.skip_keys` errors or shows `"past_key_values"`.
I can't reproduce. On main of Transformers and Accelerate,

```py
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b", device_map="auto", torch_dtype=torch.float16
)
print(model._hf_hook.skip_keys)
```

gets me `"past_key_values"`. What's your repro?
What model architecture is GPTQ-for-LLaMa? If it's custom code, it needs to implement `_skip_keys_device_placement` like [here](https://github.com/huggingface/transformers/blob/e03a9cc0cd7623a8d5208d7a4206f628b2bd5513/src/transformers/models/llama/modeling_llama.py#L345) to work out of the box :-)
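In case it helps, here is a minimal sketch of what that looks like in custom modeling code (the class and module names below are placeholders, not the actual GPTQ-for-LLaMa code):

```py
# Hypothetical example: `MyCustomPreTrainedModel` and `MyCustomDecoderLayer`
# are placeholder names. The key line is the `_skip_keys_device_placement`
# class attribute, which tells the Accelerate hooks not to move
# `past_key_values` between devices when the model is split with
# `device_map="auto"`.
from transformers import PreTrainedModel

class MyCustomPreTrainedModel(PreTrainedModel):
    _no_split_modules = ["MyCustomDecoderLayer"]
    _skip_keys_device_placement = "past_key_values"
```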
Thanks for iterating!
`load_checkpoint_and_dispatch` is a tool for inference, not for training. If you are using FSDP, you should let FSDP instantiate the model on several devices and split the tensors as it sees fit.
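For reference, a minimal sketch of the inference-side pattern `load_checkpoint_and_dispatch` is meant for (the checkpoint path below is a placeholder):

```py
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

# Inference-only pattern: create the model with empty (meta) weights, then
# let Accelerate load the checkpoint and dispatch the layers across the
# available devices.
config = AutoConfig.from_pretrained("huggyllama/llama-7b")
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

model = load_checkpoint_and_dispatch(
    model, checkpoint="/path/to/checkpoint", device_map="auto"
)
```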
cc @pacman100 who would know more.
cc @muellerzr
cc @pacman100
Could you provide us with the result of `accelerate env` as requested in the issue template? Also cc @pacman100