Sylvain Gugger


With latest Accelerate as well? You can check if `model._hf_hook.skip_keys` errors or shows `"past_key_values"`.
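
For reference, a minimal sketch of that check (assuming `model` was already loaded with a `device_map` so the Accelerate hook is attached; the printed messages are illustrative):

```py
# On recent Accelerate versions the dispatch hook exposes `skip_keys`;
# on older versions the attribute does not exist and raises.
try:
    print(model._hf_hook.skip_keys)  # expected: "past_key_values"
except AttributeError:
    print("No `skip_keys` on the hook: Accelerate is likely too old.")
```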

I can't reproduce. On main of Transformers and Accelerate,

```py
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b", device_map="auto", torch_dtype=torch.float16)
print(model._hf_hook.skip_keys)
```

gets me `"past_key_values"`. What's your repro?

What model architecture is GPTQ-for-LLaMa? If it's custom code, it needs to set `_skip_keys_device_placement` like [here](https://github.com/huggingface/transformers/blob/e03a9cc0cd7623a8d5208d7a4206f628b2bd5513/src/transformers/models/llama/modeling_llama.py#L345) to work out of the box :-)
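
For a custom architecture, a minimal sketch of what that looks like (assuming the custom code subclasses `PreTrainedModel`; the class name here is hypothetical):

```py
from transformers import PreTrainedModel

class MyCustomPreTrainedModel(PreTrainedModel):
    # Mirrors the Llama definition linked above: tells Accelerate's dispatch
    # hooks not to move `past_key_values` between devices on forward.
    _skip_keys_device_placement = "past_key_values"
```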

Thanks for iterating!

`load_checkpoint_and_dispatch` is a tool for inference, not for training. If you are using FSDP, you should let FSDP instantiate the model on several devices and split the tensors as it...
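
For contrast, a minimal inference-only sketch of how `load_checkpoint_and_dispatch` is meant to be used (the checkpoint path is a placeholder):

```py
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

# Build the model skeleton without allocating real weights...
config = AutoConfig.from_pretrained("huggyllama/llama-7b")
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

# ...then load the checkpoint shards and dispatch them across devices.
model = load_checkpoint_and_dispatch(model, "path/to/checkpoint", device_map="auto")
```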

Could you provide us with the result of `accelerate env` as requested in the issue template? Also cc @pacman100