Sylvain Gugger
With latest Accelerate as well? You can check if `model._hf_hook.skip_keys` errors or shows `"past_key_values"`.
I can't reproduce. On main of Transformers and Accelerate,

```py
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b", device_map="auto", torch_dtype=torch.float16
)
print(model._hf_hook.skip_keys)
```

gets me `"past_key_values"`. What's your repro?
What model architecture is GPTQ-for-LLaMa? If it's custom code, it needs to implement `_skip_keys_device_placement` like [here](https://github.com/huggingface/transformers/blob/e03a9cc0cd7623a8d5208d7a4206f628b2bd5513/src/transformers/models/llama/modeling_llama.py#L345) to work out of the box :-)
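In case it helps, here is a minimal sketch of what that looks like in custom modeling code (the class and module names below are placeholders, not the actual GPTQ-for-LLaMa code):

```py
# Hypothetical example: `MyCustomPreTrainedModel` and `MyCustomDecoderLayer`
# are placeholder names. The key line is the `_skip_keys_device_placement`
# class attribute, which tells the Accelerate hooks not to move
# `past_key_values` between devices when the model is split with
# `device_map="auto"`.
from transformers import PreTrainedModel

class MyCustomPreTrainedModel(PreTrainedModel):
    _no_split_modules = ["MyCustomDecoderLayer"]
    _skip_keys_device_placement = "past_key_values"
```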
Thanks for iterating!
`load_checkpoint_and_dispatch` is a tool for inference, not for training. If you are using FSDP, you should let FSDP instantiate the model on several devices and split the tensors as it sees fit.
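For reference, a minimal sketch of the inference-side pattern `load_checkpoint_and_dispatch` is meant for (the checkpoint path below is a placeholder):

```py
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

# Inference-only pattern: create the model with empty (meta) weights, then
# let Accelerate load the checkpoint and dispatch the layers across the
# available devices.
config = AutoConfig.from_pretrained("huggyllama/llama-7b")
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

model = load_checkpoint_and_dispatch(
    model, checkpoint="/path/to/checkpoint", device_map="auto"
)
```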
cc @pacman100 who would know more.
cc @muellerzr
cc @pacman100
Could you provide us with the result of `accelerate env` as requested in the issue template? Also cc @pacman100