Sylvain Gugger
Re-ping of @ArthurZucker
There is only one call to `head` now once the model is cached, @Narsil
Thanks again for all your work on this!
You are not using the [`ddp_timeout`](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments.ddp_timeout) training argument to set a value higher than the default 30 minutes, so if you have a big dataset to preprocess, you get this error. Use...
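For reference, something along these lines should work (a minimal sketch; the output directory, batch size, and timeout value are just placeholders):

```python
from transformers import TrainingArguments

# Raise the DDP timeout so long dataset preprocessing on rank 0 does not hit
# the default 30-minute limit. All values below are placeholders.
training_args = TrainingArguments(
    output_dir="my-model",
    per_device_train_batch_size=8,
    ddp_timeout=7200,  # in seconds, so 2 hours instead of the default 1800
)
```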
If you use `torch.distributed.launch` and the `ddp_timeout` you set is not respected, it sounds like a bug in PyTorch ;-)
cc @Rocketknight1
Let's maybe wait for the LLaMa PR to be merged first?
`load_checkpoint_and_dispatch` is intended for naive model parallelism and is not compatible with DeepSpeed.
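For context, the intended use looks roughly like this (a sketch with Accelerate; the checkpoint path and model class are placeholders):

```python
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

# Instantiate the model without allocating weights, then load the checkpoint
# and dispatch its layers across the available GPUs/CPU (naive model parallelism).
config = AutoConfig.from_pretrained("path/to/model")  # placeholder path
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

model = load_checkpoint_and_dispatch(
    model,
    checkpoint="path/to/model",  # placeholder: folder with the (sharded) weights
    device_map="auto",           # split layers across devices automatically
)
```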
Yes, this model is not compatible with torchscript, cc @ArthurZucker
cc @gante