Sylvain Gugger

633 comments by Sylvain Gugger

There is only one call to `head` now once the model is cached, @Narsil

You are not using the [`ddp_timeout`](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments.ddp_timeout) training argument to set a value higher than 30 minutes, so if you have a big dataset to preprocess, you get this error. Use...
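A minimal sketch of what raising that timeout looks like, assuming the usual setup where `ddp_timeout` is passed in seconds to `TrainingArguments` (the 3-hour value below is an illustrative choice, not a recommendation):

```python
from datetime import timedelta

# `ddp_timeout` is expressed in seconds; the default of 1800 s (30 minutes)
# can be too short when rank 0 spends a long time preprocessing the dataset
# while the other ranks wait at the synchronization barrier.
ddp_timeout = int(timedelta(hours=3).total_seconds())  # 10800 seconds

# Then pass it to the trainer arguments, e.g.:
# args = TrainingArguments(output_dir="out", ddp_timeout=ddp_timeout)
```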

If you use `torch.distributed.launch` with a `ddp_timeout` that is not respected, it sounds like a bug in PyTorch ;-)

Let's maybe wait for the LLaMa PR to be merged first?

`load_checkpoint_and_dispatch` is intended for naive model parallelism and is not compatible with DeepSpeed.
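To illustrate what "naive model parallelism" means here, this is a sketch of the kind of `device_map` that dispatch builds: each submodule is pinned to a single device and activations hop between devices at layer boundaries (the module names below are hypothetical placeholders):

```python
# Hypothetical device map: modules are split across two GPUs, so only one
# device is active at a time per forward pass (naive model parallelism).
device_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": 1,
    "lm_head": 1,
}

# With accelerate, such a map would be used roughly like:
# model = load_checkpoint_and_dispatch(model, checkpoint_path, device_map=device_map)
```

DeepSpeed instead shards parameters *within* each module across ranks, which is why the two approaches do not compose.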

Yes, this model is not compatible with torchscript, cc @ArthurZucker