dmammfl

Results 3 issues of dmammfl

### Reminder

- [X] I have read the README and searched the existing issues.

### Reproduction

I am trying to tune the model with the accelerate multi-node training example (`examples/full_multi_gpu/multi_node.sh`). But when...

pending

I'm currently training the Llama-3-8B model on 2 GPUs with pipeline parallelism only. However, when I save a checkpoint on each rank, only half of that checkpoint is saved. (Layer 1 is...

bug