dmammfl
Results: 3 issues of dmammfl
### Reminder
- [X] I have read the README and searched the existing issues.
### Reproduction
I am trying to tune the model with the accelerate multi-node training example (`examples/full_multi_gpu/multi_node.sh`). But when...
pending
I'm currently training the Llama-3-8B model on 2 GPUs with pipeline parallelism only. However, when I save a checkpoint on each rank, only half of that checkpoint is saved. (Layer 1 is...
bug