accelerate
Incorrect Argument Default for DeepSpeed Multi-node Training
System Info
pip install accelerate.
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- [ ] My own task or dataset (give details below)
Reproduction
Run Accelerate for multi-node DeepSpeed training without explicitly setting a hostfile.
Expected behavior
Accelerate sets the default DeepSpeed hostfile to None, which overrides the DeepSpeed default of /job/hostfile. Overriding this default causes issues for users attempting multi-node training. Please change the default to match DeepSpeed's default.
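
To make the request concrete, here is a minimal sketch of the two ways the default could be handled. It is not Accelerate's actual code: `resolve_hostfile` and `build_hostfile_args` are hypothetical helper names, and the only facts taken from this report are that the value is currently None and that DeepSpeed's default is /job/hostfile. The sketch assumes the launcher is passed the path through DeepSpeed's `--hostfile` option.

```python
from typing import List, Optional

# DeepSpeed's default hostfile path, as stated in this issue.
DEEPSPEED_DEFAULT_HOSTFILE = "/job/hostfile"


def resolve_hostfile(user_hostfile: Optional[str] = None) -> str:
    """Hypothetical helper: fall back to DeepSpeed's default when unset."""
    if user_hostfile is None:
        return DEEPSPEED_DEFAULT_HOSTFILE
    return user_hostfile


def build_hostfile_args(user_hostfile: Optional[str] = None) -> List[str]:
    """Hypothetical helper: only pass --hostfile when the user set one.

    Omitting the flag leaves DeepSpeed free to apply its own default
    instead of receiving an explicit None.
    """
    if user_hostfile is None:
        return []
    return ["--hostfile", user_hostfile]


if __name__ == "__main__":
    print(resolve_hostfile())                # falls back to the default path
    print(build_hostfile_args("/tmp/hosts"))  # forwards the user's choice
```

Either approach would avoid handing an explicit None to DeepSpeed. The second one has the advantage of not duplicating DeepSpeed's default inside Accelerate.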