Support DeepSpeed AutoTP
⚠️ Please check that this feature request hasn't been suggested before.
- [x] I searched previous Ideas in Discussions and didn't find any similar feature requests.
- [x] I searched previous Issues and didn't find any similar feature requests.
🔖 Feature description
AutoTP was added to DeepSpeed a few weeks ago in 0.16.4 and claims speedups of up to 4x.
While ZeRO3 offers superior memory efficiency, it incurs significant communication costs. ZeRO (1/2) has lower communication overhead, but in the case of very large models, it cannot be used directly due to memory limitations. Therefore, combining TP with ZeRO (1/2) offers more balanced options for memory and performance. Moreover, through TP, we can alleviate the batch scaling limitations imposed by ZeRO/FSDP.
https://github.com/deepspeedai/DeepSpeed/blob/master/blogs/huggingface-tp/README.md
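A back-of-envelope illustration of why combining the two helps (my own sketch, not from the blog; the 7B parameter count and TP degree of 8 are just example numbers):

```python
# Rough sketch (assumption, not from the blog): per-GPU memory for the bf16 parameter
# copy only. ZeRO-2 shards gradients and optimizer states across data-parallel ranks,
# but every rank still holds the full parameters; tensor parallelism additionally
# splits the weights themselves across the TP group.

def param_gb_per_gpu(n_params: float, tp_size: int, bytes_per_param: int = 2) -> float:
    """Approximate bf16 parameter footprint per GPU with weights split over tp_size ranks."""
    return n_params * bytes_per_param / tp_size / 1e9

n_params = 7e9  # e.g. a Mistral-7B-sized model
print(f"ZeRO-2 only       : {param_gb_per_gpu(n_params, tp_size=1):.2f} GB per GPU")
print(f"ZeRO-2 + autotp=8 : {param_gb_per_gpu(n_params, tp_size=8):.2f} GB per GPU")
```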
✔️ Solution
~~When enabling AutoTP on a Mistral model with ZeRO 2, an error is triggered right at the beginning of training: "dataset inconsistency error between DP and TP".~~ This is now solved; it requires accelerate>=1.6.0.
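If it helps anyone hitting the same error, a quick guard one could drop into a training script (my own sketch, not part of axolotl; the bounds are the accelerate>=1.6.0 fix above and the DeepSpeed 0.16.4 release that introduced AutoTP):

```python
# Sketch of a version guard, not part of axolotl.
from importlib.metadata import version
from packaging.version import Version

assert Version(version("accelerate")) >= Version("1.6.0"), \
    "AutoTP + ZeRO needs accelerate>=1.6.0 (fixes the DP/TP dataloader inconsistency error)"
assert Version(version("deepspeed")) >= Version("0.16.4"), \
    "AutoTP training was introduced in deepspeed 0.16.4"
```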
This DeepSpeed config is tested on 8x H100:
```json
{
  "zero_optimization": {
    "stage": 2,
    "contiguous_gradients": true,
    "overlap_comm": true
  },
  "tensor_parallel": {
    "autotp_size": 8
  },
  "bf16": {
    "enabled": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
```
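For reference, a minimal sketch of how a config like this is consumed when training through the Hugging Face Trainer (my own example, not axolotl's integration; the model name, dataset, and `ds_autotp.json` file name are placeholders):

```python
# Minimal AutoTP + ZeRO-2 training sketch (assumptions: the JSON above is saved as
# ds_autotp.json; deepspeed>=0.16.4 and accelerate>=1.6.0 are installed).
# Launch with the DeepSpeed launcher, e.g.:
#   deepspeed --num_gpus 8 train_autotp.py
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "mistralai/Mistral-7B-v0.1"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

raw = load_dataset("tatsu-lab/alpaca", split="train[:1000]")  # placeholder dataset
train_dataset = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=raw.column_names,
)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    bf16=True,
    deepspeed="ds_autotp.json",  # the AutoTP + ZeRO-2 config above
)

Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```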
❓ Alternatives
No response
📝 Additional Context
@winglian asked me to open an issue
Acknowledgements
- [x] My issue title is concise, descriptive, and in title casing.
- [x] I have searched the existing issues to make sure this feature has not been requested yet.
- [x] I have provided enough information for the maintainers to understand and evaluate this request.
A new example is out for this:
https://github.com/deepspeedai/DeepSpeedExamples/blob/592d28fa45c12613f39ed388e043be760707237c/training/tensor_parallel/train.py
@casper-hansen when you tried it, did you run into:
```
[rank0]:   File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1772, in inner
[rank0]:     args_kwargs_result = hook(self, args, kwargs) # type: ignore[misc]
[rank0]:                          ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 490, in check_dataloader_inputs_same_across_ranks
[rank0]:     broadcast_and_check(kwargs, bcast_rank, bcast_group)
[rank0]:   File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 479, in broadcast_and_check
[rank0]:     assert torch.equal(
[rank0]:            ^^^^^^^^^^^^
[rank0]: AssertionError: Data inconsistency within the TP group. Please check the Dataloader implementation to ensure consistency.
```
@winglian yes, but I seem to remember that upgrading to the latest accelerate fixed it.
@casper-hansen this should be working now in latest main.