PR type
- [ ] Bug Fix
- [x] New Feature
- [ ] Document Updates
- [ ] More Model or Dataset Support
PR information
Previous PR: https://github.com/modelscope/swift/pull/647
- Integrate more model patch functions for torchacc.
- Support reporting speed metrics after a number of warmup steps (torchacc spends time on compilation at the start of training, which skews the overall throughput).
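
As a rough illustration of the warmup-aware speed metric described above, here is a minimal sketch. It is not the code in this PR: the class name `WarmupSpeedMeter`, its methods, and the metric keys are hypothetical and chosen only for this example. It assumes throughput is measured in samples per second and that the warmup window is a fixed number of steps.

```python
import time


class WarmupSpeedMeter:
    """Hypothetical helper: tracks throughput overall and after warmup steps."""

    def __init__(self, warmup_steps, clock=time.perf_counter):
        self.warmup_steps = warmup_steps
        self.clock = clock  # injectable so the meter can be tested without sleeping
        self.start_time = None
        self.warmup_end_time = None
        self.last_time = None
        self.steps = 0
        self.samples = 0
        self.samples_after_warmup = 0

    def start(self):
        """Call once right before the first training step."""
        self.start_time = self.clock()
        self.last_time = self.start_time
        if self.warmup_steps == 0:
            # No warmup requested: the post-warmup window starts immediately.
            self.warmup_end_time = self.start_time

    def step(self, batch_size):
        """Call once after each completed training step."""
        now = self.clock()
        self.steps += 1
        self.samples += batch_size
        if self.steps > self.warmup_steps:
            # Only steps that finish after the warmup window count here.
            self.samples_after_warmup += batch_size
        elif self.steps == self.warmup_steps:
            # The last warmup step just finished; start the second window.
            self.warmup_end_time = now
        self.last_time = now

    def metrics(self):
        """Return throughput over the whole run and, if available, after warmup."""
        result = {}
        elapsed = self.last_time - self.start_time
        if elapsed > 0:
            result["train_samples_per_second"] = self.samples / elapsed
        if self.warmup_end_time is not None:
            elapsed_after = self.last_time - self.warmup_end_time
            if elapsed_after > 0:
                result["train_samples_per_second_after_warmup"] = (
                    self.samples_after_warmup / elapsed_after
                )
        return result
```

In a training loop this would be used by calling `start()` before the first step and `step(batch_size)` after each step. Excluding the first steps keeps one-off compile time out of the steady-state number, which is why the two columns in the tables below can differ noticeably for torchacc while staying nearly identical for swift.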
Experiment results
We have tested several models with torchacc and swift:
- llama2-13b

| method | train_sample/s | train_sample/s after warmup |
| --- | --- | --- |
| torchacc + 2fsdp | 3.775 | 4.426(1.13x) |
| torchacc + 2ddp | 4.997(1.28x) | 5.416(1.38x) |
| swift + 2ddp | 3.899 | 3.912 |

- baichuan2-13b

| method | train_sample/s | train_sample/s after warmup |
| --- | --- | --- |
| torchacc + 2fsdp | 5.014(1.32x) | 6.039(1.60x) |
| torchacc + 2ddp | 6.218(1.63x) | 6.861(1.80x) |
| swift + 2ddp | 3.812 | 3.815 |

- chatglm3-6b

| method | train_sample/s | train_sample/s after warmup |
| --- | --- | --- |
| torchacc + 2fsdp | 9.859(1.82x) | 11.896(2.19x) |
| swift + 2ddp | 5.431 | - |

- yi-34b

| method | train_sample/s | train_sample/s after warmup |
| --- | --- | --- |
| torchacc + 4fsdp | 2.349 | 2.978(1.24x) |
| swift + 2ddp + 2mp | 2.411 | 2.411 |

- llama3-8b

| method | train_sample/s | train_sample/s after warmup |
| --- | --- | --- |
| torchacc + 2ddp | 9.216(1.13x) | 10.243(1.26x) |
| swift + 2ddp | 8.126 | - |

- qwen1.5-14b

| method | train_sample/s | train_sample/s after warmup |
| --- | --- | --- |
| torchacc + 2ddp | 5.076(1.03x) | 5.376(1.08x) |
| swift + 2ddp | 4.944 | - |