Wang, Yi
@sgugger please help review
@delock @yao-matrix
Hi @molly-smith, this PR reduces host memory per rank by supporting shard loading in the AutoTP path, the same way shard loading already works in the kernel injection path.
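The idea of shard loading, in a minimal sketch (this is a hypothetical illustration, not the PR's actual code): load checkpoint shards one at a time and keep only this rank's slice of each weight, so peak host memory stays near a single shard instead of the whole checkpoint. The `shard_load` helper, its toy list-of-rows "tensors", and the even-divisibility assumption are all inventions for illustration.

```python
# Hypothetical sketch of per-rank shard loading (not the PR's code).
# Each shard is loaded, sliced for this rank, then released, so only
# one full shard is ever resident on the host at a time.

def shard_load(shard_files, load_shard, rank, world_size):
    """load_shard(path) -> dict of name -> list of rows (toy 'tensor').

    Returns only this rank's contiguous slice of every weight,
    assuming each weight's row count divides evenly by world_size.
    """
    local_state = {}
    for path in shard_files:
        full = load_shard(path)          # only one shard in memory here
        for name, rows in full.items():
            per_rank = len(rows) // world_size
            start = rank * per_rank
            local_state[name] = rows[start:start + per_rank]
        del full                         # release the full shard promptly
    return local_state


if __name__ == "__main__":
    # Toy checkpoint split into two shards; rank 1 of 2 keeps the
    # second half of each weight.
    shards = {
        "shard-0": {"w1": [[1], [2], [3], [4]]},
        "shard-1": {"w2": [[5], [6]]},
    }
    out = shard_load(["shard-0", "shard-1"], lambda p: shards[p],
                     rank=1, world_size=2)
    print(out)  # rank 1 holds the second half of w1 and w2
```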
The pipeline running issue is addressed by https://github.com/huggingface/transformers/pull/29722.
Perf in my env:

| perf | A100 | Gaudi2 |
|------|------|--------|
| BF16 | 528ms | 250ms |
| FP32 | 410ms | 275ms |

```
python3 run_pipeline.py \
  --model_name_or_path facebook/hf-seamless-m4t-medium \
  --text "Hello, my dog...
```
@libinta please help review.
Perf in my env:

| perf | A100 | Gaudi2 |
|------|------|--------|
| BF16 | 87ms | 11ms |
| FP32 | 102ms | 13ms |

```
python3 run_pipeline.py \
  --model_name_or_path facebook/mms-tts-eng \
  --text "Hello,...
```
CI tests will be added after https://github.com/huggingface/optimum-habana/pull/834 is merged, since they will go into the same file and follow a similar style.
@srinarayan-srikanthan the issue you mentioned is fixed, please give it a try.
Finetune performance:

| perf | 4xA100 | 4xGaudi2 | 8xGaudi2 |
|------|--------|----------|----------|
| time to train | 1hr23min | 35min | 17min |
| accuracy | 92.74 | 92.22 | 92.57 |