Wang, Yi

Results 69 comments of Wang, Yi

Hi @molly-smith, this PR is meant to reduce host memory usage per rank by supporting shard loading in the AutoTP path, the same as shard loading in the kernel injection path.

The pipeline running issue is addressed by https://github.com/huggingface/transformers/pull/29722

| perf | A100 | Gaudi2 |
|------|------|--------|
| BF16 | 528 ms | 250 ms |
| FP32 | 410 ms | 275 ms |

In my env: python3 run_pipeline.py \ --model_name_or_path facebook/hf-seamless-m4t-medium \ --text "Hello, my dog...

| perf | A100 | Gaudi2 |
|------|------|--------|
| BF16 | 87 ms | 11 ms |
| FP32 | 102 ms | 13 ms |

In my env: python3 run_pipeline.py \ --model_name_or_path facebook/mms-tts-eng \ --text "Hello,...

CI tests will be added after https://github.com/huggingface/optimum-habana/pull/834 is merged, since they will go in the same file and follow a similar style.

@srinarayan-srikanthan I fixed the issue you mentioned; please give it a try.

Fine-tuning performance:

| perf | 4xA100 | 4xGaudi2 | 8xGaudi2 |
|------|--------|----------|----------|
| time to train | 1hr23min | 35min | 17min |
| accuracy | 92.74 | 92.22 | 92.57 |