Wang, Yi
@sgugger please help review
@delock @yao-matrix
Hi @molly-smith, this PR reduces host memory per rank by supporting shard loading in the AutoTP path, the same way shard loading already works in the kernel injection path.
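The idea of shard loading, in a minimal sketch (this is a hypothetical illustration, not the PR's actual code): load checkpoint shards one at a time and keep only this rank's slice of each weight, so peak host memory stays near a single shard instead of the whole checkpoint. The `shard_load` helper, its toy list-of-rows "tensors", and the even-divisibility assumption are all inventions for illustration.

```python
# Hypothetical sketch of per-rank shard loading (not the PR's code).
# Each shard is loaded, sliced for this rank, then released, so only
# one full shard is ever resident on the host at a time.

def shard_load(shard_files, load_shard, rank, world_size):
    """load_shard(path) -> dict of name -> list of rows (toy 'tensor').

    Returns only this rank's contiguous slice of every weight,
    assuming each weight's row count divides evenly by world_size.
    """
    local_state = {}
    for path in shard_files:
        full = load_shard(path)          # only one shard in memory here
        for name, rows in full.items():
            per_rank = len(rows) // world_size
            start = rank * per_rank
            local_state[name] = rows[start:start + per_rank]
        del full                         # release the full shard promptly
    return local_state


if __name__ == "__main__":
    # Toy checkpoint split into two shards; rank 1 of 2 keeps the
    # second half of each weight.
    shards = {
        "shard-0": {"w1": [[1], [2], [3], [4]]},
        "shard-1": {"w2": [[5], [6]]},
    }
    out = shard_load(["shard-0", "shard-1"], lambda p: shards[p],
                     rank=1, world_size=2)
    print(out)  # rank 1 holds the second half of w1 and w2
```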
The pipeline running issue is addressed by https://github.com/huggingface/transformers/pull/29722.
Perf in my env:

| perf | A100 | Gaudi2 |
|------|------|--------|
| BF16 | 528ms | 250ms |
| FP32 | 410ms | 275ms |

```
python3 run_pipeline.py \
  --model_name_or_path facebook/hf-seamless-m4t-medium \
  --text "Hello, my dog...
```
@libinta please help review.
Perf in my env:

| perf | A100 | Gaudi2 |
|------|------|--------|
| BF16 | 87ms | 11ms |
| FP32 | 102ms | 13ms |

```
python3 run_pipeline.py \
  --model_name_or_path facebook/mms-tts-eng \
  --text "Hello,...
```
CI tests will be added after https://github.com/huggingface/optimum-habana/pull/834 is merged, since they will go into the same file and follow a similar style.
@srinarayan-srikanthan the issue you mentioned is fixed, please give it a try.
Finetune performance:

| perf | 4xA100 | 4xGaudi2 | 8xGaudi2 |
|------|--------|----------|----------|
| time to train | 1hr23min | 35min | 17min |
| accuracy | 92.74 | 92.22 | 92.57 |