optimum-habana
Add the MC example
What does this PR do?
Adds the multi-card distributed inference example.
Fixes # (issue)
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you make sure to update the documentation with your changes?
- [ ] Did you write any new necessary tests?
OK, I will modify it.
@libinta Done. The example only runs SD text-to-image inference with multiple cards, so I didn't add CI tests or performance data. If needed, let me know. Thanks.
The test results are OK.
@yuanwu2017 What I mean is: can you change https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion so that text_to_image_generation.py supports multi-card?
Let me give it a try.
@libinta Done.
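For context on how the multi-card path behaves in the runs below: each spawned process handles its own share of the prompts (the logs show every card reporting "1 prompt(s) received" in the two-prompt runs). A minimal sketch of that sharding idea, assuming a gaudi_spawn.py-style launcher that exports RANK/WORLD_SIZE (this is not the exact code in this PR):

```python
# Hedged sketch: split the --prompts list across processes so each card
# generates its own subset. The RANK/WORLD_SIZE env vars are an assumption here.
import os

def shard_prompts(prompts: list[str]) -> list[str]:
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    # Round-robin split: rank 0 gets prompts 0, world_size, 2*world_size, ...
    return prompts[rank::world_size]

prompts = ["An image of a squirrel in Picasso style", "A shiny flying horse taking off"]
local_prompts = shard_prompts(prompts)
# Each process then builds its pipeline on its own HPU and generates images
# for local_prompts only, so a 2-card run processes 2 prompts in parallel.
```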
Multi-card inference test results:
- text-to-image generation
- one prompt on one card: command:
python text_to_image_generation.py \
--model_name_or_path runwayml/stable-diffusion-v1-5 \
--prompts "An image of a squirrel in Picasso style" \
--num_images_per_prompt 20 \
--batch_size 4 \
--image_save_dir /tmp/stable_diffusion_images \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16
Performance:
[INFO|pipeline_stable_diffusion.py:410] 2024-05-09 01:51:01,523 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
5/5 [01:44<00:00, 20.97s/it]
[INFO|pipeline_stable_diffusion.py:591] 2024-05-09 01:52:46,425 >> Speed metrics: {'generation_runtime': 104.8739, 'generation_samples_per_second': 0.953, 'generation_steps_per_second': 0.595}
- two prompts on two cards: command:
python ../gaudi_spawn.py \
--world_size 2 text_to_image_generation.py \
--model_name_or_path runwayml/stable-diffusion-v1-5 \
--prompts "An image of a squirrel in Picasso style" "A shiny flying horse taking off" \
--num_images_per_prompt 20 \
--batch_size 4 \
--image_save_dir /tmp/stable_diffusion_images \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16 \
--distributed
Performance:
[INFO|pipeline_stable_diffusion.py:410] 2024-05-09 01:56:24,400 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
[INFO|pipeline_stable_diffusion.py:410] 2024-05-09 01:56:25,149 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
5/5 [01:42<00:00, 20.58s/it]
[INFO|pipeline_stable_diffusion.py:591] 2024-05-09 01:58:07,324 >> Speed metrics: {'generation_runtime': 102.8982, 'generation_samples_per_second': 0.951, 'generation_steps_per_second': 0.594}
5/5 [01:42<00:00, 20.49s/it]
[INFO|pipeline_stable_diffusion.py:591] 2024-05-09 01:58:07,628 >> Speed metrics: {'generation_runtime': 102.4432, 'generation_samples_per_second': 0.956, 'generation_steps_per_second': 0.598}
There is no performance regression with multiple cards: per-card throughput matches the single-card run.
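As a quick sanity check on the numbers above, the per-card throughput of the 2-card run stays at the single-card level, so aggregate throughput roughly doubles. A small script reproducing the arithmetic (values copied from the logs):

```python
# generation_samples_per_second values copied from the logs above.
single_card = 0.953            # 1 card, 1 prompt
two_card = [0.951, 0.956]      # 2 cards, 1 prompt each

print(f"worst per-card ratio: {min(two_card) / single_card:.3f}x")  # ~0.998x
print(f"aggregate throughput: {sum(two_card):.3f} samples/s")       # ~1.907
print(f"aggregate speedup:    {sum(two_card) / single_card:.3f}x")  # ~2.001x
```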
@regisss @libinta Please help review. There is no example test for the diffusers models. Should I add tests for the diffusers example, or rely on the unit tests in test_diffusers.py without adding additional example tests?
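If an example-level test is wanted, one option is a thin subprocess smoke test that launches the command above with a small image count and checks that images are produced. A hypothetical sketch (the paths, flag values, and test layout are assumptions, not existing code in this repo):

```python
import subprocess
import sys
from pathlib import Path

# Assumed layout: this test file lives one level below the repo root.
EXAMPLES_DIR = Path(__file__).parent.parent / "examples" / "stable-diffusion"

def test_text_to_image_multi_card(tmp_path):
    # Hypothetical smoke test: run the distributed example end to end on 2 cards.
    cmd = [
        sys.executable, str(EXAMPLES_DIR.parent / "gaudi_spawn.py"),
        "--world_size", "2", str(EXAMPLES_DIR / "text_to_image_generation.py"),
        "--model_name_or_path", "runwayml/stable-diffusion-v1-5",
        "--prompts", "An image of a squirrel in Picasso style", "A shiny flying horse taking off",
        "--num_images_per_prompt", "2",
        "--batch_size", "1",
        "--image_save_dir", str(tmp_path),
        "--use_habana",
        "--use_hpu_graphs",
        "--gaudi_config", "Habana/stable-diffusion",
        "--bf16",
        "--distributed",
    ]
    subprocess.run(cmd, check=True)
    assert any(tmp_path.rglob("*.png")), "no images were generated"
```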
- stable_diffusion_ldm3d
- one prompt on one card: command:
python text_to_image_generation.py \
--model_name_or_path "Intel/ldm3d-4c" \
--prompts "An image of a squirrel in Picasso style" \
--num_images_per_prompt 10 \
--batch_size 2 \
--height 768 \
--width 768 \
--image_save_dir /tmp/stable_diffusion_images \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion-2 \
--ldm3d
Performance:
[INFO|pipeline_stable_diffusion_ldm3d.py:268] 2024-05-09 03:00:32,792 >> 1 prompt(s) received, 10 generation(s) per prompt, 2 sample(s) per batch, 5 total batch(es).
5/5 [03:06<00:00, 37.38s/it]
[INFO|pipeline_stable_diffusion_ldm3d.py:411] 2024-05-09 03:03:39,735 >> Speed metrics: {'generation_runtime': 186.9058, 'generation_samples_per_second': 0.28, 'generation_steps_per_second': 0.35}
- two prompts on two cards: command:
python ../gaudi_spawn.py \
--world_size 2 text_to_image_generation.py \
--model_name_or_path "Intel/ldm3d-4c" \
--prompts "An image of a squirrel in Picasso style" "A shiny flying horse taking off" \
--num_images_per_prompt 10 \
--batch_size 2 \
--height 768 \
--width 768 \
--image_save_dir /tmp/stable_diffusion_images \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion-2 \
--ldm3d \
--distributed
Performance:
[INFO|pipeline_stable_diffusion_ldm3d.py:268] 2024-05-09 03:09:12,892 >> 1 prompt(s) received, 10 generation(s) per prompt, 2 sample(s) per batch, 5 total batch(es).
[INFO|pipeline_stable_diffusion_ldm3d.py:268] 2024-05-09 03:09:13,774 >> 1 prompt(s) received, 10 generation(s) per prompt, 2 sample(s) per batch, 5 total batch(es).
5/5 [03:08<00:00, 37.64s/it]
[INFO|pipeline_stable_diffusion_ldm3d.py:411] 2024-05-09 03:12:21,116 >> Speed metrics: {'generation_runtime': 188.1996, 'generation_samples_per_second': 0.28, 'generation_steps_per_second': 0.35}
5/5 [03:09<00:00, 37.82s/it]
[INFO|pipeline_stable_diffusion_ldm3d.py:411] 2024-05-09 03:12:22,874 >> Speed metrics: {'generation_runtime': 189.0768, 'generation_samples_per_second': 0.281, 'generation_steps_per_second': 0.351}
There is no performance regression with multiple cards: per-card throughput matches the single-card run.
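One detail worth noting for the distributed runs above: both processes receive the same --image_save_dir, so the script has to keep their outputs from colliding (for example, per-rank sub-directories or rank-prefixed file names). A hypothetical helper illustrating the per-rank sub-directory approach (the RANK variable and helper name are assumptions, not this PR's implementation):

```python
import os
from pathlib import Path

def rank_aware_save_dir(base_dir: str) -> Path:
    # Hypothetical helper: give each spawned process its own output folder,
    # e.g. /tmp/stable_diffusion_images/rank_0 and /tmp/stable_diffusion_images/rank_1.
    rank = int(os.environ.get("RANK", "0"))
    save_dir = Path(base_dir) / f"rank_{rank}"
    save_dir.mkdir(parents=True, exist_ok=True)
    return save_dir
```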
- Stable Diffusion XL
- one prompt on one card: command:
python text_to_image_generation.py \
--model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
--prompts "Sailing ship painting by Van Gogh" \
--prompts_2 "Red tone" \
--negative_prompts "Low quality" \
--negative_prompts_2 "Clouds" \
--num_images_per_prompt 20 \
--batch_size 4 \
--image_save_dir /tmp/stable_diffusion_xl_images \
--scheduler euler_discrete \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16
Performance:
[INFO|pipeline_stable_diffusion_xl.py:537] 2024-05-09 03:45:54,416 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
5/5 [04:32<00:00, 54.45s/it]
[INFO|pipeline_stable_diffusion_xl.py:810] 2024-05-09 03:50:26,751 >> Speed metrics: {'generation_runtime': 272.2497, 'generation_samples_per_second': 0.196, 'generation_steps_per_second': 0.123}
- two prompts on two cards: command:
python ../gaudi_spawn.py \
--world_size 2 text_to_image_generation.py \
--model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
--prompts "Sailing ship painting by Van Gogh" "A shiny flying horse taking off" \
--prompts_2 "Red tone" "Blue tone" \
--negative_prompts "Low quality" "Sketch" \
--negative_prompts_2 "Clouds" "Clouds" \
--num_images_per_prompt 20 \
--batch_size 4 \
--image_save_dir /tmp/stable_diffusion_xl_images \
--scheduler euler_discrete \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16 \
--distributed
Performance:
[INFO|pipeline_stable_diffusion_xl.py:537] 2024-05-09 04:21:46,940 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
5/5 [04:30<00:00, 54.19s/it]
[INFO|pipeline_stable_diffusion_xl.py:810] 2024-05-09 04:26:11,451 >> Speed metrics: {'generation_runtime': 270.9386, 'generation_samples_per_second': 0.196, 'generation_steps_per_second': 0.123}
5/5 [04:32<00:00, 54.53s/it]
[INFO|pipeline_stable_diffusion_xl.py:810] 2024-05-09 04:26:19,669 >> Speed metrics: {'generation_runtime': 272.639, 'generation_samples_per_second': 0.196, 'generation_steps_per_second': 0.123}
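For readers unfamiliar with the SDXL-specific flags above: --prompts_2 and --negative_prompts_2 map onto the prompt_2 / negative_prompt_2 arguments that SDXL pipelines expose for their second text encoder. A plain-diffusers illustration of that mapping (not the Gaudi pipeline from this repo, just the standard API):

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.bfloat16
)

# prompt/negative_prompt feed the first text encoder,
# prompt_2/negative_prompt_2 feed the second one.
images = pipe(
    prompt="Sailing ship painting by Van Gogh",
    prompt_2="Red tone",
    negative_prompt="Low quality",
    negative_prompt_2="Clouds",
    num_images_per_prompt=4,
).images
```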
- ControlNet
- two prompts on one card: command:
python text_to_image_generation.py \
--model_name_or_path runwayml/stable-diffusion-v1-5 \
--controlnet_model_name_or_path lllyasviel/sd-controlnet-canny \
--prompts "futuristic-looking woman" "a rusty robot" \
--control_image https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png \
--num_images_per_prompt 10 \
--batch_size 4 \
--image_save_dir /tmp/controlnet_images \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16
Performance:
5/5 [02:19<00:00, 27.87s/it]
[INFO|pipeline_controlnet.py:625] 2024-05-09 06:10:51,730 >> Speed metrics: {'generation_runtime': 139.3345, 'generation_samples_per_second': 0.683, 'generation_steps_per_second': 0.427}
- two prompts on two cards: command:
python ../gaudi_spawn.py \
--world_size 2 text_to_image_generation.py \
--model_name_or_path runwayml/stable-diffusion-v1-5 \
--controlnet_model_name_or_path lllyasviel/sd-controlnet-canny \
--prompts "futuristic-looking woman" "a rusty robot" \
--control_image https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png \
--num_images_per_prompt 10 \
--batch_size 4 \
--image_save_dir /tmp/controlnet_images \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16 \
--distributed
Performance:
5/5 [02:14<00:00, 26.98s/it]
[INFO|pipeline_controlnet.py:625] 2024-05-09 06:26:41,088 >> Speed metrics: {'generation_runtime': 134.8915, 'generation_samples_per_second': 0.674, 'generation_steps_per_second': 0.421}
5/5 [02:17<00:00, 27.53s/it]
[INFO|pipeline_controlnet.py:625] 2024-05-09 06:26:45,986 >> Speed metrics: {'generation_runtime': 137.6633, 'generation_samples_per_second': 0.675, 'generation_steps_per_second': 0.422}
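For reference, the --control_image URL above is the standard Vermeer example from the diffusers documentation; conditioning images for a canny ControlNet are usually pre-processed into an edge map along these lines (a generic diffusers/OpenCV sketch, not code from this PR):

```python
import cv2
import numpy as np
from PIL import Image
from diffusers.utils import load_image

# Download the conditioning image used in the commands above.
image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)

# Extract canny edges and stack them to 3 channels, as expected by sd-controlnet-canny.
edges = cv2.Canny(np.array(image), 100, 200)
edges = np.stack([edges] * 3, axis=-1)
control_image = Image.fromarray(edges)
control_image.save("vermeer_canny.png")
```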
@libinta @regisss Please help review and merge the patch.
@libinta and @regisss, could you help with the final review and merge?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.