
Add the MC example

Open yuanwu2017 opened this issue 10 months ago • 11 comments

What does this PR do?

Add the multi-card distributed inference example

Fixes # (issue)

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [ ] Did you make sure to update the documentation with your changes?
  • [ ] Did you write any new necessary tests?

yuanwu2017 avatar Apr 15 '24 16:04 yuanwu2017

OK. I will modify it.

yuanwu2017 avatar Apr 26 '24 01:04 yuanwu2017

@libinta Done. The example only runs SD text-to-image inference on multiple cards, so I didn't add CI or performance data. Let me know if they are needed. Thanks.

yuanwu2017 avatar Apr 26 '24 05:04 yuanwu2017

Test results are OK: result_0_0 result_1_0

yuanwu2017 avatar Apr 26 '24 05:04 yuanwu2017

@yuanwu2017 what I mean is: can you change text_to_image_generation.py in https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion to support multi-card?

libinta avatar Apr 30 '24 00:04 libinta

Let me have a try.

yuanwu2017 avatar Apr 30 '24 02:04 yuanwu2017

@libinta Done.

yuanwu2017 avatar May 07 '24 03:05 yuanwu2017
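
As context for the change, a minimal sketch of how prompts might be partitioned per card when gaudi_spawn.py launches one process per card with --world_size N. The shard_prompts helper and the use of the WORLD_SIZE and RANK environment variables are illustrative assumptions, not the actual PR diff:

# Illustrative sketch only: each launched process keeps its own slice of the
# prompt list, so the cards generate images for different prompts in parallel.
import os

def shard_prompts(prompts):
    """Return the round-robin slice of prompts assigned to this rank."""
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    rank = int(os.environ.get("RANK", "0"))
    if world_size <= 1:
        return prompts
    return prompts[rank::world_size]

# Example: with --world_size 2 and two prompts, rank 0 handles the first
# prompt and rank 1 handles the second, so each card works independently.
prompts = ["An image of a squirrel in Picasso style", "A shiny flying horse taking off"]
local_prompts = shard_prompts(prompts)
print(local_prompts)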

Multi-card inference test results:

- text-to-image generation

  1. one prompt on one card: command:
python text_to_image_generation.py \
    --model_name_or_path runwayml/stable-diffusion-v1-5 \
    --prompts "An image of a squirrel in Picasso style" \
    --num_images_per_prompt 20 \
    --batch_size 4 \
    --image_save_dir /tmp/stable_diffusion_images \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16

Performance:

[INFO|pipeline_stable_diffusion.py:410] 2024-05-09 01:51:01,523 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
100%|██████████| 5/5 [01:44<00:00, 20.97s/it]
[INFO|pipeline_stable_diffusion.py:591] 2024-05-09 01:52:46,425 >> Speed metrics: {'generation_runtime': 104.8739, 'generation_samples_per_second': 0.953, 'generation_steps_per_second': 0.595}

  2. two prompts on two cards: command:
python ../gaudi_spawn.py \
    --world_size 2 text_to_image_generation.py \
    --model_name_or_path runwayml/stable-diffusion-v1-5 \
    --prompts "An image of a squirrel in Picasso style" "A shiny flying horse taking off" \
    --num_images_per_prompt 20 \
    --batch_size 4 \
    --image_save_dir /tmp/stable_diffusion_images \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16 \
    --distributed

Performance:

[INFO|pipeline_stable_diffusion.py:410] 2024-05-09 01:56:24,400 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
[INFO|pipeline_stable_diffusion.py:410] 2024-05-09 01:56:25,149 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
100%|██████████| 5/5 [01:42<00:00, 20.58s/it]
[INFO|pipeline_stable_diffusion.py:591] 2024-05-09 01:58:07,324 >> Speed metrics: {'generation_runtime': 102.8982, 'generation_samples_per_second': 0.951, 'generation_steps_per_second': 0.594}
100%|██████████| 5/5 [01:42<00:00, 20.49s/it]
[INFO|pipeline_stable_diffusion.py:591] 2024-05-09 01:58:07,628 >> Speed metrics: {'generation_runtime': 102.4432, 'generation_samples_per_second': 0.956, 'generation_steps_per_second': 0.598}

There is no performance regression with multiple cards: per-card throughput is essentially unchanged, so aggregate throughput roughly doubles with two cards.

yuanwu2017 avatar May 09 '24 02:05 yuanwu2017
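
A quick sanity check on the numbers above (values copied from the speed-metrics logs): per-card throughput stays the same, so total throughput roughly doubles with two cards.

# Values taken from the generation_samples_per_second fields logged above.
single_card = 0.953                 # one prompt on one card
two_cards = [0.951, 0.956]          # per-card throughput with two cards
aggregate = sum(two_cards)          # ~1.91 samples/s in total across both cards
print(f"1 card: {single_card} samples/s, 2 cards total: {aggregate:.3f} samples/s")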

@regisss @libinta Please help review. There is currently no example test for the diffusers models. Should I add tests for the diffusers example, or rely on the unit tests in test_diffusers.py without adding extra example tests?

yuanwu2017 avatar May 09 '24 02:05 yuanwu2017
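
If an example-level test is wanted, a hypothetical sketch of what it could look like, driving the script via subprocess with the flags used in this thread (the test name, prompt counts, and the subprocess approach are assumptions, not existing test code):

# Hypothetical example-level test: launch the multi-card command and check
# that it completes successfully. Reduced image counts keep the run short.
import subprocess

def test_text_to_image_multi_card():
    cmd = [
        "python", "../gaudi_spawn.py", "--world_size", "2",
        "text_to_image_generation.py",
        "--model_name_or_path", "runwayml/stable-diffusion-v1-5",
        "--prompts", "An image of a squirrel in Picasso style",
        "A shiny flying horse taking off",
        "--num_images_per_prompt", "4",
        "--batch_size", "2",
        "--use_habana", "--use_hpu_graphs",
        "--gaudi_config", "Habana/stable-diffusion",
        "--bf16", "--distributed",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    assert result.returncode == 0, result.stderr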

- stable_diffusion_ldm3d

  1. one prompt on one card: command:
python text_to_image_generation.py \
    --model_name_or_path "Intel/ldm3d-4c" \
    --prompts "An image of a squirrel in Picasso style" \
    --num_images_per_prompt 10 \
    --batch_size 2 \
    --height 768 \
    --width 768 \
    --image_save_dir /tmp/stable_diffusion_images \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion-2 \
    --ldm3d

Performance:

[INFO|pipeline_stable_diffusion_ldm3d.py:268] 2024-05-09 03:00:32,792 >> 1 prompt(s) received, 10 generation(s) per prompt, 2 sample(s) per batch, 5 total batch(es).
100%|██████████| 5/5 [03:06<00:00, 37.38s/it]
[INFO|pipeline_stable_diffusion_ldm3d.py:411] 2024-05-09 03:03:39,735 >> Speed metrics: {'generation_runtime': 186.9058, 'generation_samples_per_second': 0.28, 'generation_steps_per_second': 0.35}

  2. two prompts on two cards: command:
python ../gaudi_spawn.py \
    --world_size 2 text_to_image_generation.py \
    --model_name_or_path "Intel/ldm3d-4c" \
    --prompts "An image of a squirrel in Picasso style" "A shiny flying horse taking off" \
    --num_images_per_prompt 10 \
    --batch_size 2 \
    --height 768 \
    --width 768 \
    --image_save_dir /tmp/stable_diffusion_images \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion-2 \
    --ldm3d \
    --distributed

Performance:

[INFO|pipeline_stable_diffusion_ldm3d.py:268] 2024-05-09 03:09:12,892 >> 1 prompt(s) received, 10 generation(s) per prompt, 2 sample(s) per batch, 5 total batch(es).
[INFO|pipeline_stable_diffusion_ldm3d.py:268] 2024-05-09 03:09:13,774 >> 1 prompt(s) received, 10 generation(s) per prompt, 2 sample(s) per batch, 5 total batch(es).
100%|██████████| 5/5 [03:08<00:00, 37.64s/it]
[INFO|pipeline_stable_diffusion_ldm3d.py:411] 2024-05-09 03:12:21,116 >> Speed metrics: {'generation_runtime': 188.1996, 'generation_samples_per_second': 0.28, 'generation_steps_per_second': 0.35}
100%|██████████| 5/5 [03:09<00:00, 37.82s/it]
[INFO|pipeline_stable_diffusion_ldm3d.py:411] 2024-05-09 03:12:22,874 >> Speed metrics: {'generation_runtime': 189.0768, 'generation_samples_per_second': 0.281, 'generation_steps_per_second': 0.351}

There is no performance regression with multiple cards (per-card throughput stays at about 0.28 samples/s in both cases).

yuanwu2017 avatar May 09 '24 03:05 yuanwu2017

- Stable Diffusion XL

  1. one prompt on one card: command:
python text_to_image_generation.py \
    --model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
    --prompts "Sailing ship painting by Van Gogh" \
    --prompts_2 "Red tone" \
    --negative_prompts "Low quality" \
    --negative_prompts_2 "Clouds" \
    --num_images_per_prompt 20 \
    --batch_size 4 \
    --image_save_dir /tmp/stable_diffusion_xl_images \
    --scheduler euler_discrete \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16

Performance:

[INFO|pipeline_stable_diffusion_xl.py:537] 2024-05-09 03:45:54,416 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
100%|██████████| 5/5 [04:32<00:00, 54.45s/it]
[INFO|pipeline_stable_diffusion_xl.py:810] 2024-05-09 03:50:26,751 >> Speed metrics: {'generation_runtime': 272.2497, 'generation_samples_per_second': 0.196, 'generation_steps_per_second': 0.123}

  2. two prompts on two cards: command:

python ../gaudi_spawn.py \
    --world_size 2 text_to_image_generation.py \
    --model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
    --prompts "Sailing ship painting by Van Gogh" "A shiny flying horse taking off" \
    --prompts_2 "Red tone" "Blue tone" \
    --negative_prompts "Low quality" "Sketch" \
    --negative_prompts_2 "Clouds" "Clouds" \
    --num_images_per_prompt 20 \
    --batch_size 4 \
    --image_save_dir /tmp/stable_diffusion_xl_images \
    --scheduler euler_discrete \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16 \
    --distributed

Performance:

[INFO|pipeline_stable_diffusion_xl.py:537] 2024-05-09 04:21:46,940 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
100%|██████████| 5/5 [04:30<00:00, 54.19s/it]
[INFO|pipeline_stable_diffusion_xl.py:810] 2024-05-09 04:26:11,451 >> Speed metrics: {'generation_runtime': 270.9386, 'generation_samples_per_second': 0.196, 'generation_steps_per_second': 0.123}
100%|██████████| 5/5 [04:32<00:00, 54.53s/it]
[INFO|pipeline_stable_diffusion_xl.py:810] 2024-05-09 04:26:19,669 >> Speed metrics: {'generation_runtime': 272.639, 'generation_samples_per_second': 0.196, 'generation_steps_per_second': 0.123}

yuanwu2017 avatar May 09 '24 06:05 yuanwu2017

- ControlNet

  1. one prompt on one card: command:
python text_to_image_generation.py \
    --model_name_or_path runwayml/stable-diffusion-v1-5 \
    --controlnet_model_name_or_path lllyasviel/sd-controlnet-canny \
    --prompts "futuristic-looking woman" "a rusty robot" \
    --control_image https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png \
    --num_images_per_prompt 10 \
    --batch_size 4 \
    --image_save_dir /tmp/controlnet_images \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16

Performance:

100%|██████████| 5/5 [02:19<00:00, 27.87s/it]
[INFO|pipeline_controlnet.py:625] 2024-05-09 06:10:51,730 >> Speed metrics: {'generation_runtime': 139.3345, 'generation_samples_per_second': 0.683, 'generation_steps_per_second': 0.427}

  2. two prompts on two cards: command:

python ../gaudi_spawn.py \
    --world_size 2 text_to_image_generation.py \
    --model_name_or_path runwayml/stable-diffusion-v1-5 \
    --controlnet_model_name_or_path lllyasviel/sd-controlnet-canny \
    --prompts "futuristic-looking woman" "a rusty robot" \
    --control_image https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png \
    --num_images_per_prompt 10 \
    --batch_size 4 \
    --image_save_dir /tmp/controlnet_images \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16 \
    --distributed

Performance:

100%|██████████| 5/5 [02:14<00:00, 26.98s/it]
[INFO|pipeline_controlnet.py:625] 2024-05-09 06:26:41,088 >> Speed metrics: {'generation_runtime': 134.8915, 'generation_samples_per_second': 0.674, 'generation_steps_per_second': 0.421}
100%|██████████| 5/5 [02:17<00:00, 27.53s/it]
[INFO|pipeline_controlnet.py:625] 2024-05-09 06:26:45,986 >> Speed metrics: {'generation_runtime': 137.6633, 'generation_samples_per_second': 0.675, 'generation_steps_per_second': 0.422}

yuanwu2017 avatar May 09 '24 06:05 yuanwu2017

@libinta @regisss Please help review and merge the patch.

yuanwu2017 avatar Jun 25 '24 16:06 yuanwu2017

@libinta and @regisss, could you help with the final review and merge?

yao-matrix avatar Jul 04 '24 03:07 yao-matrix

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.