optimum-habana
Add the MC example
What does this PR do?
Adds the multi-card distributed inference example.
Fixes # (issue)
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you make sure to update the documentation with your changes?
- [ ] Did you write any new necessary tests?
OK, I will modify it.
@libinta Done. The example only runs SD text-to-image inference with multiple cards, so I didn't add CI tests or performance data. If needed, let me know. Thanks.
The test results are OK.
@yuanwu2017 What I mean is: can you change https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion so that text_to_image_generation.py supports multi-card?
Let me give it a try.
@libinta Done.
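For context on how the multi-card path behaves in the runs below: each spawned process handles its own share of the prompts (the logs show every card reporting "1 prompt(s) received" in the two-prompt runs). A minimal sketch of that sharding idea, assuming a gaudi_spawn.py-style launcher that exports RANK/WORLD_SIZE (this is not the exact code in this PR):

```python
# Hedged sketch: split the --prompts list across processes so each card
# generates its own subset. The RANK/WORLD_SIZE env vars are an assumption here.
import os

def shard_prompts(prompts: list[str]) -> list[str]:
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    # Round-robin split: rank 0 gets prompts 0, world_size, 2*world_size, ...
    return prompts[rank::world_size]

prompts = ["An image of a squirrel in Picasso style", "A shiny flying horse taking off"]
local_prompts = shard_prompts(prompts)
# Each process then builds its pipeline on its own HPU and generates images
# for local_prompts only, so a 2-card run processes 2 prompts in parallel.
```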
Multi-card inference test results:
- text-to-image generation
- one prompt on one card: command:
python text_to_image_generation.py \
--model_name_or_path runwayml/stable-diffusion-v1-5 \
--prompts "An image of a squirrel in Picasso style" \
--num_images_per_prompt 20 \
--batch_size 4 \
--image_save_dir /tmp/stable_diffusion_images \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16
Performance:
[INFO|pipeline_stable_diffusion.py:410] 2024-05-09 01:51:01,523 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
5/5 [01:44<00:00, 20.97s/it]
[INFO|pipeline_stable_diffusion.py:591] 2024-05-09 01:52:46,425 >> Speed metrics: {'generation_runtime': 104.8739, 'generation_samples_per_second': 0.953, 'generation_steps_per_second': 0.595}
- two prompts on two cards: command:
python ../gaudi_spawn.py \
--world_size 2 text_to_image_generation.py \
--model_name_or_path runwayml/stable-diffusion-v1-5 \
--prompts "An image of a squirrel in Picasso style" "A shiny flying horse taking off" \
--num_images_per_prompt 20 \
--batch_size 4 \
--image_save_dir /tmp/stable_diffusion_images \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16 \
--distributed
Performance:
[INFO|pipeline_stable_diffusion.py:410] 2024-05-09 01:56:24,400 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
[INFO|pipeline_stable_diffusion.py:410] 2024-05-09 01:56:25,149 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
5/5 [01:42<00:00, 20.58s/it]
[INFO|pipeline_stable_diffusion.py:591] 2024-05-09 01:58:07,324 >> Speed metrics: {'generation_runtime': 102.8982, 'generation_samples_per_second': 0.951, 'generation_steps_per_second': 0.594}
5/5 [01:42<00:00, 20.49s/it]
[INFO|pipeline_stable_diffusion.py:591] 2024-05-09 01:58:07,628 >> Speed metrics: {'generation_runtime': 102.4432, 'generation_samples_per_second': 0.956, 'generation_steps_per_second': 0.598}
There is no performance regression with multiple cards: per-card throughput matches the single-card run.
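As a quick sanity check on the numbers above, the per-card throughput of the 2-card run stays at the single-card level, so aggregate throughput roughly doubles. A small script reproducing the arithmetic (values copied from the logs):

```python
# generation_samples_per_second values copied from the logs above.
single_card = 0.953            # 1 card, 1 prompt
two_card = [0.951, 0.956]      # 2 cards, 1 prompt each

print(f"worst per-card ratio: {min(two_card) / single_card:.3f}x")  # ~0.998x
print(f"aggregate throughput: {sum(two_card):.3f} samples/s")       # ~1.907
print(f"aggregate speedup:    {sum(two_card) / single_card:.3f}x")  # ~2.001x
```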
@regisss @libinta Please help review. There is no example test for the diffusers models. Should I add tests for the diffusers example, or rely on the unit tests in test_diffusers.py without adding additional example tests?
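If an example-level test is wanted, one option is a thin subprocess smoke test that launches the command above with a small image count and checks that images are produced. A hypothetical sketch (the paths, flag values, and test layout are assumptions, not existing code in this repo):

```python
import subprocess
import sys
from pathlib import Path

# Assumed layout: this test file lives one level below the repo root.
EXAMPLES_DIR = Path(__file__).parent.parent / "examples" / "stable-diffusion"

def test_text_to_image_multi_card(tmp_path):
    # Hypothetical smoke test: run the distributed example end to end on 2 cards.
    cmd = [
        sys.executable, str(EXAMPLES_DIR.parent / "gaudi_spawn.py"),
        "--world_size", "2", str(EXAMPLES_DIR / "text_to_image_generation.py"),
        "--model_name_or_path", "runwayml/stable-diffusion-v1-5",
        "--prompts", "An image of a squirrel in Picasso style", "A shiny flying horse taking off",
        "--num_images_per_prompt", "2",
        "--batch_size", "1",
        "--image_save_dir", str(tmp_path),
        "--use_habana",
        "--use_hpu_graphs",
        "--gaudi_config", "Habana/stable-diffusion",
        "--bf16",
        "--distributed",
    ]
    subprocess.run(cmd, check=True)
    assert any(tmp_path.rglob("*.png")), "no images were generated"
```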
- stable_diffusion_ldm3d
- one prompt on one card: command:
python text_to_image_generation.py \
--model_name_or_path "Intel/ldm3d-4c" \
--prompts "An image of a squirrel in Picasso style" \
--num_images_per_prompt 10 \
--batch_size 2 \
--height 768 \
--width 768 \
--image_save_dir /tmp/stable_diffusion_images \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion-2 \
--ldm3d
Performance:
[INFO|pipeline_stable_diffusion_ldm3d.py:268] 2024-05-09 03:00:32,792 >> 1 prompt(s) received, 10 generation(s) per prompt, 2 sample(s) per batch, 5 total batch(es).
5/5 [03:06<00:00, 37.38s/it]
[INFO|pipeline_stable_diffusion_ldm3d.py:411] 2024-05-09 03:03:39,735 >> Speed metrics: {'generation_runtime': 186.9058, 'generation_samples_per_second': 0.28, 'generation_steps_per_second': 0.35}
- two prompts on two cards: command:
python ../gaudi_spawn.py \
--world_size 2 text_to_image_generation.py \
--model_name_or_path "Intel/ldm3d-4c" \
--prompts "An image of a squirrel in Picasso style" "A shiny flying horse taking off" \
--num_images_per_prompt 10 \
--batch_size 2 \
--height 768 \
--width 768 \
--image_save_dir /tmp/stable_diffusion_images \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion-2 \
--ldm3d \
--distributed
Performance:
[INFO|pipeline_stable_diffusion_ldm3d.py:268] 2024-05-09 03:09:12,892 >> 1 prompt(s) received, 10 generation(s) per prompt, 2 sample(s) per batch, 5 total batch(es).
[INFO|pipeline_stable_diffusion_ldm3d.py:268] 2024-05-09 03:09:13,774 >> 1 prompt(s) received, 10 generation(s) per prompt, 2 sample(s) per batch, 5 total batch(es).
5/5 [03:08<00:00, 37.64s/it]
[INFO|pipeline_stable_diffusion_ldm3d.py:411] 2024-05-09 03:12:21,116 >> Speed metrics: {'generation_runtime': 188.1996, 'generation_samples_per_second': 0.28, 'generation_steps_per_second': 0.35}
5/5 [03:09<00:00, 37.82s/it]
[INFO|pipeline_stable_diffusion_ldm3d.py:411] 2024-05-09 03:12:22,874 >> Speed metrics: {'generation_runtime': 189.0768, 'generation_samples_per_second': 0.281, 'generation_steps_per_second': 0.351}
There is no performance regression with multiple cards: per-card throughput matches the single-card run.
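One detail worth noting for the distributed runs above: both processes receive the same --image_save_dir, so the script has to keep their outputs from colliding (for example, per-rank sub-directories or rank-prefixed file names). A hypothetical helper illustrating the per-rank sub-directory approach (the RANK variable and helper name are assumptions, not this PR's implementation):

```python
import os
from pathlib import Path

def rank_aware_save_dir(base_dir: str) -> Path:
    # Hypothetical helper: give each spawned process its own output folder,
    # e.g. /tmp/stable_diffusion_images/rank_0 and /tmp/stable_diffusion_images/rank_1.
    rank = int(os.environ.get("RANK", "0"))
    save_dir = Path(base_dir) / f"rank_{rank}"
    save_dir.mkdir(parents=True, exist_ok=True)
    return save_dir
```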
- Stable Diffusion XL
- one prompt on one card: command:
python text_to_image_generation.py \
--model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
--prompts "Sailing ship painting by Van Gogh" \
--prompts_2 "Red tone" \
--negative_prompts "Low quality" \
--negative_prompts_2 "Clouds" \
--num_images_per_prompt 20 \
--batch_size 4 \
--image_save_dir /tmp/stable_diffusion_xl_images \
--scheduler euler_discrete \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16
Performance:
[INFO|pipeline_stable_diffusion_xl.py:537] 2024-05-09 03:45:54,416 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
5/5 [04:32<00:00, 54.45s/it]
[INFO|pipeline_stable_diffusion_xl.py:810] 2024-05-09 03:50:26,751 >> Speed metrics: {'generation_runtime': 272.2497, 'generation_samples_per_second': 0.196, 'generation_steps_per_second': 0.123}
- two prompts on two cards: command:
python ../gaudi_spawn.py \
--world_size 2 text_to_image_generation.py \
--model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
--prompts "Sailing ship painting by Van Gogh" "A shiny flying horse taking off" \
--prompts_2 "Red tone" "Blue tone" \
--negative_prompts "Low quality" "Sketch" \
--negative_prompts_2 "Clouds" "Clouds" \
--num_images_per_prompt 20 \
--batch_size 4 \
--image_save_dir /tmp/stable_diffusion_xl_images \
--scheduler euler_discrete \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16 \
--distributed
Performance:
[INFO|pipeline_stable_diffusion_xl.py:537] 2024-05-09 04:21:46,940 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
5/5 [04:30<00:00, 54.19s/it]
[INFO|pipeline_stable_diffusion_xl.py:810] 2024-05-09 04:26:11,451 >> Speed metrics: {'generation_runtime': 270.9386, 'generation_samples_per_second': 0.196, 'generation_steps_per_second': 0.123}
5/5 [04:32<00:00, 54.53s/it]
[INFO|pipeline_stable_diffusion_xl.py:810] 2024-05-09 04:26:19,669 >> Speed metrics: {'generation_runtime': 272.639, 'generation_samples_per_second': 0.196, 'generation_steps_per_second': 0.123}
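For readers unfamiliar with the SDXL-specific flags above: --prompts_2 and --negative_prompts_2 map onto the prompt_2 / negative_prompt_2 arguments that SDXL pipelines expose for their second text encoder. A plain-diffusers illustration of that mapping (not the Gaudi pipeline from this repo, just the standard API):

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.bfloat16
)

# prompt/negative_prompt feed the first text encoder,
# prompt_2/negative_prompt_2 feed the second one.
images = pipe(
    prompt="Sailing ship painting by Van Gogh",
    prompt_2="Red tone",
    negative_prompt="Low quality",
    negative_prompt_2="Clouds",
    num_images_per_prompt=4,
).images
```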
- ControlNet
- two prompts on one card: command:
python text_to_image_generation.py \
--model_name_or_path runwayml/stable-diffusion-v1-5 \
--controlnet_model_name_or_path lllyasviel/sd-controlnet-canny \
--prompts "futuristic-looking woman" "a rusty robot" \
--control_image https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png \
--num_images_per_prompt 10 \
--batch_size 4 \
--image_save_dir /tmp/controlnet_images \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16
Performance:
5/5 [02:19<00:00, 27.87s/it]
[INFO|pipeline_controlnet.py:625] 2024-05-09 06:10:51,730 >> Speed metrics: {'generation_runtime': 139.3345, 'generation_samples_per_second': 0.683, 'generation_steps_per_second': 0.427}
- two prompts on two cards: command:
python ../gaudi_spawn.py \
--world_size 2 text_to_image_generation.py \
--model_name_or_path runwayml/stable-diffusion-v1-5 \
--controlnet_model_name_or_path lllyasviel/sd-controlnet-canny \
--prompts "futuristic-looking woman" "a rusty robot" \
--control_image https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png \
--num_images_per_prompt 10 \
--batch_size 4 \
--image_save_dir /tmp/controlnet_images \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16 \
--distributed
Performance:
5/5 [02:14<00:00, 26.98s/it]
[INFO|pipeline_controlnet.py:625] 2024-05-09 06:26:41,088 >> Speed metrics: {'generation_runtime': 134.8915, 'generation_samples_per_second': 0.674, 'generation_steps_per_second': 0.421}
5/5 [02:17<00:00, 27.53s/it]
[INFO|pipeline_controlnet.py:625] 2024-05-09 06:26:45,986 >> Speed metrics: {'generation_runtime': 137.6633, 'generation_samples_per_second': 0.675, 'generation_steps_per_second': 0.422}
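For reference, the --control_image URL above is the standard Vermeer example from the diffusers documentation; conditioning images for a canny ControlNet are usually pre-processed into an edge map along these lines (a generic diffusers/OpenCV sketch, not code from this PR):

```python
import cv2
import numpy as np
from PIL import Image
from diffusers.utils import load_image

# Download the conditioning image used in the commands above.
image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)

# Extract canny edges and stack them to 3 channels, as expected by sd-controlnet-canny.
edges = cv2.Canny(np.array(image), 100, 200)
edges = np.stack([edges] * 3, axis=-1)
control_image = Image.fromarray(edges)
control_image.save("vermeer_canny.png")
```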
@libinta @regisss Please help review and merge the patch.
@libinta and @regisss, could you help with the final review and merge?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.