diffusers
train_text_to_image: No distributed environment, code hangs on TPU v4-8
Describe the bug
I am trying to run train_text_to_image.py on a TPU v4-8, but when I start training I see:
02/01/2023 21:12:22 - INFO - train_text_to_image - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cpu
So it doesn't seem to recognize the TPU and runs on CPU instead.
It also hangs right after printing Steps: 0%| | 0/315 [00:00<?, ?it/s].
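As a sanity check (sketch only, assuming torch_xla is importable on the VM), I would expect something like the following to report an XLA device rather than CPU:

```python
# Sanity check, not part of the training script: accelerate only reports a
# TPU device when torch_xla can be imported and finds the TPU runtime.
import torch_xla.core.xla_model as xm

device = xm.xla_device()                 # raises if no XLA/TPU backend is available
print(device)                            # expected something like "xla:0", not "cpu"
print(xm.get_xla_supported_devices())    # list of visible XLA devices
```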
Reproduction
My steps: SSH into the TPU VM, then:
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .
cd examples/text_to_image/
pip install -U -r requirements.txt
export PATH=$PATH:/home/ssusie/.local/bin
pip install accelerate
accelerate config
In which compute environment are you running? This machine
Which type of machine are you using? TPU
Do you wish to optimize your script with torch dynamo? [yes/NO]: NO
What is the name of the function in your script that should be launched in all parallel scripts? [main]:
Are you using a TPU cluster? [yes/NO]: NO
How many TPU cores should be used for distributed training? [1]: 4
accelerate launch train_text_to_image.py --pretrained_model_name_or_path=$MODEL_NAME --dataset_name=$dataset_name --use_ema --resolution=512 --center_crop --random_flip --train_batch_size=2 --gradient_accumulation_steps=4 --gradient_checkpointing --num_train_epochs=3 --learning_rate=1e-05 --max_grad_norm=1 --lr_scheduler="constant" --lr_warmup_steps=0 --output_dir="sd-pokemon-model"
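For completeness, the same information can be queried from a short Python snippet, independent of train_text_to_image.py (a sketch, nothing here is specific to the example script):

```python
# Minimal Accelerate check: on a working TPU setup this should report
# DistributedType.TPU and an xla device; in my environment it reports
# "Distributed environment: NO" and cpu, matching the log below.
from accelerate import Accelerator

accelerator = Accelerator()
print(accelerator.state)    # distributed type, num_processes, device
print(accelerator.device)   # expected an xla device, observed cpu
```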
Logs
```shell
$ accelerate launch train_text_to_image.py --pretrained_model_name_or_path=$MODEL_NAME --dataset_name=$dataset_name --use_ema --resolution=512 --center_crop --random_flip --train_batch_size=2 --gradient_accumulation_steps=4 --gradient_checkpointing --num_train_epochs=3 --learning_rate=1e-05 --max_grad_norm=1 --lr_scheduler="constant" --lr_warmup_steps=0 --output_dir="sd-pokemon-model"
/home/ssusie/.local/lib/python3.8/site-packages/accelerate/accelerator.py:231: FutureWarning: `logging_dir` is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use `project_dir` instead.
warnings.warn(
02/01/2023 21:21:46 - INFO - train_text_to_image - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cpu
Mixed precision type: no
/home/ssusie/.local/lib/python3.8/site-packages/accelerate/accelerator.py:231: FutureWarning: `logging_dir` is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use `project_dir` instead.
warnings.warn(
02/01/2023 21:21:46 - INFO - train_text_to_image - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cpu
Mixed precision type: no
/home/ssusie/.local/lib/python3.8/site-packages/accelerate/accelerator.py:231: FutureWarning: `logging_dir` is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use `project_dir` instead.
warnings.warn(
02/01/2023 21:21:46 - INFO - train_text_to_image - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cpu
Mixed precision type: no
/home/ssusie/.local/lib/python3.8/site-packages/accelerate/accelerator.py:231: FutureWarning: `logging_dir` is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use `project_dir` instead.
warnings.warn(
02/01/2023 21:21:46 - INFO - train_text_to_image - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cpu
Mixed precision type: no
{'variance_type', 'prediction_type'} was not found in config. Values will be initialized to default values.
{'prediction_type', 'variance_type'} was not found in config. Values will be initialized to default values.
{'variance_type', 'prediction_type'} was not found in config. Values will be initialized to default values.
{'prediction_type', 'variance_type'} was not found in config. Values will be initialized to default values.
{'norm_num_groups', 'scaling_factor'} was not found in config. Values will be initialized to default values.
{'only_cross_attention', 'use_linear_projection', 'resnet_time_scale_shift', 'mid_block_type', 'upcast_attention', 'num_class_embeds', 'dual_cross_attention', 'class_embed_type'} was not found in config. Values will be initialized to default values.
{'norm_num_groups', 'scaling_factor'} was not found in config. Values will be initialized to default values.
{'norm_num_groups', 'scaling_factor'} was not found in config. Values will be initialized to default values.
{'scaling_factor', 'norm_num_groups'} was not found in config. Values will be initialized to default values.
{'class_embed_type', 'mid_block_type', 'dual_cross_attention', 'num_class_embeds', 'only_cross_attention', 'upcast_attention', 'resnet_time_scale_shift', 'use_linear_projection'} was not found in config. Values will be initialized to default values.
{'upcast_attention', 'use_linear_projection', 'mid_block_type', 'dual_cross_attention', 'resnet_time_scale_shift', 'class_embed_type', 'only_cross_attention', 'num_class_embeds'} was not found in config. Values will be initialized to default values.
{'dual_cross_attention', 'resnet_time_scale_shift', 'only_cross_attention', 'mid_block_type', 'upcast_attention', 'class_embed_type', 'num_class_embeds', 'use_linear_projection'} was not found in config. Values will be initialized to default values.
{'only_cross_attention', 'use_linear_projection', 'resnet_time_scale_shift', 'mid_block_type', 'upcast_attention', 'num_class_embeds', 'dual_cross_attention', 'class_embed_type'} was not found in config. Values will be initialized to default values.
{'class_embed_type', 'mid_block_type', 'dual_cross_attention', 'num_class_embeds', 'only_cross_attention', 'upcast_attention', 'resnet_time_scale_shift', 'use_linear_projection'} was not found in config. Values will be initialized to default values.
{'upcast_attention', 'use_linear_projection', 'mid_block_type', 'dual_cross_attention', 'resnet_time_scale_shift', 'class_embed_type', 'only_cross_attention', 'num_class_embeds'} was not found in config. Values will be initialized to default values.
{'dual_cross_attention', 'resnet_time_scale_shift', 'only_cross_attention', 'mid_block_type', 'upcast_attention', 'class_embed_type', 'num_class_embeds', 'use_linear_projection'} was not found in config. Values will be initialized to default values.
02/01/2023 21:21:52 - WARNING - datasets.builder - Using custom data configuration lambdalabs--pokemon-blip-captions-10e3527a764857bd
02/01/2023 21:21:52 - WARNING - datasets.builder - Found cached dataset parquet (/home/ssusie/.cache/huggingface/datasets/lambdalabs___parquet/lambdalabs--pokemon-blip-captions-10e3527a764857bd/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
100%|█████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 511.50it/s]
02/01/2023 21:21:53 - INFO - train_text_to_image - ***** Running training *****
02/01/2023 21:21:53 - INFO - train_text_to_image - Num examples = 833
02/01/2023 21:21:53 - INFO - train_text_to_image - Num Epochs = 3
02/01/2023 21:21:53 - INFO - train_text_to_image - Instantaneous batch size per device = 2
02/01/2023 21:21:53 - INFO - train_text_to_image - Total train batch size (w. parallel, distributed & accumulation) = 8
02/01/2023 21:21:53 - INFO - train_text_to_image - Gradient Accumulation steps = 4
02/01/2023 21:21:53 - INFO - train_text_to_image - Total optimization steps = 315
Steps: 0%| | 0/315 [00:00<?, ?it/s]02/01/2023 21:21:54 - WARNING - datasets.builder - Using custom data configuration lambdalabs--pokemon-blip-captions-10e3527a764857bd
02/01/2023 21:21:54 - WARNING - datasets.builder - Found cached dataset parquet (/home/ssusie/.cache/huggingface/datasets/lambdalabs___parquet/lambdalabs--pokemon-blip-captions-10e3527a764857bd/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
100%|█████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 411.49it/s]
02/01/2023 21:21:54 - WARNING - datasets.builder - Using custom data configuration lambdalabs--pokemon-blip-captions-10e3527a764857bd
02/01/2023 21:21:54 - WARNING - datasets.builder - Found cached dataset parquet (/home/ssusie/.cache/huggingface/datasets/lambdalabs___parquet/lambdalabs--pokemon-blip-captions-10e3527a764857bd/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
100%|█████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 238.42it/s]
02/01/2023 21:21:54 - WARNING - datasets.builder - Using custom data configuration lambdalabs--pokemon-blip-captions-10e3527a764857bd
02/01/2023 21:21:54 - WARNING - datasets.builder - Found cached dataset parquet (/home/ssusie/.cache/huggingface/datasets/lambdalabs___parquet/lambdalabs--pokemon-blip-captions-10e3527a764857bd/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
100%|█████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 658.14it/s]
02/01/2023 21:21:55 - INFO - train_text_to_image - ***** Running training *****
02/01/2023 21:21:55 - INFO - train_text_to_image - Num examples = 833
02/01/2023 21:21:55 - INFO - train_text_to_image - Num Epochs = 3
02/01/2023 21:21:55 - INFO - train_text_to_image - Instantaneous batch size per device = 2
02/01/2023 21:21:55 - INFO - train_text_to_image - Total train batch size (w. parallel, distributed & accumulation) = 8
02/01/2023 21:21:55 - INFO - train_text_to_image - Gradient Accumulation steps = 4
02/01/2023 21:21:55 - INFO - train_text_to_image - Total optimization steps = 315
Steps: 0%| | 0/315 [00:00<?, ?it/s]02/01/2023 21:21:55 - INFO - train_text_to_image - ***** Running training *****
02/01/2023 21:21:55 - INFO - train_text_to_image - Num examples = 833
02/01/2023 21:21:55 - INFO - train_text_to_image - Num Epochs = 3
02/01/2023 21:21:55 - INFO - train_text_to_image - Instantaneous batch size per device = 2
02/01/2023 21:21:55 - INFO - train_text_to_image - Total train batch size (w. parallel, distributed & accumulation) = 8
02/01/2023 21:21:55 - INFO - train_text_to_image - Gradient Accumulation steps = 4
02/01/2023 21:21:55 - INFO - train_text_to_image - Total optimization steps = 315
Steps: 0%| | 0/315 [00:00<?, ?it/s]02/01/2023 21:21:55 - INFO - train_text_to_image - ***** Running training *****
02/01/2023 21:21:55 - INFO - train_text_to_image - Num examples = 833
02/01/2023 21:21:55 - INFO - train_text_to_image - Num Epochs = 3
02/01/2023 21:21:55 - INFO - train_text_to_image - Instantaneous batch size per device = 2
02/01/2023 21:21:55 - INFO - train_text_to_image - Total train batch size (w. parallel, distributed & accumulation) = 8
02/01/2023 21:21:55 - INFO - train_text_to_image - Gradient Accumulation steps = 4
02/01/2023 21:21:55 - INFO - train_text_to_image - Total optimization steps = 315
Steps: 0%| | 0/315 [00:00<?, ?it/s]tcmalloc: large alloc 1073741824 bytes == 0x1810c4000 @ 0x7efe5935f680 0x7efe59380824 0x7efe59380b8a 0x7efe076d132e 0x7efe076bcda2 0x7efe38fa308b 0x7efe38fa3ad1 0x7efe38fa3b24 0x7efe394b51be 0x7efe39cb27aa 0x7efe39a36cd4 0x7efe39c9490f 0x7efe39a732fe 0x7efe510397ae 0x5f3989 0x5f3e1e 0x570af9 0x56939a 0x50aaa0 0x570035 0x56939a 0x5f6a13 0x59ce2f 0x5f35ce 0x56c8cd 0x56939a 0x5f6a13 0x50aa2c 0x5f3547 0x56c8cd 0x56939a
tcmalloc: large alloc 1073741824 bytes == 0x181f08000 @ 0x7efc9f636680 0x7efc9f657824 0x7efc9f657b8a 0x7efc4d9a832e 0x7efc4d993da2 0x7efc7f27a08b 0x7efc7f27aad1 0x7efc7f27ab24 0x7efc7f78c1be 0x7efc7ff897aa 0x7efc7fd0dcd4 0x7efc7ff6b90f 0x7efc7fd4a2fe 0x7efc973107ae 0x5f3989 0x5f3e1e 0x570af9 0x56939a 0x50aaa0 0x570035 0x56939a 0x5f6a13 0x59ce2f 0x5f35ce 0x56c8cd 0x56939a 0x5f6a13 0x50aa2c 0x5f3547 0x56c8cd 0x56939a
tcmalloc: large alloc 1073741824 bytes == 0x180848000 @ 0x7f789fde5680 0x7f789fe06824 0x7f789fe06b8a 0x7f784e15732e 0x7f784e142da2 0x7f787fa2908b 0x7f787fa29ad1 0x7f787fa29b24 0x7f787ff3b1be 0x7f78807387aa 0x7f78804bccd4 0x7f788071a90f 0x7f78804f92fe 0x7f7897abf7ae 0x5f3989 0x5f3e1e 0x570af9 0x56939a 0x50aaa0 0x570035 0x56939a 0x5f6a13 0x59ce2f 0x5f35ce 0x56c8cd 0x56939a 0x5f6a13 0x50aa2c 0x5f3547 0x56c8cd 0x56939a
tcmalloc: large alloc 1073741824 bytes == 0x1812b2000 @ 0x7fd5205ad680 0x7fd5205ce824 0x7fd5205ceb8a 0x7fd4ce91f32e 0x7fd4ce90ada2 0x7fd5001f108b 0x7fd5001f1ad1 0x7fd5001f1b24 0x7fd5007031be 0x7fd500f007aa 0x7fd500c84cd4 0x7fd500ee290f 0x7fd500cc12fe 0x7fd5182877ae 0x5f3989 0x5f3e1e 0x570af9 0x56939a 0x50aaa0 0x570035 0x56939a 0x5f6a13 0x59ce2f 0x5f35ce 0x56c8cd 0x56939a 0x5f6a13 0x50aa2c 0x5f3547 0x56c8cd 0x56939a
```
System Info
Running on TPU v4-8
- diffusers version: 0.13.0.dev0
- Platform: Linux-5.13.0-1023-gcp-x86_64-with-glibc2.29
- Python version: 3.8.10
- PyTorch version (GPU?): 1.12.0+cu102 (False)
- Huggingface_hub version: 0.12.0
- Transformers version: 4.26.0
- Accelerate version: 0.16.0
- xFormers version: not installed
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
cc @patil-suraj @williamberman can you take a look here? :-)
Did you install the XLA version of torch? On TPUs we need to install torch_xla, cf. https://github.com/pytorch/xla/
Also, we haven't tested the scripts on TPUs, so there might be some rough edges.
There's a Flax script available here that can be used on TPUs.
Thanks for the answer, @patil-suraj. On TPU v4, if you use runtime_version=tpu-vm-v4-pt-1.13, torch_xla comes preinstalled, so there is no need to install it separately. I have tested the Flax version, but I am interested in PyTorch/XLA in particular.
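In case it helps reproduce on your side, here is a rough sketch of how I would expect the script to be driven through torch_xla's own launcher instead of `accelerate launch` (whether main() can be imported and reused this way is an assumption on my part; I haven't verified it avoids the hang):

```python
# Sketch only: launch the existing training loop with torch_xla's
# multiprocessing launcher instead of `accelerate launch`. Assumes the
# example's main() is importable and handles argument parsing itself.
import torch_xla.distributed.xla_multiprocessing as xmp

from train_text_to_image import main  # assumed importable entry point

def _mp_fn(index):
    main()  # each spawned process runs the training loop on one TPU core

if __name__ == "__main__":
    xmp.spawn(_mp_fn, args=(), nprocs=None)  # None: use all visible cores
```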
I don't have access to a TPU v4 at the moment, and supporting TPUs for the PyTorch scripts isn't a priority right now. Adding this to my todo list, though it might take some time. In the meantime, if you have a fix, feel free to open a PR :)
Can we try a v3 and determine whether the API works?
@patil-suraj, do you have v3 access?
I can confirm v3-8 works with the Flax training code. I did not test PyTorch/XLA on TPU. I highly recommend you use the Flax script. The conversion script between Flax <-> PyTorch models is now working, so it's better to use JAX, which has better TPU support.
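For example, with diffusers one way to get Flax-trained weights back into PyTorch is the from_flax loading path; a rough sketch (the directory names are just placeholders):

```python
# Sketch: convert a Flax-trained Stable Diffusion checkpoint to PyTorch.
# "sd-pokemon-model-flax" is a placeholder for a Flax training output dir.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "sd-pokemon-model-flax",
    from_flax=True,  # load the Flax weights and convert them to PyTorch
)
pipe.save_pretrained("sd-pokemon-model-pt")  # save as a PyTorch pipeline
```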
I agree with @Lime-Cakes; it's usually more efficient to use Flax on TPUs rather than PyTorch XLA. @ssusie, do you think there are use cases that cannot be supported with the tested Flax API?
Hey @patrickvonplaten - there are some users of PyTorch on TPUs that are looking for this. We obviously want JAX to be supported, but PyTorch is also a priority. Happy to chat more about how we can make TPUs more available for testing and provide more support to make TPUs first-class citizens.
Hey @jspisak,
At the moment, we're simply lacking the time to set up testing for diffusers on TPU, but it'd indeed be nice to have support for this. Does anybody from your team have time, by any chance, to help set up some TPU testing for PyTorch?
We could maybe start by running all Stable Diffusion tests (see here) with PyTorch/XLA on TPU to see which tests don't pass and whether it's an easy fix? And then it'd indeed be nice to set up a TPU test runner instance.
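Even a tiny smoke test along these lines would already tell us a lot (sketch only, assuming torch_xla is installed on the runner; the checkpoint id is just an example):

```python
# Smoke-test sketch: one Stable Diffusion inference pass on the XLA device.
# Whether this runs correctly (and how slowly) is exactly what a TPU test
# runner would need to tell us.
import torch_xla.core.xla_model as xm
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to(xm.xla_device())

image = pipe("a drawing of a green pokemon", num_inference_steps=20).images[0]
image.save("xla_smoke_test.png")
```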
@patrickvonplaten - totally get it and understand where you are coming from. Let me chat with my eng team and see what we can do. I also want a CI runner on TPUs for Diffusers :) cc @shauheen
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.