yongjer comments

Results: 9 comments of yongjer

```RuntimeError: Bad StatusOr access: UNKNOWN: TPU initialization failed: Invalid --2a886c8_slice_builder_worker_addresses specified. Expected 4 worker addresses, got 1.``` when using kaggle tpu

But isn't it the case that accelerate can only be used with the no-trainer version of the script? Or have I misunderstood?

```RuntimeError: Bad StatusOr access: UNKNOWN: TPU initialization failed: Invalid --2a886c8_slice_builder_worker_addresses specified. Expected 4 worker addresses, got 1.``` when using kaggle tpu

and here is the error:
```
WARNING:accelerate.commands.launch:The following values were not passed to `accelerate launch` and had defaults used instead:
    `--num_machines` was set to a value of `1`
    `--mixed_precision` was...
```

```RuntimeError: Bad StatusOr access: UNKNOWN: TPU initialization failed: Invalid --2a886c8_slice_builder_worker_addresses specified. Expected 4 worker addresses, got 1.``` when using kaggle tpu

```
WARNING:accelerate.commands.launch:The following values were not passed to `accelerate launch` and had defaults used instead:
    `--num_machines` was set to a value of `1`
    `--mixed_precision` was set to a value of...
```

```RuntimeError: Bad StatusOr access: UNKNOWN: TPU initialization failed: Invalid --2a886c8_slice_builder_worker_addresses specified. Expected 4 worker addresses, got 1.``` when using kaggle tpu

Thanks for your help. Wishing you happy holidays.

[2023-12-04 11:52:08,378] [INFO] [autotuner.py:1110:run_after_tuning] No optimal DeepSpeed configuration found by autotuning.

Sorry, I'm not sure what you mean. This is already the whole output log.

[2023-12-04 11:52:08,378] [INFO] [autotuner.py:1110:run_after_tuning] No optimal DeepSpeed configuration found by autotuning.

Unfortunately, there is no other trace. It leaves the whole line blank, as above.

[2023-12-04 11:52:08,378] [INFO] [autotuner.py:1110:run_after_tuning] No optimal DeepSpeed configuration found by autotuning.

It does look like it was cut off:
```
hf@8913c96d24e3:/workspaces/hf$ deepspeed --autotuning run ./script/run_classification.py --model_name_or_path ckip-joint/bloom-1b1-zh --do_train --do_eval --output_dir ./bloom --train_file ./data/train.csv --validation_file ./data/test.csv --text_column_names sentence --label_column_name label --overwrite_output_dir --fp16 --torch_compile --deepspeed...
```

[2023-12-04 11:52:08,378] [INFO] [autotuner.py:1110:run_after_tuning] No optimal DeepSpeed configuration found by autotuning.

btw, here is my full dockerfile:
```
FROM huggingface/transformers-pytorch-deepspeed-latest-gpu:latest
RUN apt-get update && apt-get install -y pdsh
RUN pip install --upgrade pip bitsandbytes deepspeed[autotuning]

# non-root user
ARG USERNAME=hf
ARG...
```

[2023-12-04 11:52:08,378] [INFO] [autotuner.py:1110:run_after_tuning] No optimal DeepSpeed configuration found by autotuning.

# I'm not sure whether these help
```
hf@ffc9973e2c76:/workspaces/hf$ tree
.
├── DockerFile.hf
├── autotuning_exps
│   └── profile_model_info.json
├── autotuning_results
│   └── profile_model_info
│       ├── cmd.txt
│       ├── ds_config.json
│...
```
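Since the autotuner log gives no further trace, one way to see what the run actually produced is to list every file under the `autotuning_exps` and `autotuning_results` folders from the tree above and check which JSON files parse. The following is only a rough sketch: `summarize_autotuning_dir` is a hypothetical helper written for this thread, not part of DeepSpeed, and it assumes nothing about what the files contain beyond their extensions.

```python
import json
from pathlib import Path


def summarize_autotuning_dir(root):
    """Map each file under `root` (relative path) to its parsed JSON, or None.

    Hypothetical helper for inspection only; it is not a DeepSpeed API.
    Non-JSON files and empty or malformed JSON files map to None.
    """
    root = Path(root)
    summary = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        content = None
        if path.suffix == ".json":
            try:
                content = json.loads(path.read_text())
            except json.JSONDecodeError:
                content = None  # file exists but is empty or malformed
        summary[path.relative_to(root).as_posix()] = content
    return summary


if __name__ == "__main__":
    # Folder names are taken from the tree output above.
    for folder in ("autotuning_exps", "autotuning_results"):
        if Path(folder).is_dir():
            for name, content in summarize_autotuning_dir(folder).items():
                status = "parsed JSON" if content is not None else "not JSON or unreadable"
                print(f"{folder}/{name}: {status}")
```

Run from `/workspaces/hf`, this prints one line per file, which makes it easier to spot an experiment folder that is missing its results or holds an empty config.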