System Info
docker image: huggingface/transformers-pytorch-deepspeed-latest-gpu:latest
- `transformers` version: 4.36.0.dev0
- Platform: Linux-6.2.0-37-generic-x86_64-with-glibc2.29
- Python version: 3.8.10
- Huggingface_hub version: 0.19.4
- Safetensors version: 0.4.1
- Accelerate version: 0.25.0.dev0
- Accelerate config: not found
- PyTorch version (GPU?): 2.1.0+cu118 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: yes, RTX 4060 Ti (16 GB)
- Using distributed or parallel set-up in script?: no
Who can help?
No response
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
Run:
deepspeed --autotuning run \
./script/run_classification.py \
--model_name_or_path ckip-joint/bloom-1b1-zh \
--do_train \
--do_eval \
--output_dir ./bloom \
--train_file ./data/train.csv \
--validation_file ./data/test.csv \
--text_column_names sentence \
--label_column_name label \
--overwrite_output_dir \
--fp16 \
--torch_compile \
--deepspeed cfg/auto.json
cfg/auto.json:
{
"train_micro_batch_size_per_gpu": "auto",
"autotuning": {
"enabled": true,
"fast": false
}
}
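As a side note (my own sketch, not part of the report): the two keys in this minimal config can be loaded and inspected directly. My reading of the DeepSpeed autotuning docs, which I have not verified against this exact version, is that "auto" lets the tuner search the per-GPU micro-batch size and "fast": false requests the full search rather than the fast mode.

```python
import json

# Sketch: parse the same content as cfg/auto.json and inspect the keys.
# (Unverified assumption: "auto" = let the autotuner search the micro-batch
# size; "fast": false = run the full search instead of fast mode.)
auto_cfg = json.loads("""
{
  "train_micro_batch_size_per_gpu": "auto",
  "autotuning": {
    "enabled": true,
    "fast": false
  }
}
""")
print(auto_cfg["autotuning"]["enabled"])  # True
print(auto_cfg["train_micro_batch_size_per_gpu"])  # auto
```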
The error:
[2023-12-04 11:51:42,325] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-04 11:51:43,363] [WARNING] [runner.py:203:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-12-04 11:51:43,363] [INFO] [autotuner.py:71:__init__] Created autotuning experiments directory: autotuning_exps
[2023-12-04 11:51:43,364] [INFO] [autotuner.py:84:__init__] Created autotuning results directory: autotuning_exps
[2023-12-04 11:51:43,364] [INFO] [autotuner.py:200:_get_resource_manager] active_resources = OrderedDict([('localhost', [0])])
[2023-12-04 11:51:43,364] [INFO] [runner.py:362:run_autotuning] [Start] Running autotuning
[2023-12-04 11:51:43,364] [INFO] [autotuner.py:669:model_info_profile_run] Starting model info profile run.
0%| | 0/1 [00:00<?, ?it/s][2023-12-04 11:51:43,366] [INFO] [scheduler.py:344:run_experiment] Scheduler wrote ds_config to autotuning_results/profile_model_info/ds_config.json, /workspaces/hf/autotuning_results/profile_model_info/ds_config.json
[2023-12-04 11:51:43,367] [INFO] [scheduler.py:351:run_experiment] Scheduler wrote exp to autotuning_results/profile_model_info/exp.json, /workspaces/hf/autotuning_results/profile_model_info/exp.json
[2023-12-04 11:51:43,367] [INFO] [scheduler.py:378:run_experiment] Launching exp_id = 0, exp_name = profile_model_info, with resource = localhost:0, and ds_config = /workspaces/hf/autotuning_results/profile_model_info/ds_config.json
localhost: ssh: connect to host localhost port 22: Cannot assign requested address
pdsh@b97c1584d47d: localhost: ssh exited with exit code 255
[2023-12-04 11:51:59,057] [INFO] [scheduler.py:430:clean_up] Done cleaning up exp_id = 0 on the following workers: localhost
[2023-12-04 11:51:59,057] [INFO] [scheduler.py:393:run_experiment] Done running exp_id = 0, exp_name = profile_model_info, with resource = localhost:0
100%|██████████| 1/1 [00:25<00:00, 25.01s/it]
[2023-12-04 11:52:08,378] [ERROR] [autotuner.py:699:model_info_profile_run] The model is not runnable with DeepSpeed with error = (
[2023-12-04 11:52:08,378] [INFO] [runner.py:367:run_autotuning] [End] Running autotuning
[2023-12-04 11:52:08,378] [INFO] [autotuner.py:1110:run_after_tuning] No optimal DeepSpeed configuration found by autotuning.
Expected behavior
Training runs successfully.
cc @muellerzr or @pacman100
Can you provide a bit more of the logs? Earlier in the trace I see:
The model is not runnable with DeepSpeed with error
Sorry, I'm not sure what you mean; this is already the whole log output.
Is there any other trace?
This is the chunk I'm talking about. It looks cut off:
HERE -> [2023-12-04 11:52:08,378] [ERROR] [autotuner.py:699:model_info_profile_run] The model is not runnable with DeepSpeed with error = (
[2023-12-04 11:52:08,378] [INFO] [runner.py:367:run_autotuning] [End] Running autotuning
[2023-12-04 11:52:08,378] [INFO] [autotuner.py:1110:run_after_tuning] No optimal DeepSpeed configuration found by autotuning.
Unfortunately, there is no other trace; the rest of the line is left blank, exactly as above.
It does look cut off.
hf@8913c96d24e3:/workspaces/hf$ deepspeed --autotuning run ./script/run_classification.py --model_name_or_path ckip-joint/bloom-1b1-zh --do_train --do_eval --output_dir ./bloom --train_file ./data/train.csv --validation_file ./data/test.csv --text_column_names sentence --label_column_name label --overwrite_output_dir --fp16 --torch_compile --deepspeed cfg/auto.json
[2023-12-05 14:53:47,008] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-05 14:53:48,023] [WARNING] [runner.py:203:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-12-05 14:53:48,023] [INFO] [autotuner.py:71:__init__] Created autotuning experiments directory: autotuning_exps
[2023-12-05 14:53:48,023] [INFO] [autotuner.py:84:__init__] Created autotuning results directory: autotuning_exps
[2023-12-05 14:53:48,023] [INFO] [autotuner.py:200:_get_resource_manager] active_resources = OrderedDict([('localhost', [0])])
[2023-12-05 14:53:48,023] [INFO] [runner.py:362:run_autotuning] [Start] Running autotuning
[2023-12-05 14:53:48,023] [INFO] [autotuner.py:669:model_info_profile_run] Starting model info profile run.
0%| | 0/1 [00:00<?, ?it/s][2023-12-05 14:53:48,025] [INFO] [scheduler.py:344:run_experiment] Scheduler wrote ds_config to autotuning_results/profile_model_info/ds_config.json, /workspaces/hf/autotuning_results/profile_model_info/ds_config.json
[2023-12-05 14:53:48,026] [INFO] [scheduler.py:351:run_experiment] Scheduler wrote exp to autotuning_results/profile_model_info/exp.json, /workspaces/hf/autotuning_results/profile_model_info/exp.json
[2023-12-05 14:53:48,026] [INFO] [scheduler.py:378:run_experiment] Launching exp_id = 0, exp_name = profile_model_info, with resource = localhost:0, and ds_config = /workspaces/hf/autotuning_results/profile_model_info/ds_config.json
localhost: ssh: connect to host localhost port 22: Cannot assign requested address
pdsh@8913c96d24e3: localhost: ssh exited with exit code 255
[2023-12-05 14:54:03,369] [INFO] [scheduler.py:430:clean_up] Done cleaning up exp_id = 0 on the following workers: localhost
[2023-12-05 14:54:03,369] [INFO] [scheduler.py:393:run_experiment] Done running exp_id = 0, exp_name = profile_model_info, with resource = localhost:0
100%|██████████| 1/1 [00:25<00:00, 25.01s/it]
[2023-12-05 14:54:13,038] [ERROR] [autotuner.py:699:model_info_profile_run] The model is not runnable with DeepSpeed with error = (
[2023-12-05 14:54:13,038] [INFO] [runner.py:367:run_autotuning] [End] Running autotuning
[2023-12-05 14:54:13,038] [INFO] [autotuner.py:1110:run_after_tuning] No optimal DeepSpeed configuration found by autotuning.
hf@8913c96d24e3:/workspaces/hf$
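For what it's worth, the `ssh: connect to host localhost port 22: Cannot assign requested address` lines suggest the autotuner's pdsh launcher is trying to reach the container over ssh. A hypothetical quick check (my own addition; the function name is made up) for whether anything is even listening on port 22 inside the container:

```python
import socket

def ssh_port_open(host: str = "localhost", port: int = 22,
                  timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False

print(ssh_port_open())
```

In a container without an sshd this prints False, which would be consistent with the launcher failure above.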
By the way, here is my full Dockerfile:
FROM huggingface/transformers-pytorch-deepspeed-latest-gpu:latest
RUN apt-get update && apt-get install -y pdsh
RUN pip install --upgrade pip bitsandbytes deepspeed[autotuning]
# non-root user
ARG USERNAME=hf
ARG USER_UID=1000
ARG USER_GID=$USER_UID
# Create the user
RUN groupadd --gid $USER_GID $USERNAME \
&& useradd --uid $USER_UID --gid $USER_GID -m $USERNAME \
#
# [Optional] Add sudo support. Omit if you don't need to install software after connecting.
&& apt-get update \
&& apt-get install -y sudo \
&& echo $USERNAME ALL=\(root\) NOPASSWD:ALL > /etc/sudoers.d/$USERNAME \
&& chmod 0440 /etc/sudoers.d/$USERNAME
# ********************************************************
# * Anything else you want to do like clean up goes here *
# ********************************************************
# [Optional] Set the default user. Omit if you want to keep the default as root.
USER $USERNAME
I'm not sure whether these help.
hf@ffc9973e2c76:/workspaces/hf$ tree
.
├── DockerFile.hf
├── autotuning_exps
│   └── profile_model_info.json
├── autotuning_results
│   └── profile_model_info
│       ├── cmd.txt
│       ├── ds_config.json
│       ├── exp.json
│       ├── stderr.log
│       └── stdout.log
├── bloom
├── cfg
│   ├── auto.json
│   └── ds_config_zero3.json
├── data
│   ├── test.csv
│   └── train.csv
├── nvme
│   └── zero_stage_3
│       └── float16params
│           └── rank0
├── run
│   ├── acclerate.sh
│   ├── deepspeed.sh
│   ├── deepspeed_auto.sh
│   └── text_classification.sh
├── script
│   ├── run_classification.py
│   ├── run_glue_no_trainer.py
│   └── test.py
└── tmp

13 directories, 18 files
Generated by autotuning:
autotuning_exps/profile_model_info.json:
{"name": "profile_model_info", "ds_config": {"train_micro_batch_size_per_gpu": 1, "autotuning": {"enabled": true, "model_info_path": "autotuning_results/profile_model_info/model_info.json", "model_info": {"profile": true}}, "zero_optimization": {"stage": 3}, "memory_break_down": false}, "num_gpus": 1, "num_nodes": 1}
autotuning_results/profile_model_info/cmd.txt:
deepspeed --include localhost:0 --master_port 29500 ./script/run_classification.py --model_name_or_path ckip-joint/bloom-1b1-zh --do_train --do_eval --output_dir ./bloom --train_file ./data/train.csv --validation_file ./data/test.csv --text_column_names sentence --label_column_name label --overwrite_output_dir --fp16 --torch_compile --deepspeed eyJ0cmFpbl9taWNyb19iYXRjaF9zaXplX3Blcl9ncHUiOiAxLCAiYXV0b3R1bmluZyI6IHsiZW5hYmxlZCI6IHRydWUsICJtb2RlbF9pbmZvX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3Byb2ZpbGVfbW9kZWxfaW5mby9tb2RlbF9pbmZvLmpzb24iLCAibW9kZWxfaW5mbyI6IHsicHJvZmlsZSI6IHRydWV9LCAibWV0cmljX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3Byb2ZpbGVfbW9kZWxfaW5mby9tZXRyaWNzLmpzb24ifSwgInplcm9fb3B0aW1pemF0aW9uIjogeyJzdGFnZSI6IDN9LCAibWVtb3J5X2JyZWFrX2Rvd24iOiBmYWxzZX0=
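The long `--deepspeed` argument in cmd.txt is just the ds_config, base64-encoded. A small sketch (added here for readability) decodes it back to the same JSON as the ds_config.json shown in this thread:

```python
import base64
import json

# The base64 blob the autotuner passed as the --deepspeed argument in cmd.txt.
ds_config_b64 = "eyJ0cmFpbl9taWNyb19iYXRjaF9zaXplX3Blcl9ncHUiOiAxLCAiYXV0b3R1bmluZyI6IHsiZW5hYmxlZCI6IHRydWUsICJtb2RlbF9pbmZvX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3Byb2ZpbGVfbW9kZWxfaW5mby9tb2RlbF9pbmZvLmpzb24iLCAibW9kZWxfaW5mbyI6IHsicHJvZmlsZSI6IHRydWV9LCAibWV0cmljX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3Byb2ZpbGVfbW9kZWxfaW5mby9tZXRyaWNzLmpzb24ifSwgInplcm9fb3B0aW1pemF0aW9uIjogeyJzdGFnZSI6IDN9LCAibWVtb3J5X2JyZWFrX2Rvd24iOiBmYWxzZX0="

# Decode and parse; this recovers the profiling config the scheduler wrote out.
ds_config = json.loads(base64.b64decode(ds_config_b64))
print(ds_config["train_micro_batch_size_per_gpu"])  # 1
print(ds_config["zero_optimization"])               # {'stage': 3}
```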
autotuning_results/profile_model_info/ds_config.json:
{"train_micro_batch_size_per_gpu": 1, "autotuning": {"enabled": true, "model_info_path": "autotuning_results/profile_model_info/model_info.json", "model_info": {"profile": true}, "metric_path": "autotuning_results/profile_model_info/metrics.json"}, "zero_optimization": {"stage": 3}, "memory_break_down": false}
autotuning_results/profile_model_info/exp.json:
{"name": "profile_model_info", "ds_config": {"train_micro_batch_size_per_gpu": 1, "autotuning": {"enabled": true, "model_info_path": "autotuning_results/profile_model_info/model_info.json", "model_info": {"profile": true}, "metric_path": "autotuning_results/profile_model_info/metrics.json"}, "zero_optimization": {"stage": 3}, "memory_break_down": false}, "num_gpus": 1, "num_nodes": 1, "exp_id": 0, "result_dir": "autotuning_results/profile_model_info", "master_port": 29500, "launcher_args": ["--include", "localhost:0", "--master_port", "29500"], "user": "unknown-user", "job_id": "unknown-job-id", "ds_config_path": "autotuning_results/profile_model_info/ds_config.json", "ds_config_base64": "eyJ0cmFpbl9taWNyb19iYXRjaF9zaXplX3Blcl9ncHUiOiAxLCAiYXV0b3R1bmluZyI6IHsiZW5hYmxlZCI6IHRydWUsICJtb2RlbF9pbmZvX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3Byb2ZpbGVfbW9kZWxfaW5mby9tb2RlbF9pbmZvLmpzb24iLCAibW9kZWxfaW5mbyI6IHsicHJvZmlsZSI6IHRydWV9LCAibWV0cmljX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3Byb2ZpbGVfbW9kZWxfaW5mby9tZXRyaWNzLmpzb24ifSwgInplcm9fb3B0aW1pemF0aW9uIjogeyJzdGFnZSI6IDN9LCAibWVtb3J5X2JyZWFrX2Rvd24iOiBmYWxzZX0="}
autotuning_results/profile_model_info/stderr.log:
Using custom data configuration default-8f347103001581ec
Loading Dataset Infos from /usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/csv
Generating dataset csv (/home/hf/.cache/huggingface/datasets/csv/default-8f347103001581ec/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d)
Downloading and preparing dataset csv/default to /home/hf/.cache/huggingface/datasets/csv/default-8f347103001581ec/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d...
Downloading data files: 0%| | 0/2 [00:00<?, ?it/s]
Downloading data files: 100%|██████████| 2/2 [00:00<00:00, 26973.02it/s]
Downloading took 0.0 min
Checksum Computation took 0.0 min
Extracting data files: 0%| | 0/2 [00:00<?, ?it/s]
Extracting data files: 100%|██████████| 2/2 [00:00<00:00, 4132.32it/s]
Generating train split
Generating train split: 0 examples [00:00, ? examples/s]
Generating train split: 4635 examples [00:00, 557212.85 examples/s]
Generating validation split
Generating validation split: 0 examples [00:00, ? examples/s]
Generating validation split: 18 examples [00:00, 13751.82 examples/s]
Unable to verify splits sizes.
Dataset csv downloaded and prepared to /home/hf/.cache/huggingface/datasets/csv/default-8f347103001581ec/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d. Subsequent calls will reuse this data.
config.json: 0%| | 0.00/706 [00:00<?, ?B/s]
config.json: 100%|██████████| 706/706 [00:00<00:00, 2.33MB/s]
[INFO|configuration_utils.py:718] 2023-12-06 03:21:08,505 >> loading configuration file config.json from cache at /home/hf/.cache/huggingface/hub/models--ckip-joint--bloom-1b1-zh/snapshots/60bed206f673a412c57651456f8c2cf642cdfcfe/config.json
[INFO|configuration_utils.py:778] 2023-12-06 03:21:08,515 >> Model config BloomConfig {
"_name_or_path": "ckip-joint/bloom-1b1-zh",
"apply_residual_connection_post_layernorm": false,
"architectures": [
"BloomModel"
],
"attention_dropout": 0.0,
"attention_softmax_in_fp32": true,
"bias_dropout_fusion": true,
"bos_token_id": 1,
"eos_token_id": 2,
"finetuning_task": "text-classification",
"hidden_dropout": 0.0,
"hidden_size": 1536,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"masked_softmax_fusion": true,
"model_type": "bloom",
"n_head": 16,
"n_inner": null,
"n_layer": 24,
"offset_alibi": 100,
"pad_token_id": 3,
"pretraining_tp": 1,
"skip_bias_add": true,
"skip_bias_add_qkv": false,
"slow_but_exact": false,
"transformers_version": "4.36.0.dev0",
"unk_token_id": 0,
"use_cache": true,
"vocab_size": 250880
}
tokenizer_config.json: 0%| | 0.00/222 [00:00<?, ?B/s]
tokenizer_config.json: 100%|██████████| 222/222 [00:00<00:00, 357kB/s]
tokenizer.json: 0%| | 0.00/14.5M [00:00<?, ?B/s]
tokenizer.json: 72%|███████▏ | 10.5M/14.5M [00:01<00:00, 5.96MB/s]
tokenizer.json: 100%|██████████| 14.5M/14.5M [00:01<00:00, 7.96MB/s]
tokenizer.json: 100%|██████████| 14.5M/14.5M [00:01<00:00, 7.41MB/s]
special_tokens_map.json: 0%| | 0.00/85.0 [00:00<?, ?B/s]
special_tokens_map.json: 100%|██████████| 85.0/85.0 [00:00<00:00, 293kB/s]
[INFO|tokenization_utils_base.py:2026] 2023-12-06 03:21:13,020 >> loading file tokenizer.json from cache at /home/hf/.cache/huggingface/hub/models--ckip-joint--bloom-1b1-zh/snapshots/60bed206f673a412c57651456f8c2cf642cdfcfe/tokenizer.json
[INFO|tokenization_utils_base.py:2026] 2023-12-06 03:21:13,020 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2026] 2023-12-06 03:21:13,020 >> loading file special_tokens_map.json from cache at /home/hf/.cache/huggingface/hub/models--ckip-joint--bloom-1b1-zh/snapshots/60bed206f673a412c57651456f8c2cf642cdfcfe/special_tokens_map.json
[INFO|tokenization_utils_base.py:2026] 2023-12-06 03:21:13,020 >> loading file tokenizer_config.json from cache at /home/hf/.cache/huggingface/hub/models--ckip-joint--bloom-1b1-zh/snapshots/60bed206f673a412c57651456f8c2cf642cdfcfe/tokenizer_config.json
pytorch_model.bin: 0%| | 0.00/4.26G [00:00<?, ?B/s]
[...]
pytorch_model.bin: 69%|██████▉ | 2.93G/4.26G [02:28<01:09, 19.3MB/s]
pytorch_model.bin: 69%|βββββββ | 2.94G/4.26G [02:28<01:04, 20.7MB/s]
pytorch_model.bin: 69%|βββββββ | 2.95G/4.26G [02:29<01:06, 19.9MB/s]
pytorch_model.bin: 69%|βββββββ | 2.96G/4.26G [02:29<01:01, 21.2MB/s]
pytorch_model.bin: 70%|βββββββ | 2.97G/4.26G [02:30<01:03, 20.3MB/s]
pytorch_model.bin: 70%|βββββββ | 2.98G/4.26G [02:30<01:05, 19.5MB/s]
pytorch_model.bin: 70%|βββββββ | 2.99G/4.26G [02:31<01:01, 20.8MB/s]
pytorch_model.bin: 70%|βββββββ | 3.00G/4.26G [02:31<01:03, 19.8MB/s]
pytorch_model.bin: 71%|βββββββ | 3.01G/4.26G [02:32<01:04, 19.5MB/s]
pytorch_model.bin: 71%|βββββββ | 3.02G/4.26G [02:32<01:00, 20.5MB/s]
pytorch_model.bin: 71%|βββββββ | 3.03G/4.26G [02:33<01:02, 19.7MB/s]
pytorch_model.bin: 71%|ββββββββ | 3.04G/4.26G [02:33<01:02, 19.4MB/s]
pytorch_model.bin: 72%|ββββββββ | 3.05G/4.26G [02:34<00:58, 20.5MB/s]
pytorch_model.bin: 72%|ββββββββ | 3.06G/4.26G [02:34<00:59, 20.1MB/s]
pytorch_model.bin: 72%|ββββββββ | 3.07G/4.26G [02:35<01:01, 19.3MB/s]
pytorch_model.bin: 72%|ββββββββ | 3.08G/4.26G [02:36<01:02, 18.9MB/s]
pytorch_model.bin: 73%|ββββββββ | 3.09G/4.26G [02:36<00:57, 20.4MB/s]
pytorch_model.bin: 73%|ββββββββ | 3.10G/4.26G [02:37<00:59, 19.5MB/s]
pytorch_model.bin: 73%|ββββββββ | 3.11G/4.26G [02:37<00:59, 19.2MB/s]
pytorch_model.bin: 73%|ββββββββ | 3.12G/4.26G [02:38<01:00, 18.8MB/s]
pytorch_model.bin: 74%|ββββββββ | 3.14G/4.26G [02:38<00:55, 20.3MB/s]
pytorch_model.bin: 74%|ββββββββ | 3.15G/4.26G [02:39<00:57, 19.5MB/s]
pytorch_model.bin: 74%|ββββββββ | 3.16G/4.26G [02:39<00:58, 19.0MB/s]
pytorch_model.bin: 74%|ββββββββ | 3.17G/4.26G [02:40<00:58, 18.7MB/s]
pytorch_model.bin: 75%|ββββββββ | 3.18G/4.26G [02:41<01:00, 17.9MB/s]
pytorch_model.bin: 75%|ββββββββ | 3.19G/4.26G [02:41<00:55, 19.4MB/s]
pytorch_model.bin: 75%|ββββββββ | 3.20G/4.26G [02:42<00:56, 18.7MB/s]
pytorch_model.bin: 75%|ββββββββ | 3.21G/4.26G [02:42<00:51, 20.6MB/s]
pytorch_model.bin: 76%|ββββββββ | 3.22G/4.26G [02:43<00:52, 19.7MB/s]
pytorch_model.bin: 76%|ββββββββ | 3.23G/4.26G [02:43<00:53, 19.3MB/s]
pytorch_model.bin: 76%|ββββββββ | 3.24G/4.26G [02:44<00:49, 20.5MB/s]
pytorch_model.bin: 76%|ββββββββ | 3.25G/4.26G [02:44<00:50, 19.9MB/s]
pytorch_model.bin: 77%|ββββββββ | 3.26G/4.26G [02:45<00:47, 21.0MB/s]
pytorch_model.bin: 77%|ββββββββ | 3.27G/4.26G [02:45<00:49, 20.0MB/s]
pytorch_model.bin: 77%|ββββββββ | 3.28G/4.26G [02:46<00:49, 19.6MB/s]
pytorch_model.bin: 77%|ββββββββ | 3.29G/4.26G [02:46<00:46, 20.9MB/s]
pytorch_model.bin: 78%|ββββββββ | 3.30G/4.26G [02:47<00:47, 20.0MB/s]
pytorch_model.bin: 78%|ββββββββ | 3.31G/4.26G [02:47<00:45, 21.0MB/s]
pytorch_model.bin: 78%|ββββββββ | 3.32G/4.26G [02:48<00:46, 20.2MB/s]
pytorch_model.bin: 78%|ββββββββ | 3.33G/4.26G [02:48<00:47, 19.5MB/s]
pytorch_model.bin: 78%|ββββββββ | 3.34G/4.26G [02:49<00:45, 20.2MB/s]
pytorch_model.bin: 79%|ββββββββ | 3.36G/4.26G [02:49<00:45, 20.0MB/s]
pytorch_model.bin: 79%|ββββββββ | 3.37G/4.26G [02:50<00:46, 19.4MB/s]
pytorch_model.bin: 79%|ββββββββ | 3.38G/4.26G [02:50<00:43, 20.2MB/s]
pytorch_model.bin: 79%|ββββββββ | 3.39G/4.26G [02:51<00:43, 20.0MB/s]
pytorch_model.bin: 80%|ββββββββ | 3.40G/4.26G [02:51<00:41, 20.8MB/s]
pytorch_model.bin: 80%|ββββββββ | 3.41G/4.26G [02:52<00:43, 19.8MB/s]
pytorch_model.bin: 80%|ββββββββ | 3.42G/4.26G [02:52<00:40, 20.9MB/s]
pytorch_model.bin: 80%|ββββββββ | 3.43G/4.26G [02:53<00:41, 19.9MB/s]
pytorch_model.bin: 81%|ββββββββ | 3.44G/4.26G [02:54<00:41, 19.7MB/s]
pytorch_model.bin: 81%|ββββββββ | 3.45G/4.26G [02:54<00:39, 20.3MB/s]
pytorch_model.bin: 81%|ββββββββ | 3.46G/4.26G [02:55<00:39, 20.2MB/s]
pytorch_model.bin: 81%|βββββββββ | 3.47G/4.26G [02:55<00:36, 21.4MB/s]
pytorch_model.bin: 82%|βββββββββ | 3.48G/4.26G [02:56<00:38, 20.4MB/s]
pytorch_model.bin: 82%|βββββββββ | 3.49G/4.26G [02:56<00:39, 19.7MB/s]
pytorch_model.bin: 82%|βββββββββ | 3.50G/4.26G [02:57<00:36, 20.8MB/s]
pytorch_model.bin: 82%|βββββββββ | 3.51G/4.26G [02:57<00:41, 18.0MB/s]
pytorch_model.bin: 83%|βββββββββ | 3.52G/4.26G [02:58<00:42, 17.3MB/s]
pytorch_model.bin: 83%|βββββββββ | 3.53G/4.26G [02:59<00:51, 14.0MB/s]
pytorch_model.bin: 83%|βββββββββ | 3.54G/4.26G [03:00<00:54, 13.1MB/s]
pytorch_model.bin: 83%|βββββββββ | 3.55G/4.26G [03:01<00:53, 13.2MB/s]
pytorch_model.bin: 84%|βββββββββ | 3.57G/4.26G [03:02<00:52, 13.3MB/s]
pytorch_model.bin: 84%|βββββββββ | 3.58G/4.26G [03:02<00:48, 14.3MB/s]
pytorch_model.bin: 84%|βββββββββ | 3.59G/4.26G [03:03<00:48, 14.0MB/s]
pytorch_model.bin: 84%|βββββββββ | 3.60G/4.26G [03:04<00:48, 13.7MB/s]
pytorch_model.bin: 85%|βββββββββ | 3.61G/4.26G [03:05<00:49, 13.2MB/s]
pytorch_model.bin: 85%|βββββββββ | 3.62G/4.26G [03:06<00:51, 12.6MB/s]
pytorch_model.bin: 85%|βββββββββ | 3.63G/4.26G [03:06<00:50, 12.5MB/s]
pytorch_model.bin: 85%|βββββββββ | 3.64G/4.26G [03:07<00:50, 12.4MB/s]
pytorch_model.bin: 86%|βββββββββ | 3.65G/4.26G [03:08<00:49, 12.3MB/s]
pytorch_model.bin: 86%|βββββββββ | 3.66G/4.26G [03:09<00:44, 13.4MB/s]
pytorch_model.bin: 86%|βββββββββ | 3.67G/4.26G [03:09<00:40, 14.8MB/s]
pytorch_model.bin: 86%|βββββββββ | 3.68G/4.26G [03:10<00:36, 16.0MB/s]
pytorch_model.bin: 87%|βββββββββ | 3.69G/4.26G [03:10<00:35, 15.9MB/s]
pytorch_model.bin: 87%|βββββββββ | 3.70G/4.26G [03:11<00:36, 15.2MB/s]
pytorch_model.bin: 87%|βββββββββ | 3.71G/4.26G [03:12<00:37, 14.5MB/s]
pytorch_model.bin: 87%|βββββββββ | 3.72G/4.26G [03:13<00:41, 13.0MB/s]
pytorch_model.bin: 88%|βββββββββ | 3.73G/4.26G [03:14<00:39, 13.4MB/s]
pytorch_model.bin: 88%|βββββββββ | 3.74G/4.26G [03:15<00:38, 13.6MB/s]
pytorch_model.bin: 88%|βββββββββ | 3.75G/4.26G [03:15<00:38, 13.3MB/s]
pytorch_model.bin: 88%|βββββββββ | 3.76G/4.26G [03:16<00:37, 13.4MB/s]
pytorch_model.bin: 89%|βββββββββ | 3.77G/4.26G [03:17<00:33, 14.3MB/s]
pytorch_model.bin: 89%|βββββββββ | 3.79G/4.26G [03:17<00:32, 14.8MB/s]
pytorch_model.bin: 89%|βββββββββ | 3.80G/4.26G [03:18<00:29, 15.7MB/s]
pytorch_model.bin: 89%|βββββββββ | 3.81G/4.26G [03:19<00:33, 13.6MB/s]
pytorch_model.bin: 90%|βββββββββ | 3.82G/4.26G [03:20<00:34, 13.0MB/s]
pytorch_model.bin: 90%|βββββββββ | 3.83G/4.26G [03:21<00:33, 12.9MB/s]
pytorch_model.bin: 90%|βββββββββ | 3.84G/4.26G [03:22<00:35, 12.0MB/s]
pytorch_model.bin: 90%|βββββββββ | 3.85G/4.26G [03:23<00:36, 11.2MB/s]
pytorch_model.bin: 91%|βββββββββ | 3.86G/4.26G [03:24<00:37, 10.6MB/s]
pytorch_model.bin: 91%|βββββββββ | 3.87G/4.26G [03:25<00:35, 11.2MB/s]
pytorch_model.bin: 91%|βββββββββ | 3.88G/4.26G [03:25<00:28, 13.5MB/s]
pytorch_model.bin: 91%|ββββββββββ| 3.89G/4.26G [03:25<00:22, 16.2MB/s]
pytorch_model.bin: 92%|ββββββββββ| 3.90G/4.26G [03:26<00:19, 18.1MB/s]
pytorch_model.bin: 92%|ββββββββββ| 3.91G/4.26G [03:26<00:18, 18.7MB/s]
pytorch_model.bin: 92%|ββββββββββ| 3.92G/4.26G [03:27<00:17, 19.9MB/s]
pytorch_model.bin: 92%|ββββββββββ| 3.93G/4.26G [03:27<00:16, 19.5MB/s]
pytorch_model.bin: 93%|ββββββββββ| 3.94G/4.26G [03:28<00:16, 19.1MB/s]
pytorch_model.bin: 93%|ββββββββββ| 3.95G/4.26G [03:29<00:16, 18.8MB/s]
pytorch_model.bin: 93%|ββββββββββ| 3.96G/4.26G [03:29<00:14, 20.2MB/s]
pytorch_model.bin: 93%|ββββββββββ| 3.97G/4.26G [03:30<00:14, 19.6MB/s]
pytorch_model.bin: 94%|ββββββββββ| 3.98G/4.26G [03:30<00:13, 20.6MB/s]
pytorch_model.bin: 94%|ββββββββββ| 4.00G/4.26G [03:31<00:13, 19.8MB/s]
pytorch_model.bin: 94%|ββββββββββ| 4.01G/4.26G [03:31<00:12, 20.7MB/s]
pytorch_model.bin: 94%|ββββββββββ| 4.02G/4.26G [03:32<00:12, 20.3MB/s]
pytorch_model.bin: 94%|ββββββββββ| 4.03G/4.26G [03:32<00:11, 19.6MB/s]
pytorch_model.bin: 95%|ββββββββββ| 4.04G/4.26G [03:33<00:10, 20.6MB/s]
pytorch_model.bin: 95%|ββββββββββ| 4.05G/4.26G [03:33<00:10, 19.8MB/s]
pytorch_model.bin: 95%|ββββββββββ| 4.06G/4.26G [03:34<00:10, 19.6MB/s]
pytorch_model.bin: 95%|ββββββββββ| 4.07G/4.26G [03:34<00:09, 20.5MB/s]
pytorch_model.bin: 96%|ββββββββββ| 4.08G/4.26G [03:35<00:09, 19.9MB/s]
pytorch_model.bin: 96%|ββββββββββ| 4.09G/4.26G [03:35<00:08, 19.4MB/s]
pytorch_model.bin: 96%|ββββββββββ| 4.10G/4.26G [03:36<00:08, 18.7MB/s]
pytorch_model.bin: 96%|ββββββββββ| 4.11G/4.26G [03:36<00:08, 18.7MB/s]
pytorch_model.bin: 97%|ββββββββββ| 4.12G/4.26G [03:37<00:07, 18.8MB/s]
pytorch_model.bin: 97%|ββββββββββ| 4.13G/4.26G [03:38<00:06, 18.7MB/s]
pytorch_model.bin: 97%|ββββββββββ| 4.14G/4.26G [03:38<00:06, 19.9MB/s]
pytorch_model.bin: 97%|ββββββββββ| 4.15G/4.26G [03:39<00:05, 20.5MB/s]
pytorch_model.bin: 98%|ββββββββββ| 4.16G/4.26G [03:39<00:04, 19.9MB/s]
pytorch_model.bin: 98%|ββββββββββ| 4.17G/4.26G [03:40<00:04, 19.4MB/s]
pytorch_model.bin: 98%|ββββββββββ| 4.18G/4.26G [03:40<00:03, 20.7MB/s]
pytorch_model.bin: 98%|ββββββββββ| 4.19G/4.26G [03:41<00:03, 19.8MB/s]
pytorch_model.bin: 99%|ββββββββββ| 4.20G/4.26G [03:41<00:02, 19.3MB/s]
pytorch_model.bin: 99%|ββββββββββ| 4.22G/4.26G [03:42<00:02, 20.6MB/s]
pytorch_model.bin: 99%|ββββββββββ| 4.23G/4.26G [03:42<00:01, 19.8MB/s]
pytorch_model.bin: 99%|ββββββββββ| 4.24G/4.26G [03:43<00:01, 19.2MB/s]
pytorch_model.bin: 100%|ββββββββββ| 4.25G/4.26G [03:43<00:00, 19.8MB/s]
pytorch_model.bin: 100%|ββββββββββ| 4.26G/4.26G [03:44<00:00, 19.7MB/s]
pytorch_model.bin: 100%|ββββββββββ| 4.26G/4.26G [03:44<00:00, 19.5MB/s]
pytorch_model.bin: 100%|ββββββββββ| 4.26G/4.26G [03:44<00:00, 19.0MB/s]
[INFO|modeling_utils.py:3196] 2023-12-06 03:24:59,733 >> loading weights file pytorch_model.bin from cache at /home/hf/.cache/huggingface/hub/models--ckip-joint--bloom-1b1-zh/snapshots/60bed206f673a412c57651456f8c2cf642cdfcfe/pytorch_model.bin
[INFO|modeling_utils.py:3302] 2023-12-06 03:25:00,795 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model
[INFO|modeling_utils.py:4034] 2023-12-06 03:25:01,921 >> All model checkpoint weights were used when initializing BloomForSequenceClassification.
[INFO|modeling_utils.py:4042] 2023-12-06 03:25:01,921 >> All the weights of BloomForSequenceClassification were initialized from the model checkpoint at ckip-joint/bloom-1b1-zh.
If your task is similar to the task the model of the checkpoint was trained on, you can already use BloomForSequenceClassification for predictions without further training.
Running tokenizer on dataset:   0%|          | 0/4635 [00:00<?, ? examples/s]Caching processed dataset at /home/hf/.cache/huggingface/datasets/csv/default-8f347103001581ec/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d/cache-3d872eada15ea9fd.arrow
Running tokenizer on dataset: 100%|██████████| 4635/4635 [00:00<00:00, 42887.77 examples/s]
Running tokenizer on dataset: 100%|██████████| 4635/4635 [00:00<00:00, 42266.86 examples/s]
Running tokenizer on dataset:   0%|          | 0/18 [00:00<?, ? examples/s]Caching processed dataset at /home/hf/.cache/huggingface/datasets/csv/default-8f347103001581ec/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d/cache-e94ca5703f777eb9.arrow
Running tokenizer on dataset: 100%|██████████| 18/18 [00:00<00:00, 5167.17 examples/s]
Downloading builder script:   0%|          | 0.00/4.20k [00:00<?, ?B/s]
Downloading builder script: 100%|██████████| 4.20k/4.20k [00:00<00:00, 9.52MB/s]
[INFO|trainer.py:567] 2023-12-06 03:25:04,457 >> Using auto half precision backend
[INFO|trainer.py:712] 2023-12-06 03:25:04,527 >> The following columns in the training set don't have a corresponding argument in `BloomForSequenceClassification.forward` and have been ignored: user, sentence. If user, sentence are not expected by `BloomForSequenceClassification.forward`, you can safely ignore this message.
Traceback (most recent call last):
File "/workspaces/hf/./script/run_classification.py", line 777, in <module>
main()
File "/workspaces/hf/./script/run_classification.py", line 712, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1533, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1614, in _inner_training_loop
self.optimizer, self.lr_scheduler = deepspeed_init(self, num_training_steps=max_steps)
File "/usr/local/lib/python3.10/dist-packages/transformers/integrations/deepspeed.py", line 362, in deepspeed_init
hf_deepspeed_config.trainer_config_finalize(args, model, num_training_steps)
File "/usr/local/lib/python3.10/dist-packages/transformers/integrations/deepspeed.py", line 232, in trainer_config_finalize
raise ValueError(
ValueError: Please correct the following DeepSpeed config values that mismatch TrainingArguments values:
- ds train_micro_batch_size_per_gpu=1 vs hf per_device_train_batch_size=8
The easiest method is to set these DeepSpeed config values to 'auto'.
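The failure comes from a consistency check: before handing the config to DeepSpeed, the Trainer compares each DeepSpeed config value against the corresponding `TrainingArguments` value, and anything that is neither `"auto"` nor equal triggers the `ValueError`. Here is a minimal sketch of that check (hypothetical function and names, not the actual transformers source):

```python
# Sketch of the batch-size consistency check that produces the ValueError
# above. `find_mismatches` is a hypothetical helper, not a transformers API;
# it mirrors the rule: a DeepSpeed value that is neither "auto" nor equal to
# the TrainingArguments value is reported as a mismatch.
def find_mismatches(ds_config, hf_args):
    mapping = {"train_micro_batch_size_per_gpu": "per_device_train_batch_size"}
    mismatches = []
    for ds_key, hf_key in mapping.items():
        ds_val = ds_config.get(ds_key)
        if ds_val is not None and ds_val != "auto" and ds_val != hf_args[hf_key]:
            mismatches.append(
                f"- ds {ds_key}={ds_val} vs hf {hf_key}={hf_args[hf_key]}"
            )
    return mismatches

# The autotuner's profiling run pins the DeepSpeed value to 1, while the
# script was launched with the HF default of 8 -- hence the error.
print(find_mismatches({"train_micro_batch_size_per_gpu": 1},
                      {"per_device_train_batch_size": 8}))
```

This is why the error persists even though `cfg/auto.json` sets `"auto"`: the profiling step rewrites the config with a concrete value before the check runs.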
autotuning_results/profile_model_info/stdout.log:
[2023-12-06 03:21:01,892] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-06 03:21:02,866] [WARNING] [runner.py:203:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-12-06 03:21:02,867] [INFO] [runner.py:570:main] cmd = /usr/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None ./script/run_classification.py --model_name_or_path ckip-joint/bloom-1b1-zh --do_train --do_eval --output_dir ./bloom --train_file ./data/train.csv --validation_file ./data/test.csv --text_column_names sentence --label_column_name label --overwrite_output_dir --fp16 --torch_compile --deepspeed eyJ0cmFpbl9taWNyb19iYXRjaF9zaXplX3Blcl9ncHUiOiAxLCAiYXV0b3R1bmluZyI6IHsiZW5hYmxlZCI6IHRydWUsICJtb2RlbF9pbmZvX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3Byb2ZpbGVfbW9kZWxfaW5mby9tb2RlbF9pbmZvLmpzb24iLCAibW9kZWxfaW5mbyI6IHsicHJvZmlsZSI6IHRydWV9LCAibWV0cmljX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3Byb2ZpbGVfbW9kZWxfaW5mby9tZXRyaWNzLmpzb24ifSwgInplcm9fb3B0aW1pemF0aW9uIjogeyJzdGFnZSI6IDN9LCAibWVtb3J5X2JyZWFrX2Rvd24iOiBmYWxzZX0=
[2023-12-06 03:21:03,871] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-06 03:21:04,844] [INFO] [launch.py:138:main] 0 NCCL_VERSION=2.19.3
[2023-12-06 03:21:04,845] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0]}
[2023-12-06 03:21:04,845] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-12-06 03:21:04,845] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-12-06 03:21:04,845] [INFO] [launch.py:163:main] dist_world_size=1
[2023-12-06 03:21:04,845] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0
[2023-12-06 03:21:06,863] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-06 03:21:06,987] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-12-06 03:21:06,987] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
12/06/2023 03:21:07 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, 16-bits training: True
12/06/2023 03:21:07 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=eyJ0cmFpbl9taWNyb19iYXRjaF9zaXplX3Blcl9ncHUiOiAxLCAiYXV0b3R1bmluZyI6IHsiZW5hYmxlZCI6IHRydWUsICJtb2RlbF9pbmZvX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3Byb2ZpbGVfbW9kZWxfaW5mby9tb2RlbF9pbmZvLmpzb24iLCAibW9kZWxfaW5mbyI6IHsicHJvZmlsZSI6IHRydWV9LCAibWV0cmljX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3Byb2ZpbGVfbW9kZWxfaW5mby9tZXRyaWNzLmpzb24ifSwgInplcm9fb3B0aW1pemF0aW9uIjogeyJzdGFnZSI6IDN9LCAibWVtb3J5X2JyZWFrX2Rvd24iOiBmYWxzZX0=,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=./bloom/runs/Dec06_03-21-06_b253663f8948,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=500,
logging_strategy=steps,
lr_scheduler_kwargs={},
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_torch,
optim_args=None,
output_dir=./bloom,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=8,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=./bloom,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=500,
save_strategy=steps,
save_total_limit=None,
seed=42,
skip_memory_metrics=True,
split_batches=False,
tf32=None,
torch_compile=True,
torch_compile_backend=inductor,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
12/06/2023 03:21:07 - INFO - __main__ - load a local file for train: ./data/train.csv
12/06/2023 03:21:07 - INFO - __main__ - load a local file for validation: ./data/test.csv
12/06/2023 03:21:07 - INFO - datasets.builder - Using custom data configuration default-8f347103001581ec
12/06/2023 03:21:07 - INFO - datasets.info - Loading Dataset Infos from /usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/csv
12/06/2023 03:21:07 - INFO - datasets.builder - Generating dataset csv (/home/hf/.cache/huggingface/datasets/csv/default-8f347103001581ec/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d)
12/06/2023 03:21:07 - INFO - datasets.builder - Downloading and preparing dataset csv/default to /home/hf/.cache/huggingface/datasets/csv/default-8f347103001581ec/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d...
12/06/2023 03:21:07 - INFO - datasets.download.download_manager - Downloading took 0.0 min
12/06/2023 03:21:07 - INFO - datasets.download.download_manager - Checksum Computation took 0.0 min
12/06/2023 03:21:07 - INFO - datasets.builder - Generating train split
12/06/2023 03:21:07 - INFO - datasets.builder - Generating validation split
12/06/2023 03:21:07 - INFO - datasets.utils.info_utils - Unable to verify splits sizes.
12/06/2023 03:21:07 - INFO - datasets.builder - Dataset csv downloaded and prepared to /home/hf/.cache/huggingface/datasets/csv/default-8f347103001581ec/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d. Subsequent calls will reuse this data.
12/06/2023 03:21:08 - INFO - __main__ - setting problem type to single label classification
[2023-12-06 03:25:01,464] [INFO] [partition_parameters.py:348:__exit__] finished initializing model - num_params = 294, num_elems = 1.07B
12/06/2023 03:25:02 - WARNING - __main__ - The label2id key in the model config.json is not equal to the label2id key of this run. You can ignore this if you are doing finetuning.
12/06/2023 03:25:02 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/hf/.cache/huggingface/datasets/csv/default-8f347103001581ec/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d/cache-3d872eada15ea9fd.arrow
12/06/2023 03:25:02 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/hf/.cache/huggingface/datasets/csv/default-8f347103001581ec/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d/cache-e94ca5703f777eb9.arrow
12/06/2023 03:25:02 - INFO - __main__ - Sample 912 of the training set: {'user': 'Small Chen', 'sentence': 'ζ―εοΌζ»Ώζ»Ώη', 'label': 0, 'input_ids': [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 68211, 4077, 17111, 17111, 373], 'attention_mask': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]}.
12/06/2023 03:25:02 - INFO - __main__ - Sample 204 of the training set: {'user': 'ε±ε', 'sentence': 'ε€εδΊΊηζηΆ²', 'label': 0, 'input_ids': [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 67003, 6872, 125486, 8211], 'attention_mask': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]}.
12/06/2023 03:25:02 - INFO - __main__ - Sample 2253 of the training set: {'user': 'Mmmm', 'sentence': 'δ»δΈζ―δ»ιΊΌθ―η’©ε·₯η¨εΈ«', 'label': 0, 'input_ids': [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 205797, 17212, 7007, 81753, 126320], 'attention_mask': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]}.
12/06/2023 03:25:04 - INFO - __main__ - Using accuracy as classification score, you can use --metric_name to overwrite.
[2023-12-06 03:25:05,096] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 1374
[2023-12-06 03:25:05,097] [ERROR] [launch.py:321:sigkill_handler] ['/usr/bin/python3', '-u', './script/run_classification.py', '--local_rank=0', '--model_name_or_path', 'ckip-joint/bloom-1b1-zh', '--do_train', '--do_eval', '--output_dir', './bloom', '--train_file', './data/train.csv', '--validation_file', './data/test.csv', '--text_column_names', 'sentence', '--label_column_name', 'label', '--overwrite_output_dir', '--fp16', '--torch_compile', '--deepspeed', 'eyJ0cmFpbl9taWNyb19iYXRjaF9zaXplX3Blcl9ncHUiOiAxLCAiYXV0b3R1bmluZyI6IHsiZW5hYmxlZCI6IHRydWUsICJtb2RlbF9pbmZvX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3Byb2ZpbGVfbW9kZWxfaW5mby9tb2RlbF9pbmZvLmpzb24iLCAibW9kZWxfaW5mbyI6IHsicHJvZmlsZSI6IHRydWV9LCAibWV0cmljX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3Byb2ZpbGVfbW9kZWxfaW5mby9tZXRyaWNzLmpzb24ifSwgInplcm9fb3B0aW1pemF0aW9uIjogeyJzdGFnZSI6IDN9LCAibWVtb3J5X2JyZWFrX2Rvd24iOiBmYWxzZX0='] exits with return code = 1
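For anyone wondering where `train_micro_batch_size_per_gpu=1` comes from: the `--deepspeed` value in the failing launch command above is not a file path but a base64-encoded JSON config generated by the autotuner's profiling step. Decoding it shows the pinned value:

```python
import base64
import json

# The base64 blob passed as --deepspeed in the launch command logged above.
encoded = (
    "eyJ0cmFpbl9taWNyb19iYXRjaF9zaXplX3Blcl9ncHUiOiAxLCAiYXV0b3R1bmluZyI6IHsi"
    "ZW5hYmxlZCI6IHRydWUsICJtb2RlbF9pbmZvX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRz"
    "L3Byb2ZpbGVfbW9kZWxfaW5mby9tb2RlbF9pbmZvLmpzb24iLCAibW9kZWxfaW5mbyI6IHsi"
    "cHJvZmlsZSI6IHRydWV9LCAibWV0cmljX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3By"
    "b2ZpbGVfbW9kZWxfaW5mby9tZXRyaWNzLmpzb24ifSwgInplcm9fb3B0aW1pemF0aW9uIjog"
    "eyJzdGFnZSI6IDN9LCAibWVtb3J5X2JyZWFrX2Rvd24iOiBmYWxzZX0="
)
cfg = json.loads(base64.b64decode(encoded))
print(json.dumps(cfg, indent=2))
```

The decoded config replaces the `"auto"` from `cfg/auto.json` with a hard `1` (and enables ZeRO stage 3) for the model-info profiling run, which then collides with `per_device_train_batch_size=8` on the Trainer side.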
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Gently pinging @muellerzr, as you self-assigned this!
Gentle ping @muellerzr
Same problem. Can someone explain why the "No optimal DeepSpeed configuration found by autotuning" message appears, and briefly explain the principle behind autotuning?
Same problem here. I've tried many autotuning examples, but none of them worked.
Another ping @muellerzr