System Info
docker image: huggingface/transformers-pytorch-deepspeed-latest-gpu:latest
- `transformers` version: 4.36.0.dev0
- Platform: Linux-6.2.0-37-generic-x86_64-with-glibc2.29
- Python version: 3.8.10
- Huggingface_hub version: 0.19.4
- Safetensors version: 0.4.1
- Accelerate version: 0.25.0.dev0
- Accelerate config: not found
- PyTorch version (GPU?): 2.1.0+cu118 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: yes, RTX 4060 Ti (16 GB)
- Using distributed or parallel set-up in script?: no
Who can help?
No response
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
Run:
deepspeed --autotuning run \
./script/run_classification.py \
--model_name_or_path ckip-joint/bloom-1b1-zh \
--do_train \
--do_eval \
--output_dir ./bloom \
--train_file ./data/train.csv \
--validation_file ./data/test.csv \
--text_column_names sentence \
--label_column_name label \
--overwrite_output_dir \
--fp16 \
--torch_compile \
--deepspeed cfg/auto.json
cfg/auto.json:
{
"train_micro_batch_size_per_gpu": "auto",
"autotuning": {
"enabled": true,
"fast": false
}
}
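As a side note (my own sketch, not part of the report): the two keys in this minimal config can be loaded and inspected directly. My reading of the DeepSpeed autotuning docs, which I have not verified against this exact version, is that "auto" lets the tuner search the per-GPU micro-batch size and "fast": false requests the full search rather than the fast mode.

```python
import json

# Sketch: parse the same content as cfg/auto.json and inspect the keys.
# (Unverified assumption: "auto" = let the autotuner search the micro-batch
# size; "fast": false = run the full search instead of fast mode.)
auto_cfg = json.loads("""
{
  "train_micro_batch_size_per_gpu": "auto",
  "autotuning": {
    "enabled": true,
    "fast": false
  }
}
""")
print(auto_cfg["autotuning"]["enabled"])  # True
print(auto_cfg["train_micro_batch_size_per_gpu"])  # auto
```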
The error:
[2023-12-04 11:51:42,325] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-04 11:51:43,363] [WARNING] [runner.py:203:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-12-04 11:51:43,363] [INFO] [autotuner.py:71:__init__] Created autotuning experiments directory: autotuning_exps
[2023-12-04 11:51:43,364] [INFO] [autotuner.py:84:__init__] Created autotuning results directory: autotuning_exps
[2023-12-04 11:51:43,364] [INFO] [autotuner.py:200:_get_resource_manager] active_resources = OrderedDict([('localhost', [0])])
[2023-12-04 11:51:43,364] [INFO] [runner.py:362:run_autotuning] [Start] Running autotuning
[2023-12-04 11:51:43,364] [INFO] [autotuner.py:669:model_info_profile_run] Starting model info profile run.
0%| | 0/1 [00:00<?, ?it/s][2023-12-04 11:51:43,366] [INFO] [scheduler.py:344:run_experiment] Scheduler wrote ds_config to autotuning_results/profile_model_info/ds_config.json, /workspaces/hf/autotuning_results/profile_model_info/ds_config.json
[2023-12-04 11:51:43,367] [INFO] [scheduler.py:351:run_experiment] Scheduler wrote exp to autotuning_results/profile_model_info/exp.json, /workspaces/hf/autotuning_results/profile_model_info/exp.json
[2023-12-04 11:51:43,367] [INFO] [scheduler.py:378:run_experiment] Launching exp_id = 0, exp_name = profile_model_info, with resource = localhost:0, and ds_config = /workspaces/hf/autotuning_results/profile_model_info/ds_config.json
localhost: ssh: connect to host localhost port 22: Cannot assign requested address
pdsh@b97c1584d47d: localhost: ssh exited with exit code 255
[2023-12-04 11:51:59,057] [INFO] [scheduler.py:430:clean_up] Done cleaning up exp_id = 0 on the following workers: localhost
[2023-12-04 11:51:59,057] [INFO] [scheduler.py:393:run_experiment] Done running exp_id = 0, exp_name = profile_model_info, with resource = localhost:0
100%|██████████| 1/1 [00:25<00:00, 25.01s/it]
[2023-12-04 11:52:08,378] [ERROR] [autotuner.py:699:model_info_profile_run] The model is not runnable with DeepSpeed with error = (
[2023-12-04 11:52:08,378] [INFO] [runner.py:367:run_autotuning] [End] Running autotuning
[2023-12-04 11:52:08,378] [INFO] [autotuner.py:1110:run_after_tuning] No optimal DeepSpeed configuration found by autotuning.
Expected behavior
Training runs successfully.
cc @muellerzr or @pacman100
Can you provide a bit more of the logs? Earlier in the trace I see:
The model is not runnable with DeepSpeed with error
Sorry, I'm not sure what you mean; this is already the whole log output.
Is there any other trace?
This is the chunk I'm talking about. It looks cut off:
HERE -> [2023-12-04 11:52:08,378] [ERROR] [autotuner.py:699:model_info_profile_run] The model is not runnable with DeepSpeed with error = (
[2023-12-04 11:52:08,378] [INFO] [runner.py:367:run_autotuning] [End] Running autotuning
[2023-12-04 11:52:08,378] [INFO] [autotuner.py:1110:run_after_tuning] No optimal DeepSpeed configuration found by autotuning.
Unfortunately, there is no other trace; the rest of the line is left blank, exactly as above.
It does look cut off.
hf@8913c96d24e3:/workspaces/hf$ deepspeed --autotuning run ./script/run_classification.py --model_name_or_path ckip-joint/bloom-1b1-zh --do_train --do_eval --output_dir ./bloom --train_file ./data/train.csv --validation_file ./data/test.csv --text_column_names sentence --label_column_name label --overwrite_output_dir --fp16 --torch_compile --deepspeed cfg/auto.json
[2023-12-05 14:53:47,008] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-05 14:53:48,023] [WARNING] [runner.py:203:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-12-05 14:53:48,023] [INFO] [autotuner.py:71:__init__] Created autotuning experiments directory: autotuning_exps
[2023-12-05 14:53:48,023] [INFO] [autotuner.py:84:__init__] Created autotuning results directory: autotuning_exps
[2023-12-05 14:53:48,023] [INFO] [autotuner.py:200:_get_resource_manager] active_resources = OrderedDict([('localhost', [0])])
[2023-12-05 14:53:48,023] [INFO] [runner.py:362:run_autotuning] [Start] Running autotuning
[2023-12-05 14:53:48,023] [INFO] [autotuner.py:669:model_info_profile_run] Starting model info profile run.
0%| | 0/1 [00:00<?, ?it/s][2023-12-05 14:53:48,025] [INFO] [scheduler.py:344:run_experiment] Scheduler wrote ds_config to autotuning_results/profile_model_info/ds_config.json, /workspaces/hf/autotuning_results/profile_model_info/ds_config.json
[2023-12-05 14:53:48,026] [INFO] [scheduler.py:351:run_experiment] Scheduler wrote exp to autotuning_results/profile_model_info/exp.json, /workspaces/hf/autotuning_results/profile_model_info/exp.json
[2023-12-05 14:53:48,026] [INFO] [scheduler.py:378:run_experiment] Launching exp_id = 0, exp_name = profile_model_info, with resource = localhost:0, and ds_config = /workspaces/hf/autotuning_results/profile_model_info/ds_config.json
localhost: ssh: connect to host localhost port 22: Cannot assign requested address
pdsh@8913c96d24e3: localhost: ssh exited with exit code 255
[2023-12-05 14:54:03,369] [INFO] [scheduler.py:430:clean_up] Done cleaning up exp_id = 0 on the following workers: localhost
[2023-12-05 14:54:03,369] [INFO] [scheduler.py:393:run_experiment] Done running exp_id = 0, exp_name = profile_model_info, with resource = localhost:0
100%|██████████| 1/1 [00:25<00:00, 25.01s/it]
[2023-12-05 14:54:13,038] [ERROR] [autotuner.py:699:model_info_profile_run] The model is not runnable with DeepSpeed with error = (
[2023-12-05 14:54:13,038] [INFO] [runner.py:367:run_autotuning] [End] Running autotuning
[2023-12-05 14:54:13,038] [INFO] [autotuner.py:1110:run_after_tuning] No optimal DeepSpeed configuration found by autotuning.
hf@8913c96d24e3:/workspaces/hf$
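For what it's worth, the `ssh: connect to host localhost port 22: Cannot assign requested address` lines suggest the autotuner's pdsh launcher is trying to reach the container over ssh. A hypothetical quick check (my own addition; the function name is made up) for whether anything is even listening on port 22 inside the container:

```python
import socket

def ssh_port_open(host: str = "localhost", port: int = 22,
                  timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False

print(ssh_port_open())
```

In a container without an sshd this prints False, which would be consistent with the launcher failure above.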
By the way, here is my full Dockerfile:
FROM huggingface/transformers-pytorch-deepspeed-latest-gpu:latest
RUN apt-get update && apt-get install -y pdsh
RUN pip install --upgrade pip bitsandbytes deepspeed[autotuning]
# non-root user
ARG USERNAME=hf
ARG USER_UID=1000
ARG USER_GID=$USER_UID
# Create the user
RUN groupadd --gid $USER_GID $USERNAME \
&& useradd --uid $USER_UID --gid $USER_GID -m $USERNAME \
#
# [Optional] Add sudo support. Omit if you don't need to install software after connecting.
&& apt-get update \
&& apt-get install -y sudo \
&& echo $USERNAME ALL=\(root\) NOPASSWD:ALL > /etc/sudoers.d/$USERNAME \
&& chmod 0440 /etc/sudoers.d/$USERNAME
# ********************************************************
# * Anything else you want to do like clean up goes here *
# ********************************************************
# [Optional] Set the default user. Omit if you want to keep the default as root.
USER $USERNAME
I'm not sure whether these help.
hf@ffc9973e2c76:/workspaces/hf$ tree
.
├── DockerFile.hf
├── autotuning_exps
│   └── profile_model_info.json
├── autotuning_results
│   └── profile_model_info
│       ├── cmd.txt
│       ├── ds_config.json
│       ├── exp.json
│       ├── stderr.log
│       └── stdout.log
├── bloom
├── cfg
│   ├── auto.json
│   └── ds_config_zero3.json
├── data
│   ├── test.csv
│   └── train.csv
├── nvme
│   └── zero_stage_3
│       └── float16params
│           └── rank0
├── run
│   ├── acclerate.sh
│   ├── deepspeed.sh
│   ├── deepspeed_auto.sh
│   └── text_classification.sh
├── script
│   ├── run_classification.py
│   ├── run_glue_no_trainer.py
│   └── test.py
└── tmp

13 directories, 18 files
Generated by autotuning:
autotuning_exps/profile_model_info.json:
{"name": "profile_model_info", "ds_config": {"train_micro_batch_size_per_gpu": 1, "autotuning": {"enabled": true, "model_info_path": "autotuning_results/profile_model_info/model_info.json", "model_info": {"profile": true}}, "zero_optimization": {"stage": 3}, "memory_break_down": false}, "num_gpus": 1, "num_nodes": 1}
autotuning_results/profile_model_info/cmd.txt:
deepspeed --include localhost:0 --master_port 29500 ./script/run_classification.py --model_name_or_path ckip-joint/bloom-1b1-zh --do_train --do_eval --output_dir ./bloom --train_file ./data/train.csv --validation_file ./data/test.csv --text_column_names sentence --label_column_name label --overwrite_output_dir --fp16 --torch_compile --deepspeed eyJ0cmFpbl9taWNyb19iYXRjaF9zaXplX3Blcl9ncHUiOiAxLCAiYXV0b3R1bmluZyI6IHsiZW5hYmxlZCI6IHRydWUsICJtb2RlbF9pbmZvX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3Byb2ZpbGVfbW9kZWxfaW5mby9tb2RlbF9pbmZvLmpzb24iLCAibW9kZWxfaW5mbyI6IHsicHJvZmlsZSI6IHRydWV9LCAibWV0cmljX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3Byb2ZpbGVfbW9kZWxfaW5mby9tZXRyaWNzLmpzb24ifSwgInplcm9fb3B0aW1pemF0aW9uIjogeyJzdGFnZSI6IDN9LCAibWVtb3J5X2JyZWFrX2Rvd24iOiBmYWxzZX0=
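The long `--deepspeed` argument in cmd.txt is just the ds_config, base64-encoded. A small sketch (added here for readability) decodes it back to the same JSON as the ds_config.json shown in this thread:

```python
import base64
import json

# The base64 blob the autotuner passed as the --deepspeed argument in cmd.txt.
ds_config_b64 = "eyJ0cmFpbl9taWNyb19iYXRjaF9zaXplX3Blcl9ncHUiOiAxLCAiYXV0b3R1bmluZyI6IHsiZW5hYmxlZCI6IHRydWUsICJtb2RlbF9pbmZvX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3Byb2ZpbGVfbW9kZWxfaW5mby9tb2RlbF9pbmZvLmpzb24iLCAibW9kZWxfaW5mbyI6IHsicHJvZmlsZSI6IHRydWV9LCAibWV0cmljX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3Byb2ZpbGVfbW9kZWxfaW5mby9tZXRyaWNzLmpzb24ifSwgInplcm9fb3B0aW1pemF0aW9uIjogeyJzdGFnZSI6IDN9LCAibWVtb3J5X2JyZWFrX2Rvd24iOiBmYWxzZX0="

# Decode and parse; this recovers the profiling config the scheduler wrote out.
ds_config = json.loads(base64.b64decode(ds_config_b64))
print(ds_config["train_micro_batch_size_per_gpu"])  # 1
print(ds_config["zero_optimization"])               # {'stage': 3}
```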
autotuning_results/profile_model_info/ds_config.json:
{"train_micro_batch_size_per_gpu": 1, "autotuning": {"enabled": true, "model_info_path": "autotuning_results/profile_model_info/model_info.json", "model_info": {"profile": true}, "metric_path": "autotuning_results/profile_model_info/metrics.json"}, "zero_optimization": {"stage": 3}, "memory_break_down": false}
autotuning_results/profile_model_info/exp.json:
{"name": "profile_model_info", "ds_config": {"train_micro_batch_size_per_gpu": 1, "autotuning": {"enabled": true, "model_info_path": "autotuning_results/profile_model_info/model_info.json", "model_info": {"profile": true}, "metric_path": "autotuning_results/profile_model_info/metrics.json"}, "zero_optimization": {"stage": 3}, "memory_break_down": false}, "num_gpus": 1, "num_nodes": 1, "exp_id": 0, "result_dir": "autotuning_results/profile_model_info", "master_port": 29500, "launcher_args": ["--include", "localhost:0", "--master_port", "29500"], "user": "unknown-user", "job_id": "unknown-job-id", "ds_config_path": "autotuning_results/profile_model_info/ds_config.json", "ds_config_base64": "eyJ0cmFpbl9taWNyb19iYXRjaF9zaXplX3Blcl9ncHUiOiAxLCAiYXV0b3R1bmluZyI6IHsiZW5hYmxlZCI6IHRydWUsICJtb2RlbF9pbmZvX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3Byb2ZpbGVfbW9kZWxfaW5mby9tb2RlbF9pbmZvLmpzb24iLCAibW9kZWxfaW5mbyI6IHsicHJvZmlsZSI6IHRydWV9LCAibWV0cmljX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3Byb2ZpbGVfbW9kZWxfaW5mby9tZXRyaWNzLmpzb24ifSwgInplcm9fb3B0aW1pemF0aW9uIjogeyJzdGFnZSI6IDN9LCAibWVtb3J5X2JyZWFrX2Rvd24iOiBmYWxzZX0="}
autotuning_results/profile_model_info/stderr.log:
Using custom data configuration default-8f347103001581ec
Loading Dataset Infos from /usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/csv
Generating dataset csv (/home/hf/.cache/huggingface/datasets/csv/default-8f347103001581ec/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d)
Downloading and preparing dataset csv/default to /home/hf/.cache/huggingface/datasets/csv/default-8f347103001581ec/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d...
Downloading data files: 0%| | 0/2 [00:00<?, ?it/s]
Downloading data files: 100%|██████████| 2/2 [00:00<00:00, 26973.02it/s]
Downloading took 0.0 min
Checksum Computation took 0.0 min
Extracting data files: 0%| | 0/2 [00:00<?, ?it/s]
Extracting data files: 100%|██████████| 2/2 [00:00<00:00, 4132.32it/s]
Generating train split
Generating train split: 0 examples [00:00, ? examples/s]
Generating train split: 4635 examples [00:00, 557212.85 examples/s]
Generating validation split
Generating validation split: 0 examples [00:00, ? examples/s]
Generating validation split: 18 examples [00:00, 13751.82 examples/s]
Unable to verify splits sizes.
Dataset csv downloaded and prepared to /home/hf/.cache/huggingface/datasets/csv/default-8f347103001581ec/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d. Subsequent calls will reuse this data.
config.json: 0%| | 0.00/706 [00:00<?, ?B/s]
config.json: 100%|██████████| 706/706 [00:00<00:00, 2.33MB/s]
[INFO|configuration_utils.py:718] 2023-12-06 03:21:08,505 >> loading configuration file config.json from cache at /home/hf/.cache/huggingface/hub/models--ckip-joint--bloom-1b1-zh/snapshots/60bed206f673a412c57651456f8c2cf642cdfcfe/config.json
[INFO|configuration_utils.py:778] 2023-12-06 03:21:08,515 >> Model config BloomConfig {
"_name_or_path": "ckip-joint/bloom-1b1-zh",
"apply_residual_connection_post_layernorm": false,
"architectures": [
"BloomModel"
],
"attention_dropout": 0.0,
"attention_softmax_in_fp32": true,
"bias_dropout_fusion": true,
"bos_token_id": 1,
"eos_token_id": 2,
"finetuning_task": "text-classification",
"hidden_dropout": 0.0,
"hidden_size": 1536,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"masked_softmax_fusion": true,
"model_type": "bloom",
"n_head": 16,
"n_inner": null,
"n_layer": 24,
"offset_alibi": 100,
"pad_token_id": 3,
"pretraining_tp": 1,
"skip_bias_add": true,
"skip_bias_add_qkv": false,
"slow_but_exact": false,
"transformers_version": "4.36.0.dev0",
"unk_token_id": 0,
"use_cache": true,
"vocab_size": 250880
}
tokenizer_config.json: 0%| | 0.00/222 [00:00<?, ?B/s]
tokenizer_config.json: 100%|██████████| 222/222 [00:00<00:00, 357kB/s]
tokenizer.json: 0%| | 0.00/14.5M [00:00<?, ?B/s]
tokenizer.json: 72%|███████▏ | 10.5M/14.5M [00:01<00:00, 5.96MB/s]
tokenizer.json: 100%|██████████| 14.5M/14.5M [00:01<00:00, 7.96MB/s]
tokenizer.json: 100%|██████████| 14.5M/14.5M [00:01<00:00, 7.41MB/s]
special_tokens_map.json: 0%| | 0.00/85.0 [00:00<?, ?B/s]
special_tokens_map.json: 100%|██████████| 85.0/85.0 [00:00<00:00, 293kB/s]
[INFO|tokenization_utils_base.py:2026] 2023-12-06 03:21:13,020 >> loading file tokenizer.json from cache at /home/hf/.cache/huggingface/hub/models--ckip-joint--bloom-1b1-zh/snapshots/60bed206f673a412c57651456f8c2cf642cdfcfe/tokenizer.json
[INFO|tokenization_utils_base.py:2026] 2023-12-06 03:21:13,020 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2026] 2023-12-06 03:21:13,020 >> loading file special_tokens_map.json from cache at /home/hf/.cache/huggingface/hub/models--ckip-joint--bloom-1b1-zh/snapshots/60bed206f673a412c57651456f8c2cf642cdfcfe/special_tokens_map.json
[INFO|tokenization_utils_base.py:2026] 2023-12-06 03:21:13,020 >> loading file tokenizer_config.json from cache at /home/hf/.cache/huggingface/hub/models--ckip-joint--bloom-1b1-zh/snapshots/60bed206f673a412c57651456f8c2cf642cdfcfe/tokenizer_config.json
pytorch_model.bin: 0%| | 0.00/4.26G [00:00<?, ?B/s]
[...]
pytorch_model.bin: 69%|██████▉ | 2.93G/4.26G [02:28<01:09, 19.3MB/s]
pytorch_model.bin: 69%|βββββββ | 2.94G/4.26G [02:28<01:04, 20.7MB/s]
pytorch_model.bin: 69%|βββββββ | 2.95G/4.26G [02:29<01:06, 19.9MB/s]
pytorch_model.bin: 69%|βββββββ | 2.96G/4.26G [02:29<01:01, 21.2MB/s]
pytorch_model.bin: 70%|βββββββ | 2.97G/4.26G [02:30<01:03, 20.3MB/s]
pytorch_model.bin: 70%|βββββββ | 2.98G/4.26G [02:30<01:05, 19.5MB/s]
pytorch_model.bin: 70%|βββββββ | 2.99G/4.26G [02:31<01:01, 20.8MB/s]
pytorch_model.bin: 70%|βββββββ | 3.00G/4.26G [02:31<01:03, 19.8MB/s]
pytorch_model.bin: 71%|βββββββ | 3.01G/4.26G [02:32<01:04, 19.5MB/s]
pytorch_model.bin: 71%|βββββββ | 3.02G/4.26G [02:32<01:00, 20.5MB/s]
pytorch_model.bin: 71%|βββββββ | 3.03G/4.26G [02:33<01:02, 19.7MB/s]
pytorch_model.bin: 71%|ββββββββ | 3.04G/4.26G [02:33<01:02, 19.4MB/s]
pytorch_model.bin: 72%|ββββββββ | 3.05G/4.26G [02:34<00:58, 20.5MB/s]
pytorch_model.bin: 72%|ββββββββ | 3.06G/4.26G [02:34<00:59, 20.1MB/s]
pytorch_model.bin: 72%|ββββββββ | 3.07G/4.26G [02:35<01:01, 19.3MB/s]
pytorch_model.bin: 72%|ββββββββ | 3.08G/4.26G [02:36<01:02, 18.9MB/s]
pytorch_model.bin: 73%|ββββββββ | 3.09G/4.26G [02:36<00:57, 20.4MB/s]
pytorch_model.bin: 73%|ββββββββ | 3.10G/4.26G [02:37<00:59, 19.5MB/s]
pytorch_model.bin: 73%|ββββββββ | 3.11G/4.26G [02:37<00:59, 19.2MB/s]
pytorch_model.bin: 73%|ββββββββ | 3.12G/4.26G [02:38<01:00, 18.8MB/s]
pytorch_model.bin: 74%|ββββββββ | 3.14G/4.26G [02:38<00:55, 20.3MB/s]
pytorch_model.bin: 74%|ββββββββ | 3.15G/4.26G [02:39<00:57, 19.5MB/s]
pytorch_model.bin: 74%|ββββββββ | 3.16G/4.26G [02:39<00:58, 19.0MB/s]
pytorch_model.bin: 74%|ββββββββ | 3.17G/4.26G [02:40<00:58, 18.7MB/s]
pytorch_model.bin: 75%|ββββββββ | 3.18G/4.26G [02:41<01:00, 17.9MB/s]
pytorch_model.bin: 75%|ββββββββ | 3.19G/4.26G [02:41<00:55, 19.4MB/s]
pytorch_model.bin: 75%|ββββββββ | 3.20G/4.26G [02:42<00:56, 18.7MB/s]
pytorch_model.bin: 75%|ββββββββ | 3.21G/4.26G [02:42<00:51, 20.6MB/s]
pytorch_model.bin: 76%|ββββββββ | 3.22G/4.26G [02:43<00:52, 19.7MB/s]
pytorch_model.bin: 76%|ββββββββ | 3.23G/4.26G [02:43<00:53, 19.3MB/s]
pytorch_model.bin: 76%|ββββββββ | 3.24G/4.26G [02:44<00:49, 20.5MB/s]
pytorch_model.bin: 76%|ββββββββ | 3.25G/4.26G [02:44<00:50, 19.9MB/s]
pytorch_model.bin: 77%|ββββββββ | 3.26G/4.26G [02:45<00:47, 21.0MB/s]
pytorch_model.bin: 77%|ββββββββ | 3.27G/4.26G [02:45<00:49, 20.0MB/s]
pytorch_model.bin: 77%|ββββββββ | 3.28G/4.26G [02:46<00:49, 19.6MB/s]
pytorch_model.bin: 77%|ββββββββ | 3.29G/4.26G [02:46<00:46, 20.9MB/s]
pytorch_model.bin: 78%|ββββββββ | 3.30G/4.26G [02:47<00:47, 20.0MB/s]
pytorch_model.bin: 78%|ββββββββ | 3.31G/4.26G [02:47<00:45, 21.0MB/s]
pytorch_model.bin: 78%|ββββββββ | 3.32G/4.26G [02:48<00:46, 20.2MB/s]
pytorch_model.bin: 78%|ββββββββ | 3.33G/4.26G [02:48<00:47, 19.5MB/s]
pytorch_model.bin: 78%|ββββββββ | 3.34G/4.26G [02:49<00:45, 20.2MB/s]
pytorch_model.bin: 79%|ββββββββ | 3.36G/4.26G [02:49<00:45, 20.0MB/s]
pytorch_model.bin: 79%|ββββββββ | 3.37G/4.26G [02:50<00:46, 19.4MB/s]
pytorch_model.bin: 79%|ββββββββ | 3.38G/4.26G [02:50<00:43, 20.2MB/s]
pytorch_model.bin: 79%|ββββββββ | 3.39G/4.26G [02:51<00:43, 20.0MB/s]
pytorch_model.bin: 80%|ββββββββ | 3.40G/4.26G [02:51<00:41, 20.8MB/s]
pytorch_model.bin: 80%|ββββββββ | 3.41G/4.26G [02:52<00:43, 19.8MB/s]
pytorch_model.bin: 80%|ββββββββ | 3.42G/4.26G [02:52<00:40, 20.9MB/s]
pytorch_model.bin: 80%|ββββββββ | 3.43G/4.26G [02:53<00:41, 19.9MB/s]
pytorch_model.bin: 81%|ββββββββ | 3.44G/4.26G [02:54<00:41, 19.7MB/s]
pytorch_model.bin: 81%|ββββββββ | 3.45G/4.26G [02:54<00:39, 20.3MB/s]
pytorch_model.bin: 81%|ββββββββ | 3.46G/4.26G [02:55<00:39, 20.2MB/s]
pytorch_model.bin: 81%|βββββββββ | 3.47G/4.26G [02:55<00:36, 21.4MB/s]
pytorch_model.bin: 82%|βββββββββ | 3.48G/4.26G [02:56<00:38, 20.4MB/s]
pytorch_model.bin: 82%|βββββββββ | 3.49G/4.26G [02:56<00:39, 19.7MB/s]
pytorch_model.bin: 82%|βββββββββ | 3.50G/4.26G [02:57<00:36, 20.8MB/s]
pytorch_model.bin: 82%|βββββββββ | 3.51G/4.26G [02:57<00:41, 18.0MB/s]
pytorch_model.bin: 83%|βββββββββ | 3.52G/4.26G [02:58<00:42, 17.3MB/s]
pytorch_model.bin: 83%|βββββββββ | 3.53G/4.26G [02:59<00:51, 14.0MB/s]
pytorch_model.bin: 83%|βββββββββ | 3.54G/4.26G [03:00<00:54, 13.1MB/s]
pytorch_model.bin: 83%|βββββββββ | 3.55G/4.26G [03:01<00:53, 13.2MB/s]
pytorch_model.bin: 84%|βββββββββ | 3.57G/4.26G [03:02<00:52, 13.3MB/s]
pytorch_model.bin: 84%|βββββββββ | 3.58G/4.26G [03:02<00:48, 14.3MB/s]
pytorch_model.bin: 84%|βββββββββ | 3.59G/4.26G [03:03<00:48, 14.0MB/s]
pytorch_model.bin: 84%|βββββββββ | 3.60G/4.26G [03:04<00:48, 13.7MB/s]
pytorch_model.bin: 85%|βββββββββ | 3.61G/4.26G [03:05<00:49, 13.2MB/s]
pytorch_model.bin: 85%|βββββββββ | 3.62G/4.26G [03:06<00:51, 12.6MB/s]
pytorch_model.bin: 85%|βββββββββ | 3.63G/4.26G [03:06<00:50, 12.5MB/s]
pytorch_model.bin: 85%|βββββββββ | 3.64G/4.26G [03:07<00:50, 12.4MB/s]
pytorch_model.bin: 86%|βββββββββ | 3.65G/4.26G [03:08<00:49, 12.3MB/s]
pytorch_model.bin: 86%|βββββββββ | 3.66G/4.26G [03:09<00:44, 13.4MB/s]
pytorch_model.bin: 86%|βββββββββ | 3.67G/4.26G [03:09<00:40, 14.8MB/s]
pytorch_model.bin: 86%|βββββββββ | 3.68G/4.26G [03:10<00:36, 16.0MB/s]
pytorch_model.bin: 87%|βββββββββ | 3.69G/4.26G [03:10<00:35, 15.9MB/s]
pytorch_model.bin: 87%|βββββββββ | 3.70G/4.26G [03:11<00:36, 15.2MB/s]
pytorch_model.bin: 87%|βββββββββ | 3.71G/4.26G [03:12<00:37, 14.5MB/s]
pytorch_model.bin: 87%|βββββββββ | 3.72G/4.26G [03:13<00:41, 13.0MB/s]
pytorch_model.bin: 88%|βββββββββ | 3.73G/4.26G [03:14<00:39, 13.4MB/s]
pytorch_model.bin: 88%|βββββββββ | 3.74G/4.26G [03:15<00:38, 13.6MB/s]
pytorch_model.bin: 88%|βββββββββ | 3.75G/4.26G [03:15<00:38, 13.3MB/s]
pytorch_model.bin: 88%|βββββββββ | 3.76G/4.26G [03:16<00:37, 13.4MB/s]
pytorch_model.bin: 89%|βββββββββ | 3.77G/4.26G [03:17<00:33, 14.3MB/s]
pytorch_model.bin: 89%|βββββββββ | 3.79G/4.26G [03:17<00:32, 14.8MB/s]
pytorch_model.bin: 89%|βββββββββ | 3.80G/4.26G [03:18<00:29, 15.7MB/s]
pytorch_model.bin: 89%|βββββββββ | 3.81G/4.26G [03:19<00:33, 13.6MB/s]
pytorch_model.bin: 90%|βββββββββ | 3.82G/4.26G [03:20<00:34, 13.0MB/s]
pytorch_model.bin: 90%|βββββββββ | 3.83G/4.26G [03:21<00:33, 12.9MB/s]
pytorch_model.bin: 90%|βββββββββ | 3.84G/4.26G [03:22<00:35, 12.0MB/s]
pytorch_model.bin: 90%|βββββββββ | 3.85G/4.26G [03:23<00:36, 11.2MB/s]
pytorch_model.bin: 91%|βββββββββ | 3.86G/4.26G [03:24<00:37, 10.6MB/s]
pytorch_model.bin: 91%|βββββββββ | 3.87G/4.26G [03:25<00:35, 11.2MB/s]
pytorch_model.bin: 91%|βββββββββ | 3.88G/4.26G [03:25<00:28, 13.5MB/s]
pytorch_model.bin: 91%|ββββββββββ| 3.89G/4.26G [03:25<00:22, 16.2MB/s]
pytorch_model.bin: 92%|ββββββββββ| 3.90G/4.26G [03:26<00:19, 18.1MB/s]
pytorch_model.bin: 92%|ββββββββββ| 3.91G/4.26G [03:26<00:18, 18.7MB/s]
pytorch_model.bin: 92%|ββββββββββ| 3.92G/4.26G [03:27<00:17, 19.9MB/s]
pytorch_model.bin: 92%|ββββββββββ| 3.93G/4.26G [03:27<00:16, 19.5MB/s]
pytorch_model.bin: 93%|ββββββββββ| 3.94G/4.26G [03:28<00:16, 19.1MB/s]
pytorch_model.bin: 93%|ββββββββββ| 3.95G/4.26G [03:29<00:16, 18.8MB/s]
pytorch_model.bin: 93%|ββββββββββ| 3.96G/4.26G [03:29<00:14, 20.2MB/s]
pytorch_model.bin: 93%|ββββββββββ| 3.97G/4.26G [03:30<00:14, 19.6MB/s]
pytorch_model.bin: 94%|ββββββββββ| 3.98G/4.26G [03:30<00:13, 20.6MB/s]
pytorch_model.bin: 94%|ββββββββββ| 4.00G/4.26G [03:31<00:13, 19.8MB/s]
pytorch_model.bin: 94%|ββββββββββ| 4.01G/4.26G [03:31<00:12, 20.7MB/s]
pytorch_model.bin: 94%|ββββββββββ| 4.02G/4.26G [03:32<00:12, 20.3MB/s]
pytorch_model.bin: 94%|ββββββββββ| 4.03G/4.26G [03:32<00:11, 19.6MB/s]
pytorch_model.bin: 95%|ββββββββββ| 4.04G/4.26G [03:33<00:10, 20.6MB/s]
pytorch_model.bin: 95%|ββββββββββ| 4.05G/4.26G [03:33<00:10, 19.8MB/s]
pytorch_model.bin: 95%|ββββββββββ| 4.06G/4.26G [03:34<00:10, 19.6MB/s]
pytorch_model.bin: 95%|ββββββββββ| 4.07G/4.26G [03:34<00:09, 20.5MB/s]
pytorch_model.bin: 96%|ββββββββββ| 4.08G/4.26G [03:35<00:09, 19.9MB/s]
pytorch_model.bin: 96%|ββββββββββ| 4.09G/4.26G [03:35<00:08, 19.4MB/s]
pytorch_model.bin: 96%|ββββββββββ| 4.10G/4.26G [03:36<00:08, 18.7MB/s]
pytorch_model.bin: 96%|ββββββββββ| 4.11G/4.26G [03:36<00:08, 18.7MB/s]
pytorch_model.bin: 97%|ββββββββββ| 4.12G/4.26G [03:37<00:07, 18.8MB/s]
pytorch_model.bin: 97%|ββββββββββ| 4.13G/4.26G [03:38<00:06, 18.7MB/s]
pytorch_model.bin: 97%|ββββββββββ| 4.14G/4.26G [03:38<00:06, 19.9MB/s]
pytorch_model.bin: 97%|ββββββββββ| 4.15G/4.26G [03:39<00:05, 20.5MB/s]
pytorch_model.bin: 98%|ββββββββββ| 4.16G/4.26G [03:39<00:04, 19.9MB/s]
pytorch_model.bin: 98%|ββββββββββ| 4.17G/4.26G [03:40<00:04, 19.4MB/s]
pytorch_model.bin: 98%|ββββββββββ| 4.18G/4.26G [03:40<00:03, 20.7MB/s]
pytorch_model.bin: 98%|ββββββββββ| 4.19G/4.26G [03:41<00:03, 19.8MB/s]
pytorch_model.bin: 99%|ββββββββββ| 4.20G/4.26G [03:41<00:02, 19.3MB/s]
pytorch_model.bin: 99%|ββββββββββ| 4.22G/4.26G [03:42<00:02, 20.6MB/s]
pytorch_model.bin: 99%|ββββββββββ| 4.23G/4.26G [03:42<00:01, 19.8MB/s]
pytorch_model.bin: 99%|ββββββββββ| 4.24G/4.26G [03:43<00:01, 19.2MB/s]
pytorch_model.bin: 100%|ββββββββββ| 4.25G/4.26G [03:43<00:00, 19.8MB/s]
pytorch_model.bin: 100%|ββββββββββ| 4.26G/4.26G [03:44<00:00, 19.7MB/s]
pytorch_model.bin: 100%|ββββββββββ| 4.26G/4.26G [03:44<00:00, 19.5MB/s]
pytorch_model.bin: 100%|ββββββββββ| 4.26G/4.26G [03:44<00:00, 19.0MB/s]
[INFO|modeling_utils.py:3196] 2023-12-06 03:24:59,733 >> loading weights file pytorch_model.bin from cache at /home/hf/.cache/huggingface/hub/models--ckip-joint--bloom-1b1-zh/snapshots/60bed206f673a412c57651456f8c2cf642cdfcfe/pytorch_model.bin
[INFO|modeling_utils.py:3302] 2023-12-06 03:25:00,795 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model
[INFO|modeling_utils.py:4034] 2023-12-06 03:25:01,921 >> All model checkpoint weights were used when initializing BloomForSequenceClassification.
[INFO|modeling_utils.py:4042] 2023-12-06 03:25:01,921 >> All the weights of BloomForSequenceClassification were initialized from the model checkpoint at ckip-joint/bloom-1b1-zh.
If your task is similar to the task the model of the checkpoint was trained on, you can already use BloomForSequenceClassification for predictions without further training.
Running tokenizer on dataset:   0%|          | 0/4635 [00:00<?, ? examples/s]Caching processed dataset at /home/hf/.cache/huggingface/datasets/csv/default-8f347103001581ec/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d/cache-3d872eada15ea9fd.arrow
Running tokenizer on dataset: 100%|██████████| 4635/4635 [00:00<00:00, 42887.77 examples/s]
Running tokenizer on dataset: 100%|██████████| 4635/4635 [00:00<00:00, 42266.86 examples/s]
Running tokenizer on dataset:   0%|          | 0/18 [00:00<?, ? examples/s]Caching processed dataset at /home/hf/.cache/huggingface/datasets/csv/default-8f347103001581ec/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d/cache-e94ca5703f777eb9.arrow
Running tokenizer on dataset: 100%|██████████| 18/18 [00:00<00:00, 5167.17 examples/s]
Downloading builder script:   0%|          | 0.00/4.20k [00:00<?, ?B/s]
Downloading builder script: 100%|██████████| 4.20k/4.20k [00:00<00:00, 9.52MB/s]
[INFO|trainer.py:567] 2023-12-06 03:25:04,457 >> Using auto half precision backend
[INFO|trainer.py:712] 2023-12-06 03:25:04,527 >> The following columns in the training set don't have a corresponding argument in `BloomForSequenceClassification.forward` and have been ignored: user, sentence. If user, sentence are not expected by `BloomForSequenceClassification.forward`, you can safely ignore this message.
Traceback (most recent call last):
File "/workspaces/hf/./script/run_classification.py", line 777, in <module>
main()
File "/workspaces/hf/./script/run_classification.py", line 712, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1533, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1614, in _inner_training_loop
self.optimizer, self.lr_scheduler = deepspeed_init(self, num_training_steps=max_steps)
File "/usr/local/lib/python3.10/dist-packages/transformers/integrations/deepspeed.py", line 362, in deepspeed_init
hf_deepspeed_config.trainer_config_finalize(args, model, num_training_steps)
File "/usr/local/lib/python3.10/dist-packages/transformers/integrations/deepspeed.py", line 232, in trainer_config_finalize
raise ValueError(
ValueError: Please correct the following DeepSpeed config values that mismatch TrainingArguments values:
- ds train_micro_batch_size_per_gpu=1 vs hf per_device_train_batch_size=8
The easiest method is to set these DeepSpeed config values to 'auto'.
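The failure comes from a consistency check: before handing the config to DeepSpeed, the Trainer compares each DeepSpeed config value against the corresponding `TrainingArguments` value, and anything that is neither `"auto"` nor equal triggers the `ValueError`. Here is a minimal sketch of that check (hypothetical function and names, not the actual transformers source):

```python
# Sketch of the batch-size consistency check that produces the ValueError
# above. `find_mismatches` is a hypothetical helper, not a transformers API;
# it mirrors the rule: a DeepSpeed value that is neither "auto" nor equal to
# the TrainingArguments value is reported as a mismatch.
def find_mismatches(ds_config, hf_args):
    mapping = {"train_micro_batch_size_per_gpu": "per_device_train_batch_size"}
    mismatches = []
    for ds_key, hf_key in mapping.items():
        ds_val = ds_config.get(ds_key)
        if ds_val is not None and ds_val != "auto" and ds_val != hf_args[hf_key]:
            mismatches.append(
                f"- ds {ds_key}={ds_val} vs hf {hf_key}={hf_args[hf_key]}"
            )
    return mismatches

# The autotuner's profiling run pins the DeepSpeed value to 1, while the
# script was launched with the HF default of 8 -- hence the error.
print(find_mismatches({"train_micro_batch_size_per_gpu": 1},
                      {"per_device_train_batch_size": 8}))
```

This is why the error persists even though `cfg/auto.json` sets `"auto"`: the profiling step rewrites the config with a concrete value before the check runs.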
autotuning_results/profile_model_info/stdout.log:
[2023-12-06 03:21:01,892] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-06 03:21:02,866] [WARNING] [runner.py:203:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-12-06 03:21:02,867] [INFO] [runner.py:570:main] cmd = /usr/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None ./script/run_classification.py --model_name_or_path ckip-joint/bloom-1b1-zh --do_train --do_eval --output_dir ./bloom --train_file ./data/train.csv --validation_file ./data/test.csv --text_column_names sentence --label_column_name label --overwrite_output_dir --fp16 --torch_compile --deepspeed eyJ0cmFpbl9taWNyb19iYXRjaF9zaXplX3Blcl9ncHUiOiAxLCAiYXV0b3R1bmluZyI6IHsiZW5hYmxlZCI6IHRydWUsICJtb2RlbF9pbmZvX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3Byb2ZpbGVfbW9kZWxfaW5mby9tb2RlbF9pbmZvLmpzb24iLCAibW9kZWxfaW5mbyI6IHsicHJvZmlsZSI6IHRydWV9LCAibWV0cmljX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3Byb2ZpbGVfbW9kZWxfaW5mby9tZXRyaWNzLmpzb24ifSwgInplcm9fb3B0aW1pemF0aW9uIjogeyJzdGFnZSI6IDN9LCAibWVtb3J5X2JyZWFrX2Rvd24iOiBmYWxzZX0=
[2023-12-06 03:21:03,871] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-06 03:21:04,844] [INFO] [launch.py:138:main] 0 NCCL_VERSION=2.19.3
[2023-12-06 03:21:04,845] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0]}
[2023-12-06 03:21:04,845] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-12-06 03:21:04,845] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-12-06 03:21:04,845] [INFO] [launch.py:163:main] dist_world_size=1
[2023-12-06 03:21:04,845] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0
[2023-12-06 03:21:06,863] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-06 03:21:06,987] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-12-06 03:21:06,987] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
12/06/2023 03:21:07 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, 16-bits training: True
12/06/2023 03:21:07 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=eyJ0cmFpbl9taWNyb19iYXRjaF9zaXplX3Blcl9ncHUiOiAxLCAiYXV0b3R1bmluZyI6IHsiZW5hYmxlZCI6IHRydWUsICJtb2RlbF9pbmZvX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3Byb2ZpbGVfbW9kZWxfaW5mby9tb2RlbF9pbmZvLmpzb24iLCAibW9kZWxfaW5mbyI6IHsicHJvZmlsZSI6IHRydWV9LCAibWV0cmljX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3Byb2ZpbGVfbW9kZWxfaW5mby9tZXRyaWNzLmpzb24ifSwgInplcm9fb3B0aW1pemF0aW9uIjogeyJzdGFnZSI6IDN9LCAibWVtb3J5X2JyZWFrX2Rvd24iOiBmYWxzZX0=,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=./bloom/runs/Dec06_03-21-06_b253663f8948,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=500,
logging_strategy=steps,
lr_scheduler_kwargs={},
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_torch,
optim_args=None,
output_dir=./bloom,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=8,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=./bloom,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=500,
save_strategy=steps,
save_total_limit=None,
seed=42,
skip_memory_metrics=True,
split_batches=False,
tf32=None,
torch_compile=True,
torch_compile_backend=inductor,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
12/06/2023 03:21:07 - INFO - __main__ - load a local file for train: ./data/train.csv
12/06/2023 03:21:07 - INFO - __main__ - load a local file for validation: ./data/test.csv
12/06/2023 03:21:07 - INFO - datasets.builder - Using custom data configuration default-8f347103001581ec
12/06/2023 03:21:07 - INFO - datasets.info - Loading Dataset Infos from /usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/csv
12/06/2023 03:21:07 - INFO - datasets.builder - Generating dataset csv (/home/hf/.cache/huggingface/datasets/csv/default-8f347103001581ec/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d)
12/06/2023 03:21:07 - INFO - datasets.builder - Downloading and preparing dataset csv/default to /home/hf/.cache/huggingface/datasets/csv/default-8f347103001581ec/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d...
12/06/2023 03:21:07 - INFO - datasets.download.download_manager - Downloading took 0.0 min
12/06/2023 03:21:07 - INFO - datasets.download.download_manager - Checksum Computation took 0.0 min
12/06/2023 03:21:07 - INFO - datasets.builder - Generating train split
12/06/2023 03:21:07 - INFO - datasets.builder - Generating validation split
12/06/2023 03:21:07 - INFO - datasets.utils.info_utils - Unable to verify splits sizes.
12/06/2023 03:21:07 - INFO - datasets.builder - Dataset csv downloaded and prepared to /home/hf/.cache/huggingface/datasets/csv/default-8f347103001581ec/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d. Subsequent calls will reuse this data.
12/06/2023 03:21:08 - INFO - __main__ - setting problem type to single label classification
[2023-12-06 03:25:01,464] [INFO] [partition_parameters.py:348:__exit__] finished initializing model - num_params = 294, num_elems = 1.07B
12/06/2023 03:25:02 - WARNING - __main__ - The label2id key in the model config.json is not equal to the label2id key of this run. You can ignore this if you are doing finetuning.
12/06/2023 03:25:02 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/hf/.cache/huggingface/datasets/csv/default-8f347103001581ec/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d/cache-3d872eada15ea9fd.arrow
12/06/2023 03:25:02 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/hf/.cache/huggingface/datasets/csv/default-8f347103001581ec/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d/cache-e94ca5703f777eb9.arrow
12/06/2023 03:25:02 - INFO - __main__ - Sample 912 of the training set: {'user': 'Small Chen', 'sentence': 'ζ―εοΌζ»Ώζ»Ώη', 'label': 0, 'input_ids': [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 68211, 4077, 17111, 17111, 373], 'attention_mask': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]}.
12/06/2023 03:25:02 - INFO - __main__ - Sample 204 of the training set: {'user': 'ε±ε', 'sentence': 'ε€εδΊΊηζηΆ²', 'label': 0, 'input_ids': [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 67003, 6872, 125486, 8211], 'attention_mask': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]}.
12/06/2023 03:25:02 - INFO - __main__ - Sample 2253 of the training set: {'user': 'Mmmm', 'sentence': 'δ»δΈζ―δ»ιΊΌθ―η’©ε·₯η¨εΈ«', 'label': 0, 'input_ids': [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 205797, 17212, 7007, 81753, 126320], 'attention_mask': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]}.
12/06/2023 03:25:04 - INFO - __main__ - Using accuracy as classification score, you can use --metric_name to overwrite.
[2023-12-06 03:25:05,096] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 1374
[2023-12-06 03:25:05,097] [ERROR] [launch.py:321:sigkill_handler] ['/usr/bin/python3', '-u', './script/run_classification.py', '--local_rank=0', '--model_name_or_path', 'ckip-joint/bloom-1b1-zh', '--do_train', '--do_eval', '--output_dir', './bloom', '--train_file', './data/train.csv', '--validation_file', './data/test.csv', '--text_column_names', 'sentence', '--label_column_name', 'label', '--overwrite_output_dir', '--fp16', '--torch_compile', '--deepspeed', 'eyJ0cmFpbl9taWNyb19iYXRjaF9zaXplX3Blcl9ncHUiOiAxLCAiYXV0b3R1bmluZyI6IHsiZW5hYmxlZCI6IHRydWUsICJtb2RlbF9pbmZvX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3Byb2ZpbGVfbW9kZWxfaW5mby9tb2RlbF9pbmZvLmpzb24iLCAibW9kZWxfaW5mbyI6IHsicHJvZmlsZSI6IHRydWV9LCAibWV0cmljX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3Byb2ZpbGVfbW9kZWxfaW5mby9tZXRyaWNzLmpzb24ifSwgInplcm9fb3B0aW1pemF0aW9uIjogeyJzdGFnZSI6IDN9LCAibWVtb3J5X2JyZWFrX2Rvd24iOiBmYWxzZX0='] exits with return code = 1
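For anyone wondering where `train_micro_batch_size_per_gpu=1` comes from: the `--deepspeed` value in the failing launch command above is not a file path but a base64-encoded JSON config generated by the autotuner's profiling step. Decoding it shows the pinned value:

```python
import base64
import json

# The base64 blob passed as --deepspeed in the launch command logged above.
encoded = (
    "eyJ0cmFpbl9taWNyb19iYXRjaF9zaXplX3Blcl9ncHUiOiAxLCAiYXV0b3R1bmluZyI6IHsi"
    "ZW5hYmxlZCI6IHRydWUsICJtb2RlbF9pbmZvX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRz"
    "L3Byb2ZpbGVfbW9kZWxfaW5mby9tb2RlbF9pbmZvLmpzb24iLCAibW9kZWxfaW5mbyI6IHsi"
    "cHJvZmlsZSI6IHRydWV9LCAibWV0cmljX3BhdGgiOiAiYXV0b3R1bmluZ19yZXN1bHRzL3By"
    "b2ZpbGVfbW9kZWxfaW5mby9tZXRyaWNzLmpzb24ifSwgInplcm9fb3B0aW1pemF0aW9uIjog"
    "eyJzdGFnZSI6IDN9LCAibWVtb3J5X2JyZWFrX2Rvd24iOiBmYWxzZX0="
)
cfg = json.loads(base64.b64decode(encoded))
print(json.dumps(cfg, indent=2))
```

The decoded config replaces the `"auto"` from `cfg/auto.json` with a hard `1` (and enables ZeRO stage 3) for the model-info profiling run, which then collides with `per_device_train_batch_size=8` on the Trainer side.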
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Gently pinging @muellerzr, as you self-assigned this!
Gentle ping @muellerzr
Same problem. Can someone explain why the "No optimal DeepSpeed configuration found by autotuning" message appears, and briefly explain the principle behind autotuning?
Same problem here. I've tried many autotuning examples, but none of them worked.
Another ping @muellerzr