[BUG] OOM error when running QLoRA training of Qwen-72B-Chat-Int4 on a single machine with 8x A100
Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
Is there an existing answer for this in FAQ?
- [X] I have searched FAQ
Current Behavior
When fine-tuning Qwen-72B-Chat-Int4 with QLoRA on 8x A100, setting model_max_length to 8192 causes an OOM error. My training samples are not particularly long. Below is my training script; is there any way to solve this?
export CUDA_DEVICE_MAX_CONNECTIONS=1
DIR=`pwd`

GPUS_PER_NODE=$(python -c 'import torch; print(torch.cuda.device_count())')
NNODES=${NNODES:-1}
NODE_RANK=${NODE_RANK:-0}
MASTER_ADDR=${MASTER_ADDR:-localhost}
MASTER_PORT=${MASTER_PORT:-6001}

MODEL="Qwen__Qwen-72B-Chat-Int4"
DATA="sft_train.json"

function usage() {
    echo '
Usage: bash finetune/finetune_qlora_ds.sh [-m MODEL_PATH] [-d DATA_PATH]
'
}

while [[ "$1" != "" ]]; do
    case $1 in
        -m | --model )
            shift
            MODEL=$1
            ;;
        -d | --data )
            shift
            DATA=$1
            ;;
        -h | --help )
            usage
            exit 0
            ;;
        * )
            echo "Unknown argument ${1}"
            exit 1
            ;;
    esac
    shift
done
DISTRIBUTED_ARGS="
--nproc_per_node $GPUS_PER_NODE
--nnodes $NNODES
--node_rank $NODE_RANK
--master_addr $MASTER_ADDR
--master_port $MASTER_PORT
"
# Remember to use --fp16 instead of --bf16 due to autogptq
torchrun $DISTRIBUTED_ARGS finetune.py \
    --model_name_or_path $MODEL \
    --data_path $DATA \
    --fp16 True \
    --output_dir /home/qs/output \
    --num_train_epochs 100 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1000000 \
    --save_total_limit 10 \
    --learning_rate 3e-4 \
    --weight_decay 0.1 \
    --adam_beta2 0.95 \
    --warmup_ratio 0.01 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --report_to "tensorboard" \
    --model_max_length 8192 \
    --lazy_preprocess True \
    --use_lora \
    --q_lora \
    --gradient_checkpointing \
    --deepspeed finetune/ds_config_zero2.json
This is the error output:

  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 157, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB (GPU 2; 79.35 GiB total capacity; 69.52 GiB already allocated; 7.83 GiB free; 69.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
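One limited mitigation the message itself points at is tuning the caching allocator; the sketch below is an editor-added example, not part of the original report (the 128 MiB split size is an arbitrary assumption), and it only addresses fragmentation rather than the memory genuinely required by 8192-token sequences:

```bash
# Hedged sketch, not from the original report: try the caching-allocator hint
# from the OOM message before re-launching. 128 MiB is an arbitrary example
# value; this reduces fragmentation but cannot create the memory that
# 8192-token activations actually need.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
bash finetune/finetune_qlora_ds.sh -m Qwen__Qwen-72B-Chat-Int4 -d sft_train.json
```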
Expected Behavior
Training should run without the OOM error.
Steps To Reproduce
sh finetune_lora_ds.sh
Environment
- OS:
- Python: 3.10
- Transformers: 4.32.0
- PyTorch: 2.0.1
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):11.7
Anything else?
No response
@KevinFan0 @JustinLin610 @JianxinMa I am getting the same error when using a single machine with 8x V100 (32 GB each), even with batch size = 1. There are no other processes running apart from this one. Have you managed to solve this somehow? Many thanks.
Here is the error log (see below):
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100S-PCI...  Off  | 00000000:00:0D.0 Off |                    0 |
| N/A   34C    P0    25W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100S-PCI...  Off  | 00000000:00:0E.0 Off |                    0 |
| N/A   32C    P0    24W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100S-PCI...  Off  | 00000000:00:0F.0 Off |                    0 |
| N/A   33C    P0    25W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100S-PCI...  Off  | 00000000:00:10.0 Off |                    0 |
| N/A   34C    P0    25W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  Tesla V100S-PCI...  Off  | 00000000:00:11.0 Off |                    0 |
| N/A   33C    P0    25W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  Tesla V100S-PCI...  Off  | 00000000:00:12.0 Off |                    0 |
| N/A   34C    P0    25W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  Tesla V100S-PCI...  Off  | 00000000:00:13.0 Off |                    0 |
| N/A   35C    P0    25W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  Tesla V100S-PCI...  Off  | 00000000:00:14.0 Off |                    0 |
| N/A   33C    P0    25W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2024-02-06 00:36:46,618] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-02-06 00:36:46,619] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-02-06 00:36:46,624] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-02-06 00:36:46,647] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-02-06 00:36:46,648] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-02-06 00:36:46,705] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-02-06 00:36:46,719] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-02-06 00:36:46,752] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/home/mentox/miniconda3/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2024-02-06 00:36:49,982] [INFO] [comm.py:637:init_distributed] cdb=None
/home/mentox/miniconda3/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2024-02-06 00:36:50,025] [INFO] [comm.py:637:init_distributed] cdb=None
/home/mentox/miniconda3/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
/home/mentox/miniconda3/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2024-02-06 00:36:50,185] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-02-06 00:36:50,191] [INFO] [comm.py:637:init_distributed] cdb=None
/home/mentox/miniconda3/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2024-02-06 00:36:50,265] [INFO] [comm.py:637:init_distributed] cdb=None
/home/mentox/miniconda3/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
/home/mentox/miniconda3/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2024-02-06 00:36:50,280] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-02-06 00:36:50,288] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-02-06 00:36:50,288] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
/home/mentox/miniconda3/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2024-02-06 00:36:50,402] [INFO] [comm.py:637:init_distributed] cdb=None
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
Using `disable_exllama` is deprecated and will be removed in version 4.37. Use `use_exllama` instead and specify the version with `exllama_config`. The value of `use_exllama` will be overwritten by `disable_exllama` passed in `GPTQConfig` or stored in your config file.
Using `disable_exllama` is deprecated and will be removed in version 4.37. Use `use_exllama` instead and specify the version with `exllama_config`. The value of `use_exllama` will be overwritten by `disable_exllama` passed in `GPTQConfig` or stored in your config file.
Using `disable_exllama` is deprecated and will be removed in version 4.37. Use `use_exllama` instead and specify the version with `exllama_config`. The value of `use_exllama` will be overwritten by `disable_exllama` passed in `GPTQConfig` or stored in your config file.
Using `disable_exllama` is deprecated and will be removed in version 4.37. Use `use_exllama` instead and specify the version with `exllama_config`. The value of `use_exllama` will be overwritten by `disable_exllama` passed in `GPTQConfig` or stored in your config file.
Using `disable_exllama` is deprecated and will be removed in version 4.37. Use `use_exllama` instead and specify the version with `exllama_config`. The value of `use_exllama` will be overwritten by `disable_exllama` passed in `GPTQConfig` or stored in your config file.
Using `disable_exllama` is deprecated and will be removed in version 4.37. Use `use_exllama` instead and specify the version with `exllama_config`. The value of `use_exllama` will be overwritten by `disable_exllama` passed in `GPTQConfig` or stored in your config file.
Using `disable_exllama` is deprecated and will be removed in version 4.37. Use `use_exllama` instead and specify the version with `exllama_config`. The value of `use_exllama` will be overwritten by `disable_exllama` passed in `GPTQConfig` or stored in your config file.
Using `disable_exllama` is deprecated and will be removed in version 4.37. Use `use_exllama` instead and specify the version with `exllama_config`. The value of `use_exllama` will be overwritten by `disable_exllama` passed in `GPTQConfig` or stored in your config file.
Try importing flash-attention for faster inference...
Try importing flash-attention for faster inference...
Try importing flash-attention for faster inference...
Try importing flash-attention for faster inference...
Try importing flash-attention for faster inference...
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Try importing flash-attention for faster inference...
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|██████████| 21/21 [00:47<00:00, 1.70s/it]
Loading checkpoint shards: 100%|██████████| 21/21 [00:48<00:00, 1.71s/it]
Loading checkpoint shards: 100%|██████████| 21/21 [00:47<00:00, 2.28s/it]
Loading checkpoint shards: 100%|██████████| 21/21 [00:48<00:00, 2.30s/it]
Loading checkpoint shards: 100%|██████████| 21/21 [00:48<00:00, 1.71s/it]
Loading checkpoint shards: 100%|██████████| 21/21 [00:48<00:00, 1.71s/it]
Loading checkpoint shards: 100%|██████████| 21/21 [00:48<00:00, 1.71s/it]
Loading checkpoint shards: 100%|██████████| 21/21 [00:48<00:00, 2.29s/it]
Loading checkpoint shards: 100%|██████████| 21/21 [00:48<00:00, 2.30s/it]
Loading checkpoint shards: 100%|██████████| 21/21 [00:48<00:00, 2.30s/it]
Loading checkpoint shards: 100%|██████████| 21/21 [00:48<00:00, 1.71s/it]
Loading checkpoint shards: 100%|██████████| 21/21 [00:48<00:00, 1.71s/it]
Loading checkpoint shards: 100%|██████████| 21/21 [00:48<00:00, 2.29s/it]
Loading checkpoint shards: 100%|██████████| 21/21 [00:48<00:00, 1.71s/it]
Loading checkpoint shards: 100%|██████████| 21/21 [00:48<00:00, 2.30s/it]
Loading checkpoint shards: 100%|██████████| 21/21 [00:48<00:00, 2.30s/it]
trainable params: 754,974,720 || all params: 3,247,710,208 || trainable%: 23.246369646537133
Loading data...
trainable params: 754,974,720 || all params: 3,247,710,208 || trainable%: 23.246369646537133
trainable params: 754,974,720 || all params: 3,247,710,208 || trainable%: 23.246369646537133
trainable params: 754,974,720 || all params: 3,247,710,208 || trainable%: 23.246369646537133
trainable params: 754,974,720 || all params: 3,247,710,208 || trainable%: 23.246369646537133
trainable params: 754,974,720 || all params: 3,247,710,208 || trainable%: 23.246369646537133
trainable params: 754,974,720 || all params: 3,247,710,208 || trainable%: 23.246369646537133
trainable params: 754,974,720 || all params: 3,247,710,208 || trainable%: 23.246369646537133
Formatting inputs...Skip in lazy mode
Detected kernel version 4.18.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it). Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it). Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it). Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it). Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it). Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it). Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it). Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it). Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
Traceback (most recent call last):
  File "/home/mentox/project/qwen_72b_int4/finetune.py", line 374, in <module>
    train()
  File "/home/mentox/project/qwen_72b_int4/finetune.py", line 367, in train
    trainer.train()
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/transformers/trainer.py", line 1690, in _inner_training_loop
    model, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/accelerate/accelerator.py", line 1219, in prepare
    result = self._prepare_deepspeed(*args)
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/accelerate/accelerator.py", line 1604, in _prepare_deepspeed
    engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/deepspeed/__init__.py", line 171, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 263, in __init__
    self._configure_distributed_model(model)
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1103, in _configure_distributed_model
    self.module.to(self.device)
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 5 more times]
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 844, in _apply
    self._buffers[key] = fn(buf)
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 96.00 MiB (GPU 0; 31.75 GiB total capacity; 30.97 GiB already allocated; 61.50 MiB free; 30.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

(All eight ranks print the same traceback, interleaved in the original log, and each reports the identical OutOfMemoryError for its own GPU, GPUs 0 through 7: 96.00 MiB requested with 31.75 GiB total capacity, 30.97 GiB already allocated and 61.50 MiB free.)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 81) of binary: /home/mentox/miniconda3/bin/python
Traceback (most recent call last):
  File "/home/mentox/miniconda3/bin/torchrun", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
/home/mentox/project/qwen_72b_int4/finetune.py FAILED
Failures:
  [1]: time: 2024-02-06_00:40:59, host: job-6edb80eb-6e5a-4cc7-82c2-29ad70ed2112, rank: 1 (local_rank: 1), exitcode: 1 (pid: 82), error_file: <N/A>, traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
  [2]: time: 2024-02-06_00:40:59, host: job-6edb80eb-6e5a-4cc7-82c2-29ad70ed2112, rank: 2 (local_rank: 2), exitcode: 1 (pid: 83), error_file: <N/A>, traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
  [3]: time: 2024-02-06_00:40:59, host: job-6edb80eb-6e5a-4cc7-82c2-29ad70ed2112, rank: 3 (local_rank: 3), exitcode: 1 (pid: 84), error_file: <N/A>, traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
  [4]: time: 2024-02-06_00:40:59, host: job-6edb80eb-6e5a-4cc7-82c2-29ad70ed2112, rank: 4 (local_rank: 4), exitcode: 1 (pid: 85), error_file: <N/A>, traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
  [5]: time: 2024-02-06_00:40:59, host: job-6edb80eb-6e5a-4cc7-82c2-29ad70ed2112, rank: 5 (local_rank: 5), exitcode: 1 (pid: 86), error_file: <N/A>, traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
  [6]: time: 2024-02-06_00:40:59, host: job-6edb80eb-6e5a-4cc7-82c2-29ad70ed2112, rank: 6 (local_rank: 6), exitcode: 1 (pid: 87), error_file: <N/A>, traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
  [7]: time: 2024-02-06_00:40:59, host: job-6edb80eb-6e5a-4cc7-82c2-29ad70ed2112, rank: 7 (local_rank: 7), exitcode: 1 (pid: 88), error_file: <N/A>, traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Root Cause (first observed failure):
  [0]: time: 2024-02-06_00:40:59, host: job-6edb80eb-6e5a-4cc7-82c2-29ad70ed2112, rank: 0 (local_rank: 0), exitcode: 1 (pid: 81), error_file: <N/A>, traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Reduce model_max_length from 8192 to 512, then increase it step by step; a sketch of that search follows.
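A minimal sketch of that step-up search, assuming the hard-coded --model_max_length value in the torchrun call of finetune/finetune_qlora_ds.sh is edited between attempts (the wrapper script itself only accepts -m and -d):

```bash
# Sketch only: start with a small context and grow it until OOM reappears.
# Edit the flag inside finetune/finetune_qlora_ds.sh, e.g.
#   --model_max_length 512 \     # instead of 8192
# then re-launch, and double the value (1024, 2048, 4096, ...) while it fits:
bash finetune/finetune_qlora_ds.sh -m Qwen__Qwen-72B-Chat-Int4 -d sft_train.json
```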
With 8x A100 you can even do full-parameter fine-tuning.
> Reduce model_max_length from 8192 to 512, then increase it step by step.
@WangJianQ-cmd Thanks for your reply. Unfortunately I am still getting OOM even when I reduce model_max_length to values lower than 512, and the same applies for any value larger than 512. Thanks.
> With 8x A100 you can even do full-parameter fine-tuning.
@Yuxiang1995 Thanks for getting back to me! I only have 8x V100, not A100. Do you think it would still be possible?
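For the V100 case, a rough capacity check (an editor's back-of-the-envelope arithmetic, not a figure from this thread) suggests the failure is expected: under ZeRO-2 every rank holds a full replica of the quantized weights, and for a roughly 72B-parameter model at 4 bits that alone is about 33.5 GiB, more than the 31.75 GiB each V100 reports above.

```bash
# Assumption-laden estimate: ~72e9 parameters at 4 bits (about 0.5 byte each),
# with DeepSpeed ZeRO-2 keeping a full weight replica on every GPU.
python -c 'print(f"{72e9 * 0.5 / 2**30:.1f} GiB of Int4 weights per rank")'
# ~33.5 GiB, before any activations, LoRA/optimizer state, or CUDA context.
```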
Hi, did you manage to solve the problem? I encountered the same issue.
This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.