
[BUG] OOM error when QLoRA fine-tuning Qwen-72B-Chat-Int4 on a single machine with 8x A100

Open KevinFan0 opened this issue 1 year ago • 7 comments

是否已有关于该错误的issue或讨论? | Is there an existing issue / discussion for this?

  • [X] 我已经搜索过已有的issues和讨论 | I have searched the existing issues / discussions

该问题是否在FAQ中有解答? | Is there an existing answer for this in FAQ?

  • [X] 我已经搜索过FAQ | I have searched FAQ

当前行为 | Current Behavior

When fine-tuning Qwen-72B-Chat-Int4 with QLoRA on 8x A100, setting model_max_length to 8192 triggers an OOM error. My training samples are not particularly long. The training script is below; is there any way to work around this?

export CUDA_DEVICE_MAX_CONNECTIONS=1

DIR=$(pwd)

GPUS_PER_NODE=$(python -c 'import torch; print(torch.cuda.device_count())')
NNODES=${NNODES:-1}
NODE_RANK=${NODE_RANK:-0}
MASTER_ADDR=${MASTER_ADDR:-localhost}
MASTER_PORT=${MASTER_PORT:-6001}

MODEL="Qwen__Qwen-72B-Chat-Int4"
DATA="sft_train.json"

function usage() {
    echo 'Usage: bash finetune/finetune_qlora_ds.sh [-m MODEL_PATH] [-d DATA_PATH]'
}

while [[ "$1" != "" ]]; do
    case $1 in
        -m | --model )
            shift
            MODEL=$1
            ;;
        -d | --data )
            shift
            DATA=$1
            ;;
        -h | --help )
            usage
            exit 0
            ;;
        * )
            echo "Unknown argument ${1}"
            exit 1
            ;;
    esac
    shift
done

DISTRIBUTED_ARGS=" --nproc_per_node $GPUS_PER_NODE
--nnodes $NNODES
--node_rank $NODE_RANK
--master_addr $MASTER_ADDR
--master_port $MASTER_PORT "

# Remember to use --fp16 instead of --bf16 due to autogptq
torchrun $DISTRIBUTED_ARGS finetune.py \
    --model_name_or_path $MODEL \
    --data_path $DATA \
    --fp16 True \
    --output_dir /home/qs/output \
    --num_train_epochs 100 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1000000 \
    --save_total_limit 10 \
    --learning_rate 3e-4 \
    --weight_decay 0.1 \
    --adam_beta2 0.95 \
    --warmup_ratio 0.01 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --report_to "tensorboard" \
    --model_max_length 8192 \
    --lazy_preprocess True \
    --use_lora \
    --q_lora \
    --gradient_checkpointing \
    --deepspeed finetune/ds_config_zero2.json

Here is the error message:

File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 157, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
File "/root/miniconda3/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB (GPU 2; 79.35 GiB total capacity; 69.52 GiB already allocated; 7.83 GiB free; 69.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
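The allocator hint at the end of that message can be tried directly. A minimal sketch, assuming the launch script above; the max_split_size_mb value and the idea of lowering the sequence length are illustrative assumptions, not settings confirmed in this thread:

# Illustrative mitigation: cap allocator block splits to reduce fragmentation,
# and lower --model_max_length in the torchrun command above (e.g. 8192 -> 4096)
# before relaunching. The 128 MiB value is an assumption, not a tested setting.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
export CUDA_DEVICE_MAX_CONNECTIONS=1
bash finetune/finetune_qlora_ds.sh -m Qwen__Qwen-72B-Chat-Int4 -d sft_train.json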

期望行为 | Expected Behavior

Training should complete without OOM.

复现方法 | Steps To Reproduce

sh finetune_lora_ds.sh

运行环境 | Environment

- OS:
- Python: 3.10
- Transformers: 4.32.0
- PyTorch: 2.0.1
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):11.7

备注 | Anything else?

No response

KevinFan0 avatar Feb 02 '24 05:02 KevinFan0

@KevinFan0 @JustinLin610 @JianxinMa I am getting the same error on a single machine with 8x V100 (32 GB each), even with batch size = 1. There are no other processes running apart from this one. Have you managed to solve this somehow? Many thanks.

Here is the error log:

| NVIDIA-SMI 470.57.02   Driver Version: 470.57.02   CUDA Version: 11.4 |

GPU  Name                Bus-Id            Temp  Perf  Pwr:Usage/Cap  Memory-Usage      GPU-Util  Compute M.
0    Tesla V100S-PCI...  00000000:00:0D.0  34C   P0    25W / 250W     0MiB / 32510MiB   0%        Default
1    Tesla V100S-PCI...  00000000:00:0E.0  32C   P0    24W / 250W     0MiB / 32510MiB   0%        Default
2    Tesla V100S-PCI...  00000000:00:0F.0  33C   P0    25W / 250W     0MiB / 32510MiB   0%        Default
3    Tesla V100S-PCI...  00000000:00:10.0  34C   P0    25W / 250W     0MiB / 32510MiB   0%        Default
4    Tesla V100S-PCI...  00000000:00:11.0  33C   P0    25W / 250W     0MiB / 32510MiB   0%        Default
5    Tesla V100S-PCI...  00000000:00:12.0  34C   P0    25W / 250W     0MiB / 32510MiB   0%        Default
6    Tesla V100S-PCI...  00000000:00:13.0  35C   P0    25W / 250W     0MiB / 32510MiB   0%        Default
7    Tesla V100S-PCI...  00000000:00:14.0  33C   P0    25W / 250W     0MiB / 32510MiB   0%        Default
(all GPUs: Persistence-M Off, Disp.A Off, Volatile Uncorr. ECC 0, MIG M. N/A)

Processes: No running processes found

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

WARNING:torch.distributed.run:


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


[2024-02-06 00:36:46] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)   (each of the 8 ranks)
/home/mentox/miniconda3/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations   (each of the 8 ranks)
[2024-02-06 00:36:50] [INFO] [comm.py:637:init_distributed] cdb=None   (each of the 8 ranks)
[2024-02-06 00:36:50,288] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
CUDA extension not installed.   (each of the 8 ranks)
Using disable_exllama is deprecated and will be removed in version 4.37. Use use_exllama instead and specify the version with exllama_config. The value of use_exllama will be overwritten by disable_exllama passed in GPTQConfig or stored in your config file.   (each of the 8 ranks)
Try importing flash-attention for faster inference...   (each of the 8 ranks)
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary   (each of the 8 ranks)
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm   (each of the 8 ranks)
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention   (each of the 8 ranks)
Loading checkpoint shards: 100%|██████████| 21/21 [00:48<00:00, 2.30s/it]   (each of the 8 ranks)
trainable params: 754,974,720 || all params: 3,247,710,208 || trainable%: 23.246369646537133   (each of the 8 ranks)
Loading data...
Formatting inputs...Skip in lazy mode
Detected kernel version 4.18.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore gradient_checkpointing_kwargs in case you passed it). Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method _set_gradient_checkpointing in your model.   (each of the 8 ranks)
Traceback (most recent call last):
  File "/home/mentox/project/qwen_72b_int4/finetune.py", line 374, in <module>
    train()
  File "/home/mentox/project/qwen_72b_int4/finetune.py", line 367, in train
    trainer.train()
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/transformers/trainer.py", line 1690, in _inner_training_loop
    model, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/accelerate/accelerator.py", line 1219, in prepare
    result = self._prepare_deepspeed(*args)
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/accelerate/accelerator.py", line 1604, in _prepare_deepspeed
    engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/deepspeed/__init__.py", line 171, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 263, in __init__
    self._configure_distributed_model(model)
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1103, in _configure_distributed_model
    self.module.to(self.device)
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 5 more times]
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 844, in _apply
    self._buffers[key] = fn(buf)
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 96.00 MiB (GPU 0; 31.75 GiB total capacity; 30.97 GiB already allocated; 61.50 MiB free; 30.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

(the same traceback and OutOfMemoryError are raised on GPUs 1 through 7)

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 81) of binary: /home/mentox/miniconda3/bin/python
Traceback (most recent call last):
  File "/home/mentox/miniconda3/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/mentox/miniconda3/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

/home/mentox/project/qwen_72b_int4/finetune.py FAILED

Failures:
[1]: time: 2024-02-06_00:40:59  host: job-6edb80eb-6e5a-4cc7-82c2-29ad70ed2112  rank: 1 (local_rank: 1)  exitcode: 1 (pid: 82)  error_file: <N/A>  traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]: time: 2024-02-06_00:40:59  host: job-6edb80eb-6e5a-4cc7-82c2-29ad70ed2112  rank: 2 (local_rank: 2)  exitcode: 1 (pid: 83)  error_file: <N/A>  traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]: time: 2024-02-06_00:40:59  host: job-6edb80eb-6e5a-4cc7-82c2-29ad70ed2112  rank: 3 (local_rank: 3)  exitcode: 1 (pid: 84)  error_file: <N/A>  traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[4]: time: 2024-02-06_00:40:59  host: job-6edb80eb-6e5a-4cc7-82c2-29ad70ed2112  rank: 4 (local_rank: 4)  exitcode: 1 (pid: 85)  error_file: <N/A>  traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[5]: time: 2024-02-06_00:40:59  host: job-6edb80eb-6e5a-4cc7-82c2-29ad70ed2112  rank: 5 (local_rank: 5)  exitcode: 1 (pid: 86)  error_file: <N/A>  traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[6]: time: 2024-02-06_00:40:59  host: job-6edb80eb-6e5a-4cc7-82c2-29ad70ed2112  rank: 6 (local_rank: 6)  exitcode: 1 (pid: 87)  error_file: <N/A>  traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[7]: time: 2024-02-06_00:40:59  host: job-6edb80eb-6e5a-4cc7-82c2-29ad70ed2112  rank: 7 (local_rank: 7)  exitcode: 1 (pid: 88)  error_file: <N/A>  traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]: time: 2024-02-06_00:40:59  host: job-6edb80eb-6e5a-4cc7-82c2-29ad70ed2112  rank: 0 (local_rank: 0)  exitcode: 1 (pid: 81)  error_file: <N/A>  traceback: To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

mikeleatila avatar Feb 08 '24 02:02 mikeleatila

Reduce model_max_length from 8192 down to 512, then increase it step by step.
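For example (an illustrative sketch assuming the launch script posted above; the sed pattern assumes the flag appears there exactly as --model_max_length 8192):

# start small, then raise --model_max_length again once training runs without OOM
sed -i 's/--model_max_length 8192/--model_max_length 512/' finetune/finetune_qlora_ds.sh
bash finetune/finetune_qlora_ds.sh -m Qwen__Qwen-72B-Chat-Int4 -d sft_train.json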

WangJianQ-0118 avatar Feb 21 '24 01:02 WangJianQ-0118

With 8x A100 you can even do full-parameter fine-tuning.

Yuxiang1995 avatar Mar 20 '24 09:03 Yuxiang1995

Adjust model_max_length down from 8192 to 512, and then increase it step by step.

@WangJianQ-cmd Thanks for your reply. Unfortunately, I am still getting OOM even when I reduce model_max_length to values lower than 512, and the same applies to any value larger than 512. Thanks

mikeleatila avatar Mar 21 '24 17:03 mikeleatila

With 8x A100 you can even do full-parameter fine-tuning.

@Yuxiang1995 Thanks for getting back to me! I only have 8 V100s, not A100s. Do you think it would still be possible?

mikeleatila avatar Mar 21 '24 17:03 mikeleatila

With 8x A100 you can even do full-parameter fine-tuning.

@Yuxiang1995 Thanks for getting back to me! I only have 8 V100s, not A100s. Do you think it would still be possible?

Hi, did you manage to solve the problem? I am encountering the same issue.

ff1Zzd avatar Apr 10 '24 11:04 ff1Zzd

This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread. 此问题由于长期未有新进展而被系统自动标记为不活跃。如果您认为它仍有待解决,请在此帖下方留言以补充信息。

github-actions[bot] avatar May 11 '24 08:05 github-actions[bot]