A problem when fine-tuning CogVLM: ZeroDivisionError: integer division or modulo by zero
System Info

- CUDA: 12.1
- PyTorch: 2.3.1
- Python: 3.10
- GPUs: 4x A800 (4 * 80 GB)
- OS: Ubuntu 22.04
- apex: installed and working
Who can help?
No response
Information
- [X] The official example scripts / 官方的示例脚本
- [X] My own modified scripts / 我自己修改的脚本和任务
Reproduction
I am fine-tuning with my own data, and I updated dataset.py accordingly.
My dataset.py:

```python
import os
import logging
import random
import json
import jsonlines
from io import BytesIO
from PIL import Image
from torch.utils.data import Dataset
from sat.helpers import print_rank0

captions_file = '/GLOBALFS/dhu_mbzhao_1/CogVLM-main/captions.json'

# Load the captions.json file
with open(captions_file, 'r', encoding='utf-8') as file:
    captions = json.load(file)

# Look up and return the caption for a given image filename
def find_caption_by_filename(filename, captions_dict):
    # Check whether the filename is a key in captions_dict
    if filename in captions_dict:
        # Return the corresponding caption
        return captions_dict[filename]
    else:
        # If the filename is not present, return None (or an error message)
        return None  # or "Description not found for this filename."

def find_all_files(path, suffix=".jpg"):
    target_files = []
    for cur_dir, _, files in os.walk(path, followlinks=True):
        for f in files:
            if f.endswith(suffix):
                target_files.append(os.path.join(cur_dir, f))
    print_rank0(f'find {len(target_files)} files...')
    return target_files

class ItemDataset(Dataset):
    def __init__(self, image_processor, text_processor, args, data_dirs, cross_image_processor=None, **kwargs):
        super().__init__()
        self.data = self.load_data(data_dirs)
        self.image_processor, self.text_processor, self.cross_image_processor = image_processor, text_processor, cross_image_processor

    def process_img(self, img):
        img_dict = {'vision': self.image_processor(img)}
        if self.cross_image_processor:
            img_dict.update({'cross': self.cross_image_processor(img)})
        return img_dict

    def process_text(self, answer, prompt):
        return self.text_processor(answer, prompt)

    def load_data(self, data_dir):
        all_files = find_all_files(data_dir, suffix=".jpg")
        print_rank0(f"find {len(all_files)} samples in all...")
        return all_files

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        data = self.data[index]
        # img
        try:
            img = Image.open(data).convert('RGB')
        except Exception as e:
            print_rank0(e, level=logging.WARNING)
            return {}
        img_dict = self.process_img(img)
        # text
        # label = data.split('/')[-1].split('.')[0]
        label = find_caption_by_filename(data, captions)  # note: data is the full file path here
        # uni_key = label  # unique id
        uni_key = random.randint(0, 100000)  # use a random number instead (dataset expanded 2x)
        text_dict = self.process_text(label, "CLOTH:")
        if text_dict is None:
            print_rank0(f"Process text failed. Please check the max_target_length & max_source_length.\n The data is {data}", level=logging.WARNING)
            return {}
        # other attr
        ret = {**img_dict, **text_dict, "question_id": uni_key}
        return ret
```
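Before launching training, a standalone check of the data directory and the caption keys can rule out a path or suffix mismatch. This is a minimal sketch under the assumptions visible in the code above (the `./archive_split/train` layout and full-path lookup into `captions.json` are taken from my dataset.py, not verified facts):

```python
import json
import os

# Paths assumed from the dataset.py above -- adjust to the actual layout.
train_dir = './archive_split/train'
captions_file = '/GLOBALFS/dhu_mbzhao_1/CogVLM-main/captions.json'

# Mirror find_all_files: recursively collect .jpg paths.
jpgs = [os.path.join(d, f)
        for d, _, fs in os.walk(train_dir, followlinks=True)
        for f in fs if f.endswith('.jpg')]
print(f'{len(jpgs)} .jpg files under {train_dir}')  # 0 here reproduces the crash

with open(captions_file, encoding='utf-8') as fh:
    captions = json.load(fh)

# __getitem__ looks captions up by the FULL path; verify the keys actually match.
for p in jpgs[:3]:
    print(p, '->', p in captions)
```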
My script:

```bash
#! /bin/bash
export PATH=/GLOBALFS/dhu_mbzhao_1/cuda/bin:$PATH
export LD_LIBRARY_PATH=/GLOBALFS/dhu_mbzhao_1/cuda/lib64:$LD_LIBRARY_PATH

NUM_GPUS_PER_WORKER=4
MP_SIZE=1

script_path=$(realpath $0)
script_dir=$(dirname $script_path)
main_dir=$(dirname $script_dir)
MODEL_TYPE="cogvlm-chat-v1.1"
VERSION="base"
MODEL_ARGS="--from_pretrained $MODEL_TYPE \
    --max_length 1288 \
    --lora_rank 10 \
    --use_lora \
    --local_tokenizer /GLOBALFS/dhu_mbzhao_1/CogVLM-main/vicuna-7b-v1.5 \
    --version $VERSION"
# Tips: If training models of resolution 244, you can set --max_length smaller

OPTIONS_SAT="SAT_HOME=/GLOBALFS/dhu_mbzhao_1/CogVLM-main/.sat_models"
OPTIONS_NCCL="NCCL_DEBUG=info NCCL_IB_DISABLE=0 NCCL_NET_GDR_LEVEL=2 LOCAL_WORLD_SIZE=$NUM_GPUS_PER_WORKER"
HOST_FILE_PATH="hostfile"

train_data="./archive_split/train"
valid_data="./archive_split/valid"

gpt_options=" \
    --experiment-name finetune-$MODEL_TYPE \
    --model-parallel-size ${MP_SIZE} \
    --mode finetune \
    --train-iters 800 \
    --resume-dataloader \
    $MODEL_ARGS \
    --train-data ${train_data} \
    --valid-data ${valid_data} \
    --distributed-backend nccl \
    --lr-decay-style cosine \
    --warmup .02 \
    --checkpoint-activations \
    --vit_checkpoint_activations \
    --save-interval 200 \
    --eval-interval 200 \
    --save "./checkpoints" \
    --eval-iters 10 \
    --eval-batch-size 1 \
    --split 1. \
    --deepspeed_config test_config_bf16.json \
    --skip-init \
    --seed 2023 \
"

run_cmd="${OPTIONS_NCCL} ${OPTIONS_SAT} deepspeed --master_port 16666 --hostfile ${HOST_FILE_PATH} finetune_cogvlm_demo.py ${gpt_options}"
echo ${run_cmd}
eval ${run_cmd}

set +x
```
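One detail worth noting (my observation, not confirmed in the original report): `train_data` and `valid_data` are relative paths, so they resolve against the directory the script is launched from (`finetune_demo` in the log below). A hypothetical pre-launch check:

```bash
# Hypothetical sanity check (paths assumed from the script above):
# the relative data paths resolve against $PWD at launch time.
pwd
find ./archive_split/train -name '*.jpg' | wc -l   # should be > 0
```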
Below is the log:

```
(cogvlm) dhu_mbzhao_1@deeplearning-v191204-deeplearn:~/CogVLM-main/finetune_demo$ sh finetune_cogvlm_lora.sh
NCCL_DEBUG=info NCCL_IB_DISABLE=0 NCCL_NET_GDR_LEVEL=2 LOCAL_WORLD_SIZE=4 SAT_HOME=/GLOBALFS/dhu_mbzhao_1/CogVLM-main/.sat_models deepspeed --master_port 16666 --hostfile hostfile finetune_cogvlm_demo.py --experiment-name finetune-cogvlm-chat-v1.1 --model-parallel-size 1 --mode finetune --train-iters 800 --resume-dataloader --from_pretrained cogvlm-chat-v1.1 --max_length 1288 --lora_rank 10 --use_lora --local_tokenizer /GLOBALFS/dhu_mbzhao_1/CogVLM-main/vicuna-7b-v1.5 --version base --train-data ./archive_split/train --valid-data ./archive_split/valid --distributed-backend nccl --lr-decay-style cosine --warmup .02 --checkpoint-activations --vit_checkpoint_activations --save-interval 200 --eval-interval 200 --save ./checkpoints --eval-iters 10 --eval-batch-size 1 --split 1. --deepspeed_config test_config_bf16.json --skip-init --seed 2023
[2024-07-18 15:03:39,161] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[2024-07-18 15:03:40,797] [WARNING] [runner.py:202:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-07-18 15:03:40,797] [INFO] [runner.py:568:main] cmd = /GLOBALFS/dhu_mbzhao_1/anaconda3/envs/cogvlm/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgM119 --master_addr=127.0.0.1 --master_port=16666 --enable_each_rank_log=None finetune_cogvlm_demo.py --experiment-name finetune-cogvlm-chat-v1.1 --model-parallel-size 1 --mode finetune --train-iters 800 --resume-dataloader --from_pretrained cogvlm-chat-v1.1 --max_length 1288 --lora_rank 10 --use_lora --local_tokenizer /GLOBALFS/dhu_mbzhao_1/CogVLM-main/vicuna-7b-v1.5 --version base --train-data ./archive_split/train --valid-data ./archive_split/valid --distributed-backend nccl --lr-decay-style cosine --warmup .02 --checkpoint-activations --vit_checkpoint_activations --save-interval 200 --eval-interval 200 --save ./checkpoints --eval-iters 10 --eval-batch-size 1 --split 1. --deepspeed_config test_config_bf16.json --skip-init --seed 2023
[2024-07-18 15:03:42,018] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[2024-07-18 15:03:43,636] [INFO] [launch.py:139:main] 0 NCCL_DEBUG=info
[2024-07-18 15:03:43,636] [INFO] [launch.py:139:main] 0 NCCL_IB_DISABLE=0
[2024-07-18 15:03:43,636] [INFO] [launch.py:139:main] 0 NCCL_NET_GDR_LEVEL=2
[2024-07-18 15:03:43,636] [INFO] [launch.py:146:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3]}
[2024-07-18 15:03:43,636] [INFO] [launch.py:152:main] nnodes=1, num_local_procs=4, node_rank=0
[2024-07-18 15:03:43,636] [INFO] [launch.py:163:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3]})
[2024-07-18 15:03:43,636] [INFO] [launch.py:164:main] dist_world_size=4
[2024-07-18 15:03:43,636] [INFO] [launch.py:168:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3
[2024-07-18 15:03:43,637] [INFO] [launch.py:256:main] process 56061 spawned with command: ['/GLOBALFS/dhu_mbzhao_1/anaconda3/envs/cogvlm/bin/python', '-u', 'finetune_cogvlm_demo.py', '--local_rank=0', '--experiment-name', 'finetune-cogvlm-chat-v1.1', '--model-parallel-size', '1', '--mode', 'finetune', '--train-iters', '800', '--resume-dataloader', '--from_pretrained', 'cogvlm-chat-v1.1', '--max_length', '1288', '--lora_rank', '10', '--use_lora', '--local_tokenizer', '/GLOBALFS/dhu_mbzhao_1/CogVLM-main/vicuna-7b-v1.5', '--version', 'base', '--train-data', './archive_split/train', '--valid-data', './archive_split/valid', '--distributed-backend', 'nccl', '--lr-decay-style', 'cosine', '--warmup', '.02', '--checkpoint-activations', '--vit_checkpoint_activations', '--save-interval', '200', '--eval-interval', '200', '--save', './checkpoints', '--eval-iters', '10', '--eval-batch-size', '1', '--split', '1.', '--deepspeed_config', 'test_config_bf16.json', '--skip-init', '--seed', '2023']
[2024-07-18 15:03:43,637] [INFO] [launch.py:256:main] process 56062 spawned with command: ['/GLOBALFS/dhu_mbzhao_1/anaconda3/envs/cogvlm/bin/python', '-u', 'finetune_cogvlm_demo.py', '--local_rank=1', '--experiment-name', 'finetune-cogvlm-chat-v1.1', '--model-parallel-size', '1', '--mode', 'finetune', '--train-iters', '800', '--resume-dataloader', '--from_pretrained', 'cogvlm-chat-v1.1', '--max_length', '1288', '--lora_rank', '10', '--use_lora', '--local_tokenizer', '/GLOBALFS/dhu_mbzhao_1/CogVLM-main/vicuna-7b-v1.5', '--version', 'base', '--train-data', './archive_split/train', '--valid-data', './archive_split/valid', '--distributed-backend', 'nccl', '--lr-decay-style', 'cosine', '--warmup', '.02', '--checkpoint-activations', '--vit_checkpoint_activations', '--save-interval', '200', '--eval-interval', '200', '--save', './checkpoints', '--eval-iters', '10', '--eval-batch-size', '1', '--split', '1.', '--deepspeed_config', 'test_config_bf16.json', '--skip-init', '--seed', '2023']
[2024-07-18 15:03:43,637] [INFO] [launch.py:256:main] process 56063 spawned with command: ['/GLOBALFS/dhu_mbzhao_1/anaconda3/envs/cogvlm/bin/python', '-u', 'finetune_cogvlm_demo.py', '--local_rank=2', '--experiment-name', 'finetune-cogvlm-chat-v1.1', '--model-parallel-size', '1', '--mode', 'finetune', '--train-iters', '800', '--resume-dataloader', '--from_pretrained', 'cogvlm-chat-v1.1', '--max_length', '1288', '--lora_rank', '10', '--use_lora', '--local_tokenizer', '/GLOBALFS/dhu_mbzhao_1/CogVLM-main/vicuna-7b-v1.5', '--version', 'base', '--train-data', './archive_split/train', '--valid-data', './archive_split/valid', '--distributed-backend', 'nccl', '--lr-decay-style', 'cosine', '--warmup', '.02', '--checkpoint-activations', '--vit_checkpoint_activations', '--save-interval', '200', '--eval-interval', '200', '--save', './checkpoints', '--eval-iters', '10', '--eval-batch-size', '1', '--split', '1.', '--deepspeed_config', 'test_config_bf16.json', '--skip-init', '--seed', '2023']
[2024-07-18 15:03:43,638] [INFO] [launch.py:256:main] process 56064 spawned with command: ['/GLOBALFS/dhu_mbzhao_1/anaconda3/envs/cogvlm/bin/python', '-u', 'finetune_cogvlm_demo.py', '--local_rank=3', '--experiment-name', 'finetune-cogvlm-chat-v1.1', '--model-parallel-size', '1', '--mode', 'finetune', '--train-iters', '800', '--resume-dataloader', '--from_pretrained', 'cogvlm-chat-v1.1', '--max_length', '1288', '--lora_rank', '10', '--use_lora', '--local_tokenizer', '/GLOBALFS/dhu_mbzhao_1/CogVLM-main/vicuna-7b-v1.5', '--version', 'base', '--train-data', './archive_split/train', '--valid-data', './archive_split/valid', '--distributed-backend', 'nccl', '--lr-decay-style', 'cosine', '--warmup', '.02', '--checkpoint-activations', '--vit_checkpoint_activations', '--save-interval', '200', '--eval-interval', '200', '--save', './checkpoints', '--eval-iters', '10', '--eval-batch-size', '1', '--split', '1.', '--deepspeed_config', 'test_config_bf16.json', '--skip-init', '--seed', '2023']
[2024-07-18 15:03:44,906] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-18 15:03:44,968] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-18 15:03:44,971] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-18 15:03:44,972] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[2024-07-18 15:03:49,192] [INFO] using world size: 4 and model-parallel size: 1
[2024-07-18 15:03:49,192] [INFO] > padded vocab (size: 100) with 28 dummy tokens (new size: 128)
[2024-07-18 15:03:49,192] [INFO] Will override arguments with manually specified deepspeed_config!
[2024-07-18 15:03:49,326] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-18 15:03:49,331] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-18 15:03:49,353] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-18 15:03:49,361] [INFO] [RANK 0] > initializing model parallel with size 1
[2024-07-18 15:03:49,363] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-18 15:03:49,366] [INFO] [checkpointing.py:1048:_configure_using_config_file] {'partition_activations': False, 'contiguous_memory_optimization': False, 'cpu_checkpointing': False, 'number_checkpoints': None, 'synchronize_checkpoint_boundary': False, 'profile': False}
[2024-07-18 15:03:49,369] [INFO] [checkpointing.py:229:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 4741 and data parallel seed: 2023
[2024-07-18 15:03:49,372] [INFO] [RANK 0] building FineTuneTrainCogVLMModel model ...
[2024-07-18 15:03:59,465] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 17639685376
[2024-07-18 15:04:54,090] [INFO] [RANK 0] global rank 0 is loading checkpoint /GLOBALFS/dhu_mbzhao_1/CogVLM-main/.sat_models/cogvlm-chat-v1.1/1/mp_rank_00_model_states.pt
[2024-07-18 15:05:43,077] [INFO] [RANK 0] > successfully loaded /GLOBALFS/dhu_mbzhao_1/CogVLM-main/.sat_models/cogvlm-chat-v1.1/1/mp_rank_00_model_states.pt
[2024-07-18 15:05:44,114] [INFO] [RANK 0] replacing layer 0 attention with lora
[2024-07-18 15:05:44,864] [INFO] [RANK 0] replacing layer 1 attention with lora
[2024-07-18 15:05:45,654] [INFO] [RANK 0] replacing layer 2 attention with lora
[2024-07-18 15:05:46,351] [INFO] [RANK 0] replacing layer 3 attention with lora
[2024-07-18 15:05:47,077] [INFO] [RANK 0] replacing layer 4 attention with lora
[2024-07-18 15:05:47,871] [INFO] [RANK 0] replacing layer 5 attention with lora
[2024-07-18 15:05:48,692] [INFO] [RANK 0] replacing layer 6 attention with lora
[2024-07-18 15:05:49,551] [INFO] [RANK 0] replacing layer 7 attention with lora
[2024-07-18 15:05:50,375] [INFO] [RANK 0] replacing layer 8 attention with lora
[2024-07-18 15:05:51,153] [INFO] [RANK 0] replacing layer 9 attention with lora
[2024-07-18 15:05:51,949] [INFO] [RANK 0] replacing layer 10 attention with lora
[2024-07-18 15:05:52,892] [INFO] [RANK 0] replacing layer 11 attention with lora
[2024-07-18 15:05:53,677] [INFO] [RANK 0] replacing layer 12 attention with lora
[2024-07-18 15:05:54,587] [INFO] [RANK 0] replacing layer 13 attention with lora
[2024-07-18 15:05:55,295] [INFO] [RANK 0] replacing layer 14 attention with lora
[2024-07-18 15:05:56,079] [INFO] [RANK 0] replacing layer 15 attention with lora
[2024-07-18 15:05:56,938] [INFO] [RANK 0] replacing layer 16 attention with lora
[2024-07-18 15:05:57,762] [INFO] [RANK 0] replacing layer 17 attention with lora
[2024-07-18 15:05:58,654] [INFO] [RANK 0] replacing layer 18 attention with lora
[2024-07-18 15:05:59,468] [INFO] [RANK 0] replacing layer 19 attention with lora
[2024-07-18 15:06:00,300] [INFO] [RANK 0] replacing layer 20 attention with lora
[2024-07-18 15:06:01,055] [INFO] [RANK 0] replacing layer 21 attention with lora
[2024-07-18 15:06:02,043] [INFO] [RANK 0] replacing layer 22 attention with lora
[2024-07-18 15:06:02,786] [INFO] [RANK 0] replacing layer 23 attention with lora
[2024-07-18 15:06:03,570] [INFO] [RANK 0] replacing layer 24 attention with lora
[2024-07-18 15:06:04,406] [INFO] [RANK 0] replacing layer 25 attention with lora
[2024-07-18 15:06:05,249] [INFO] [RANK 0] replacing layer 26 attention with lora
[2024-07-18 15:06:06,080] [INFO] [RANK 0] replacing layer 27 attention with lora
[2024-07-18 15:06:06,862] [INFO] [RANK 0] replacing layer 28 attention with lora
[2024-07-18 15:06:08,048] [INFO] [RANK 0] replacing layer 29 attention with lora
[2024-07-18 15:06:08,829] [INFO] [RANK 0] replacing layer 30 attention with lora
[2024-07-18 15:06:09,577] [INFO] [RANK 0] replacing layer 31 attention with lora
[2024-07-18 15:06:10,367] [INFO] [RANK 0] replacing layer 0 attention with lora
[2024-07-18 15:06:10,480] [INFO] [RANK 0] replacing layer 1 attention with lora
[2024-07-18 15:06:10,589] [INFO] [RANK 0] replacing layer 2 attention with lora
[2024-07-18 15:06:10,832] [INFO] [RANK 0] replacing layer 3 attention with lora
[2024-07-18 15:06:11,036] [INFO] [RANK 0] replacing layer 4 attention with lora
[2024-07-18 15:06:11,243] [INFO] [RANK 0] replacing layer 5 attention with lora
[2024-07-18 15:06:11,437] [INFO] [RANK 0] replacing layer 6 attention with lora
[2024-07-18 15:06:11,644] [INFO] [RANK 0] replacing layer 7 attention with lora
[2024-07-18 15:06:11,851] [INFO] [RANK 0] replacing layer 8 attention with lora
[2024-07-18 15:06:12,125] [INFO] [RANK 0] replacing layer 9 attention with lora
[2024-07-18 15:06:12,333] [INFO] [RANK 0] replacing layer 10 attention with lora
[2024-07-18 15:06:12,469] [INFO] [RANK 0] replacing layer 11 attention with lora
[2024-07-18 15:06:12,655] [INFO] [RANK 0] replacing layer 12 attention with lora
[2024-07-18 15:06:12,857] [INFO] [RANK 0] replacing layer 13 attention with lora
[2024-07-18 15:06:13,064] [INFO] [RANK 0] replacing layer 14 attention with lora
[2024-07-18 15:06:13,325] [INFO] [RANK 0] replacing layer 15 attention with lora
[2024-07-18 15:06:13,541] [INFO] [RANK 0] replacing layer 16 attention with lora
[2024-07-18 15:06:13,763] [INFO] [RANK 0] replacing layer 17 attention with lora
[2024-07-18 15:06:14,028] [INFO] [RANK 0] replacing layer 18 attention with lora
[2024-07-18 15:06:14,241] [INFO] [RANK 0] replacing layer 19 attention with lora
[2024-07-18 15:06:14,443] [INFO] [RANK 0] replacing layer 20 attention with lora
[2024-07-18 15:06:14,642] [INFO] [RANK 0] replacing layer 21 attention with lora
[2024-07-18 15:06:14,843] [INFO] [RANK 0] replacing layer 22 attention with lora
[2024-07-18 15:06:15,035] [INFO] [RANK 0] replacing layer 23 attention with lora
[2024-07-18 15:06:15,226] [INFO] [RANK 0] replacing layer 24 attention with lora
[2024-07-18 15:06:15,443] [INFO] [RANK 0] replacing layer 25 attention with lora
[2024-07-18 15:06:15,626] [INFO] [RANK 0] replacing layer 26 attention with lora
[2024-07-18 15:06:15,832] [INFO] [RANK 0] replacing layer 27 attention with lora
[2024-07-18 15:06:15,997] [INFO] [RANK 0] replacing layer 28 attention with lora
[2024-07-18 15:06:16,190] [INFO] [RANK 0] replacing layer 29 attention with lora
[2024-07-18 15:06:16,437] [INFO] [RANK 0] replacing layer 30 attention with lora
[2024-07-18 15:06:16,639] [INFO] [RANK 0] replacing layer 31 attention with lora
[2024-07-18 15:06:16,846] [INFO] [RANK 0] replacing layer 32 attention with lora
[2024-07-18 15:06:17,052] [INFO] [RANK 0] replacing layer 33 attention with lora
[2024-07-18 15:06:17,250] [INFO] [RANK 0] replacing layer 34 attention with lora
[2024-07-18 15:06:17,453] [INFO] [RANK 0] replacing layer 35 attention with lora
[2024-07-18 15:06:17,652] [INFO] [RANK 0] replacing layer 36 attention with lora
[2024-07-18 15:06:17,926] [INFO] [RANK 0] replacing layer 37 attention with lora
[2024-07-18 15:06:18,139] [INFO] [RANK 0] replacing layer 38 attention with lora
[2024-07-18 15:06:18,348] [INFO] [RANK 0] replacing layer 39 attention with lora
[2024-07-18 15:06:18,540] [INFO] [RANK 0] replacing layer 40 attention with lora
[2024-07-18 15:06:18,741] [INFO] [RANK 0] replacing layer 41 attention with lora
[2024-07-18 15:06:18,934] [INFO] [RANK 0] replacing layer 42 attention with lora
[2024-07-18 15:06:19,126] [INFO] [RANK 0] replacing layer 43 attention with lora
[2024-07-18 15:06:19,346] [INFO] [RANK 0] replacing layer 44 attention with lora
[2024-07-18 15:06:19,545] [INFO] [RANK 0] replacing layer 45 attention with lora
[2024-07-18 15:06:19,745] [INFO] [RANK 0] replacing layer 46 attention with lora
[2024-07-18 15:06:19,930] [INFO] [RANK 0] replacing layer 47 attention with lora
[2024-07-18 15:06:20,122] [INFO] [RANK 0] replacing layer 48 attention with lora
[2024-07-18 15:06:20,327] [INFO] [RANK 0] replacing layer 49 attention with lora
[2024-07-18 15:06:20,534] [INFO] [RANK 0] replacing layer 50 attention with lora
[2024-07-18 15:06:20,733] [INFO] [RANK 0] replacing layer 51 attention with lora
[2024-07-18 15:06:20,970] [INFO] [RANK 0] replacing layer 52 attention with lora
[2024-07-18 15:06:21,163] [INFO] [RANK 0] replacing layer 53 attention with lora
[2024-07-18 15:06:21,424] [INFO] [RANK 0] replacing layer 54 attention with lora
[2024-07-18 15:06:21,643] [INFO] [RANK 0] replacing layer 55 attention with lora
[2024-07-18 15:06:21,842] [INFO] [RANK 0] replacing layer 56 attention with lora
[2024-07-18 15:06:22,030] [INFO] [RANK 0] replacing layer 57 attention with lora
[2024-07-18 15:06:22,230] [INFO] [RANK 0] replacing layer 58 attention with lora
[2024-07-18 15:06:22,433] [INFO] [RANK 0] replacing layer 59 attention with lora
[2024-07-18 15:06:22,580] [INFO] [RANK 0] replacing layer 60 attention with lora
[2024-07-18 15:06:22,780] [INFO] [RANK 0] replacing layer 61 attention with lora
[2024-07-18 15:06:23,041] [INFO] [RANK 0] replacing layer 62 attention with lora
[2024-07-18 15:06:23,776] [INFO] [RANK 0] find 0 files...
[2024-07-18 15:06:23,776] [INFO] [RANK 0] find 0 samples in all...
[rank3]: Traceback (most recent call last):
[rank3]:   File "/GLOBALFS/dhu_mbzhao_1/CogVLM-main/finetune_demo/finetune_cogvlm_demo.py", line 256, in
```
The divisor at this point is 0, presumably because the dataloader found no samples ("find 0 samples in all..." in the log above). I don't know how to solve it. Could someone help me?
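For context, here is a minimal sketch of how an empty dataset surfaces as exactly this error; the iteration arithmetic is an assumption for illustration, not the actual SAT code path:

```python
# Minimal sketch (assumed arithmetic, not the real SAT source): once the
# loader reports 0 samples, any division by the dataset length fails.
dataset_len = 0                      # matches "find 0 samples in all..."
try:
    epochs = 800 // dataset_len      # e.g. train-iters divided by len(dataset)
except ZeroDivisionError as e:
    print(e)                         # integer division or modulo by zero
```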
Expected behavior
Fine-tuning runs to completion.