MiniCPM-V
[BUG] Data fetch error - typo
Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
Is there an existing answer for this in FAQ?
- [X] I have searched FAQ
Current Behavior
A data fetch error is raised because of a typo at https://github.com/OpenBMB/MiniCPM-V/blob/0e4ec319cf69c6d17b5aa714cbaec29276c84089/finetune/dataset.py#L383: `conversation` should be `conversations`.
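(For context, a minimal sketch of why the typo surfaces as a vague "data fetch error" instead of a plain KeyError: the dataset appears to wrap per-sample preprocessing in a broad try/except. The names below are illustrative only, not the repo's actual code.)

```python
# Illustrative only: mimics a dataset __getitem__ that swallows the
# underlying exception and reports a generic "data fetch error".
sample = {"conversations": [{"role": "user", "content": "hi"}]}

try:
    msgs = sample["conversation"]  # typo: the key is "conversations"
except Exception as e:
    print(f"data fetch error: {e!r}")  # the real cause is hidden
```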
Expected Behavior
No response
Steps To Reproduce
No response
Environment
No response
Anything else?
No response
I hit the same bug QAQ
Hello, can you show me your training data JSON?
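For reference, the script's own comment (below) says the data should be "a json file consisting of a list of conversations"; here is a minimal sample in that shape. The field names ("id", "image", "conversations", "role", "content") follow the finetune README, but double-check them against the version of the repo you are using.

```python
# Write a minimal training-data file in the documented shape.
import json

samples = [
    {
        "id": "0",
        "image": "path/to/image_0.jpg",
        "conversations": [
            {"role": "user", "content": "<image>\nWhat is in the picture?"},
            {"role": "assistant", "content": "A short description."},
        ],
    }
]

with open("train.json", "w", encoding="utf-8") as f:
    json.dump(samples, f, ensure_ascii=False, indent=2)
```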
I ran into the same problem when testing fine-tuning with finetune_lora.sh. Could you please help me figure out where it goes wrong? Thanks a lot! Here is my finetune_lora.sh setup:
#!/bin/bash
GPUS_PER_NODE=1
NNODES=1
NODE_RANK=0
MASTER_ADDR=localhost
MASTER_PORT=6001
MODEL="/work/MiniCPM-V-main/check_point/OpenBMB/MiniCPM-V-2_6-int4"
# ATTENTION: specify the path to your training data, which should be a json file consisting of a list of conversations.
# See the section for finetuning in README for more information.
DATA="/work/MiniCPM-V-main/minicpm_data/data/train.json"
EVAL_DATA="/work/MiniCPM-V-main/minicpm_data/eval/eval.json"
LLM_TYPE="minicpm"
# if use openbmb/MiniCPM-V-2, please set LLM_TYPE=minicpm
# if use openbmb/MiniCPM-Llama3-V-2_5, please set LLM_TYPE=llama3
export NCCL_P2P_DISABLE=1
# export NCCL_IB_DISABLE=1
MODEL_MAX_Length=1024 # if conduct multi-images sft, please set MODEL_MAX_Length=4096
DISTRIBUTED_ARGS="
--nproc_per_node $GPUS_PER_NODE
--nnodes $NNODES
--node_rank $NODE_RANK
--master_addr $MASTER_ADDR
--master_port $MASTER_PORT
"
torchrun $DISTRIBUTED_ARGS finetune.py \
    --model_name_or_path $MODEL \
    --llm_type $LLM_TYPE \
    --data_path $DATA \
    --eval_data_path $EVAL_DATA \
    --remove_unused_columns false \
    --label_names "labels" \
    --prediction_loss_only false \
    --bf16 false \
    --bf16_full_eval false \
    --fp16 true \
    --fp16_full_eval true \
    --do_train \
    --do_eval \
    --tune_vision false \
    --tune_llm false \
    --q_lora true \
    --use_lora true \
    --lora_target_modules "llm..*layers.\d+.self_attn.(q_proj|k_proj|v_proj|o_proj)" \
    --model_max_length $MODEL_MAX_Length \
    --max_slice_nums 9 \
    --max_steps 10000 \
    --eval_steps 1000 \
    --output_dir output/output__lora \
    --logging_dir output/output_lora \
    --logging_strategy "steps" \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "steps" \
    --save_strategy "steps" \
    --save_steps 1000 \
    --save_total_limit 10 \
    --learning_rate 1e-6 \
    --weight_decay 0.1 \
    --adam_beta2 0.95 \
    --warmup_ratio 0.01 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --gradient_checkpointing true \
    --deepspeed ds_config_zero3.json \
    --report_to "tensorboard" # wandb
Here is my JSON file:
train.json
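A quick way to rule out the data side is to check that every sample actually carries the `conversations` key the loader expects. A hypothetical helper, not part of the repo; the path matches the DATA variable above:

```python
# Sanity-check train.json: every sample should have a non-empty
# "conversations" list.
import json

with open("/work/MiniCPM-V-main/minicpm_data/data/train.json", encoding="utf-8") as f:
    samples = json.load(f)

for i, sample in enumerate(samples):
    conv = sample.get("conversations")
    if not isinstance(conv, list) or not conv:
        print(f"sample {i}: bad or missing 'conversations'; keys = {list(sample)}")
```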
Versions of some important packages in my conda environment:
accelerate 0.30.1
deepspeed 0.14.4
mmengine 0.10.4
mmengine-lite 0.10.4
modelscope 1.17.1
modelscope-studio 0.4.0.9
more-itertools 10.1.0
mpmath 1.3.0
ms-opencompass 0.1.0
ms-swift 2.4.0
ms-vlmeval 0.0.7
opencv-python 4.10.0.84
opencv-python-headless 4.5.5.64
transformers 4.40.0
transformers-stream-generator 0.0.5
torch 2.1.2
torchscale 0.3.0
torchvision 0.16.0
@KeepFaithMe Hi, I also ran into this problem while fine-tuning. If I remember correctly, there seem to be other places in dataset.py and finetune.py that need to be modified before the data can be read correctly (it's been a while, so I don't remember the details). I suggest investigating based on the error messages.
Thank you very much for your reply.
Could it be that the `conversation` in the screenshot should be changed to `conversations`? I found that the variable `conversation` is never defined earlier in the code.
Please try our new finetuning code.
Has anyone solved this problem? Urgently looking for a solution.
The issue is still there in the new finetuning code, causing the "Data Fetch Error". If anybody has fixed it, please help.
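If you are still stuck, one way to find the real cause is to touch every sample once outside the trainer, so the exception that is normally swallowed and reported as "Data Fetch Error" surfaces with a full traceback. A sketch under the assumption that your dataset object is already constructed; `probe` is a hypothetical helper, not part of the repo:

```python
import traceback

def probe(dataset, limit=None):
    """Iterate samples eagerly and report the first real failure."""
    n = len(dataset) if limit is None else min(limit, len(dataset))
    for i in range(n):
        try:
            dataset[i]  # triggers the same preprocessing as training
        except Exception:
            print(f"sample {i} failed with:")
            traceback.print_exc()
            return i
    print(f"all {n} samples loaded cleanly")

# usage: probe(train_dataset, limit=100)
```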
How can this problem be solved? Looking for a solution.
Please try our new finetuning code.