[Bug] When fine-tuning the internvl3.5_1B model with use_packed_ds enabled, packing has no effect: num_samples stays fixed and every sample reaches the maximum token length
Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
I enabled use_packed_ds when training internvl3.5_1B.
Training shell script: /InternVL3_5/internvl_chat_gpt_oss/shell/internvl3_5_qwen3/internvl3_5_1b_sft.sh; corresponding .py file: /InternVL3_5/internvl_chat_gpt_oss/internvl/train/internvl_chat_finetune.py. The training task is single-image, single-turn dialogue.
The relevant shell configuration:
```shell
NPROC_PER_NODE=${NPROC_PER_NODE:-4}
BATCH_SIZE=${BATCH_SIZE:-32}
PER_DEVICE_BATCH_SIZE=${PER_DEVICE_BATCH_SIZE:-1}
GRADIENT_ACC=$((BATCH_SIZE / PER_DEVICE_BATCH_SIZE / NPROC_PER_NODE))
...
--force_image_size 448
--max_dynamic_patch 6
--max_seq_length 16384
...
--gradient_checkpointing True
--group_by_length False
--dynamic_image_size True
--use_thumbnail True
--ps_version 'v2'
--use_custom_flash_attn False
--report_to "tensorboard"
--deepspeed "zero_stage1_config.json"
--use_packed_ds True
--num_images_expected 96
--max_packed_tokens 32768
--max_buffer_size 20
```
During training I found that even with use_packed_ds set to True, the number of samples packed together is always fixed at 2, and every sample's token count equals max_seq_length (which is consistent: max_packed_tokens = 32768 holds exactly two samples of max_seq_length = 16384).
After some digging, the cause is that the multi_modal_get_item function in internvl_chat_finetune.py calls the preprocessing function preprocess_function without passing the use_packed_ds argument, so every sample is padded to the maximum length before packing.
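Why the missing argument matters: in the upstream InternVL preprocess_* functions, the tokenizer padding mode is selected from these flags, roughly as below (paraphrased; the exact code in internvl_chat_gpt_oss may differ):

```python
# Inside the preprocess_* functions (paraphrased sketch). When neither
# group_by_length nor use_packed_ds is True, padding falls back to
# 'max_length', so every sample is padded to tokenizer.model_max_length
# (= max_seq_length).
input_ids = tokenizer(
    conversations,
    return_tensors='pt',
    padding=False if group_by_length or use_packed_ds else 'max_length',
    max_length=tokenizer.model_max_length,
    truncation=True,
).input_ids
```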
Original code:
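A sketch of the call site before the change; the argument list follows the upstream InternVL multi_modal_get_item and may not match this repo's file exactly:

```python
# internvl_chat_finetune.py, multi_modal_get_item -- before the fix (sketch).
# use_packed_ds is not forwarded, so preprocess_function falls back to
# padding='max_length' and every sample comes out at max_seq_length tokens.
ret = preprocess_function(
    self.template_name, [deepcopy(data_item['conversations'])],
    self.tokenizer, [self.num_image_token * num_patches],
    group_by_length=self.group_by_length, ds_name=self.ds_name)
```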
Modified code:
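And with use_packed_ds forwarded (again a sketch, not the literal code from the repo):

```python
# internvl_chat_finetune.py, multi_modal_get_item -- after the fix (sketch).
# With use_packed_ds forwarded, the tokenizer skips padding and the packed
# dataset can concatenate variable-length samples up to max_packed_tokens.
ret = preprocess_function(
    self.template_name, [deepcopy(data_item['conversations'])],
    self.tokenizer, [self.num_image_token * num_patches],
    group_by_length=self.group_by_length,
    use_packed_ds=self.use_packed_ds, ds_name=self.ds_name)
```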
Anyone running into the same problem can use this change as a reference.
Reproduction
...
Environment
The official conda environment.
Error traceback