assert num_new_tags >= 0, f'Number of media: {num_media}, number of media_tags: {num_media_tags}'
Describe the bug (what the bug is and how to reproduce it, preferably with screenshots)
Fine-tuning InternVL2-8B on a multi-image, multi-turn conversation fails with an error.
This is the fine-tuning command:
CUDA_VISIBLE_DEVICES=2,3,4,5 swift sft --model_type internvl2-8b --model_id_or_path /data/swift/models/InternVL2-8B --dataset /data/swift/model_vl_lrr/dataset/sample.jsonl --num_train_epochs 2 --max_length 9216
This is the reference fine-tuning data format:
{"query": "Image-1: <image>\nImage-2: <image>\nDescribe the two images in detail.", "response": "xxxxxxxxx", "history": [["<image>Describe the image", "xxxxxxx"], ["CCCCC", "DDDDD"]], "images": ["image_path1", "image_path2", "image_path3"]}
Following the format above, I adapted it to my scenario. The content of sample.jsonl is:
{"query": "Image-1: <image>\nImage-2: <image>\nImage-3: <image>\nImage-4: <image>\nImage-5: <image>\nQuestion 1", "response": "Answer 1", "history": [["Image-1: <image>\nImage-2: <image>\nImage-3: <image>\nImage-4: <image>\nImage-5: <image>\nQuestion 2", "Answer 2"], ["Image-1: <image>\nImage-2: <image>\nImage-3: <image>\nImage-4: <image>\nImage-5: <image>\nQuestion 3", "Answer 3"], ["Image-1: <image>\nImage-2: <image>\nImage-3: <image>\nImage-4: <image>\nImage-5: <image>\nQuestion 4", "Answer 4"]], "images": ["1.jpg", "2.jpg", "3.jpg", "4.jpg", "5.jpg"]}
...
Each sample has four questions and five images.
Fine-tuning fails with the following error:
...
[INFO:swift] PeftModelForCausalLM: 8094.3708M Params (19.0054M Trainable [0.2348%]), 268.4375M Buffers.
[INFO:swift] system: 你是由上海人工智能实验室联合商汤科技开发的书生多模态大模型,英文名叫InternVL, 是一个有用无害的人工智能助手。
[INFO:swift] args.lazy_tokenize: True
100%|████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 79638.68it/s]
[INFO:swift] train_dataset: Dataset({
features: ['query', 'response', 'images', 'history'],
num_rows: 2
})
[INFO:swift] val_dataset: Dataset({
features: ['query', 'response', 'images', 'history'],
num_rows: 1
})
Traceback (most recent call last):
File "/data/newest/swift-newest/swift/cli/sft.py", line 5, in <module>
sft_main()
File "/data/newest/swift-newest/swift/utils/run_utils.py", line 32, in x_main
result = llm_x(args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/data/newest/swift-newest/swift/llm/sft.py", line 528, in llm_sft
train_dataset, val_dataset = prepare_dataset(args, template, msg)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/newest/swift-newest/swift/llm/sft.py", line 403, in prepare_dataset
td0, tkwargs0 = template.encode(train_dataset[0])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/newest/swift-newest/swift/llm/utils/template.py", line 545, in encode
example = self.preprocess(example)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/newest/swift-newest/swift/llm/utils/template.py", line 540, in preprocess
self._preprocess_media(example)
File "/data/newest/swift-newest/swift/llm/utils/template.py", line 463, in _preprocess_media
self.add_default_tags(example)
File "/data/newest/swift-newest/swift/llm/utils/template.py", line 435, in add_default_tags
assert num_new_tags >= 0, f'Number of media: {num_media}, number of media_tags: {num_media_tags}'
^^^^^^^^^^^^^^^^^
AssertionError: Number of media: 5, number of media_tags: 20
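The fine-tuning data is one continuous conversation, and every question is answered based on the same five images.
So I added Image-1: <image>\nImage-2: <image>\nImage-3: <image>\nImage-4: <image>\nImage-5: <image>\n to the query and to every history turn.
That gives 4 x 5 = 20 media tags in total, which does not match the 5 media items actually provided.
Expected behavior: each question carries its own tags, and each question pulls all of the images from images.
Actual behavior: it appears to count every tag in the sample and then take images from images one per tag, which triggers AssertionError: Number of media: 5, number of media_tags: 20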
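The mismatch can be reproduced outside of swift. The following is a minimal sketch, not swift's actual implementation: it assumes the check simply counts every <image> occurrence in the query and in each history query and compares that total with the length of images. The helper names count_image_tags and media_and_tag_counts are hypothetical, and the question and answer strings are placeholders.

```python
IMAGE_TAG = "<image>"


def count_image_tags(example: dict) -> int:
    """Count <image> tags in the final query and in every history query."""
    total = example.get("query", "").count(IMAGE_TAG)
    for turn_query, _turn_response in example.get("history") or []:
        total += turn_query.count(IMAGE_TAG)
    return total


def media_and_tag_counts(example: dict) -> tuple[int, int]:
    """Return (number of images, number of <image> tags) for one sample."""
    return len(example.get("images") or []), count_image_tags(example)


# Rebuild the sample from sample.jsonl: the same five-tag prefix in all four turns.
prefix = "".join(f"Image-{i}: {IMAGE_TAG}\n" for i in range(1, 6))
sample = {
    "query": prefix + "Question 1",
    "response": "Answer 1",
    "history": [[prefix + f"Question {n}", f"Answer {n}"] for n in (2, 3, 4)],
    "images": ["1.jpg", "2.jpg", "3.jpg", "4.jpg", "5.jpg"],
}

num_media, num_media_tags = media_and_tag_counts(sample)
print(f"Number of media: {num_media}, number of media_tags: {num_media_tags}")
```

Running a check like this over a JSONL file before launching swift sft would flag samples whose tag count exceeds the number of images.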
If I remove all the image tags from history, fine-tuning runs normally. In that case, are the questions and answers in history still grounded in the five images?
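A possible alternative is to keep the five tags only in the earliest turn, so that the tag count matches the five images and the images come before every question. The sketch below only shows the data transformation. It assumes that turns are fed to the model in the order history[0], ..., query and that later turns can attend to earlier tokens; neither assumption has been verified against swift's template code. The names keep_tags_in_first_turn and TAG_PATTERN are hypothetical.

```python
import re

# Matches an optional "Image-N: " label, the <image> tag, and one trailing newline.
TAG_PATTERN = re.compile(r"(?:Image-\d+:\s*)?<image>\n?")


def keep_tags_in_first_turn(example: dict) -> dict:
    """Return a copy of the sample with image tags kept only in the earliest turn."""
    history = [list(turn) for turn in example.get("history") or []]
    query = example.get("query", "")
    if history:
        # history[0] is the earliest turn: leave it untouched, strip the rest.
        for turn in history[1:]:
            turn[0] = TAG_PATTERN.sub("", turn[0])
        query = TAG_PATTERN.sub("", query)
    # Without history, the query is the only turn and keeps its tags.
    return {**example, "query": query, "history": history}


prefix = "".join(f"Image-{i}: <image>\n" for i in range(1, 6))
sample = {
    "query": prefix + "Question 1",
    "response": "Answer 1",
    "history": [[prefix + f"Question {n}", f"Answer {n}"] for n in (2, 3, 4)],
    "images": ["1.jpg", "2.jpg", "3.jpg", "4.jpg", "5.jpg"],
}

cleaned = keep_tags_in_first_turn(sample)
```

After this transformation the sample contains five tags for five images, which should satisfy the count check quoted in the traceback.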
Your hardware and system info (CUDA version, system, GPU model, torch version, etc.)
- swift version: latest
- GPU info:
NVIDIA-SMI 550.54.14, Driver Version: 550.54.14, CUDA Version: 12.4, NVIDIA GeForce RTX 4090 24G
Additional context (add any other context about the problem here)
Please pull the latest code again.