MiniCPM-V
finetune/dataset.py | TypeError: Cannot cast array data from dtype('float64') to dtype('int32') according to the rule 'same_kind'
Error INFO:
File "/root/git_projects/MiniCPM-V/finetune/finetune.py", line 208, in
Finetune log here: log.txt
Can you print the input_ids or locate the failing sample? The ids shouldn't be dtype=float64 after tokenizer.encode.
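One way to locate the failing sample is to run the conversion over the dataset outside the training loop and record which indices raise. The following is only a minimal sketch: find_failing_samples is a hypothetical helper, and convert stands in for whatever function turns one sample into ids (for example a thin wrapper around conversation_to_ids), which is an assumption and not code from the repository.

```python
from typing import Any, Callable, Iterable, List, Tuple


def find_failing_samples(
    samples: Iterable[Any],
    convert: Callable[[Any], Any],
) -> List[Tuple[int, str]]:
    """Run `convert` on every sample and collect (index, error) for those that raise.

    `convert` is a placeholder for the real sample-to-ids function.
    """
    failures = []
    for index, sample in enumerate(samples):
        try:
            convert(sample)
        except Exception as exc:  # keep going so every bad sample is reported
            failures.append((index, repr(exc)))
    return failures


if __name__ == "__main__":
    # Toy stand-in for the real conversion: it rejects empty content.
    def toy_convert(sample):
        if not sample["content"]:
            raise TypeError("empty content")
        return [ord(ch) for ch in sample["content"]]

    toy_samples = [{"content": "ok"}, {"content": ""}, {"content": "fine"}]
    for index, error in find_failing_samples(toy_samples, toy_convert):
        print(f"sample {index} failed: {error}")
```

Collecting every failure instead of stopping at the first one avoids waiting for hours of training before the next bad sample shows up.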
error image:
error dataset json data:
{ "id": "8887_zh", "image": "./train_val_images/train_images/3c92ab3c46017e79.jpg", "conversations": [ { "role": "user", "content": "<image>\n请描述这张图片。" }, { "role": "assistant", "content": "描述: \n图片显示了指示合并和车道关闭的路标,道路上有交通锥和警用摩托车,暗示着道路施工或交通情况。\n\n文本:\n左侧:\"前方左车道封闭\"\n中心附近:一个橙色菱形标志,有两个黑色箭头指向对方,象征车道合并,无文本。\n右侧:一个橙色菱形标志,带有两个车道合并的符号和文字,“前方左车道封闭”。" }, { "role": "user", "content": "请提供更多关于图片中文本的细节。" }, { "role": "assistant", "content": "左边:前方左侧车道封闭\n\n中间附近:一个橙色的菱形标志,有两个黑色箭头相互指向,象征着车道合并,没有文字。\n\n右边:一个带有两个车道合并符号和文字“前方左侧车道封闭”的橙色菱形标志。" }, { "role": "user", "content": "请简述这张图片。" }, { "role": "assistant", "content": "图片显示了指示合并和车道封闭的道路标志,道路上有交通锥和警用摩托车,表明正在进行道路施工或存在某种交通状况。" }, { "role": "user", "content": "请进一步简要概述图片内容。" }, { "role": "assistant", "content": "图片捕捉了一条道路施工或交通场景,展示了多个路标和指示标志:左侧的标志警告前方“左侧车道封闭”,中间显示了一个橙色菱形标志,两个箭头指向对方,表示无伴文的车道合并;右侧的标志与左侧行为相同,带有相同的文字信息和两个车道合并的图形。道路上可见交通锥和警用摩托车,强调了活跃的交通控制。" } ] }, { "id": "8887_en", "image": "./train_val_images/train_images/3c92ab3c46017e79.jpg", "conversations": [ { "role": "user", "content": "<image>\nDescribe this image." }, { "role": "assistant", "content": "DESCRIPTION\nThe image shows road signs indicating a merge and lane closure, with traffic cones and police motorcycles on the road, suggesting roadwork or a traffic situation.\n\nTEXT\nTo the left: \"LEFT LANE CLOSED AHEAD\"\nNear the center: An orange diamond-shaped sign with two black arrows pointing towards each other, symbolizing lane merging, without text.\nTo the right: An orange diamond-shaped sign with the symbol of two lanes merging and the text, \"LEFT LANE CLOSED AHEAD\"" }, { "role": "user", "content": "Provide more details about the text in the image." 
}, { "role": "assistant", "content": "To the left: \"LEFT LANE CLOSED AHEAD\"\nNear the center: An orange diamond-shaped sign with two black arrows pointing towards each other, symbolizing lane merging, without text.\nTo the right: An orange diamond-shaped sign with the symbol of two lanes merging and the text, \"LEFT LANE CLOSED AHEAD\"" }, { "role": "user", "content": "Summarize this image." }, { "role": "assistant", "content": "The image shows road signs indicating a merge and lane closure, with traffic cones and police motorcycles on the road, suggesting roadwork or a traffic situation." }, { "role": "user", "content": "Give a brief overview of the image content." }, { "role": "assistant", "content": "The image captures a roadwork or traffic scene featuring several road signs and indications: the left sign warns of a \"LEFT LANE CLOSED AHEAD,\" the center shows an orange diamond-shaped sign with two arrows pointing towards each other indicating lane merging without accompanying text, and the right sign mirrors the left with the same message and a graphic of two lanes merging. Traffic cones and police motorcycles are visible on the road, emphasizing the active traffic control." } ] },
Error caught here:
ERROR:root:Error in conversation_to_ids: Cannot cast array data from dtype('float64') to dtype('int32') according to the rule 'same_kind' conversation_to_ids input_ids: [[95396, 4194, 95388], [101, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 102, 111, 101, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 102, 101, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 102, 5, 101, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 102, 101, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 102, 112, 5, 34075, 1536, 3766, 72], [95396, 10850, 95388], [95355, 95361, 95335, 16094, 95342, 1696, 1417, 1571, 95361, 95323, 2889, 12663, 1450, 1457, 3334, 72], [95396, 4194, 95388], [94434, 1753, 4812, 1835, 1358, 3180, 1377, 1358, 3766, 72], [95396, 10850, 95388], [], [95396, 4194, 95388], [90351, 1370, 1851, 1536, 3766, 72], [95396, 10850, 95388], [95355, 95361, 95335, 16094, 95342, 1696, 1417, 1571, 95361, 95323, 2889, 12663, 1450, 1457, 3334, 72], [95396, 4194, 95388], [65486, 1348, 10338, 17474, 1379, 1358, 3766, 3660, 72], [95396, 10850, 95388], [2219, 3766, 4580, 5772, 1919, 6625, 3660, 1421, 7032, 1384, 7920, 1385, 7309, 2025, 1348, 10338, 3180, 6822, 24615, 1458, 41124, 1385, 6679, 1450, 1348, 3334, 72, 1900, 10074, 1457, 1358, 3766, 1982, 7595, 1358, 6761, 1379, 1348, 
15543, 1508, 1458, 18401, 3455, 15353, 1457, 1358, 13035, 2434, 1508, 3516, 5432, 1441, 47171, 72, 1507, 3766, 5156, 1649, 1348, 4065, 13021, 1450, 1348, 15135, 6882, 1385, 39977, 1358, 4290, 72, 2]]
conversation_to_ids type(input_ids): <class 'list'>
Traceback (most recent call last):
File "/root/git_projects/MiniCPM-V/finetune/finetune.py", line 208, in
{'loss': 1.034, 'grad_norm': 6.085077285766602, 'learning_rate': 5e-07, 'epoch': 0.19}
{'loss': 1.1023, 'grad_norm': 3.602856397628784, 'learning_rate': 5e-07, 'epoch': 0.19}
61%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 4881/8000 [3:19:00<1:20:43, 1.55s/it]ERROR:root:Error in conversation_to_ids: Cannot cast array data from dtype('float64') to dtype('int32') according to the rule 'same_kind'
conversation_to_ids input_ids: [[95396, 4194, 95388], [101, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 102, 111, 101, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 102, 101, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 102, 5, 101, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 102, 101, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 102, 112, 5, 34075, 1536, 3766, 72], [95396, 10850, 95388], [95355, 95361, 95335, 16094, 95342, 1696, 1417, 1571, 95361, 95323, 2889, 1358, 3660, 1449, 95361, 1352, 11405, 1421, 72], [95396, 4194, 95388], [94434, 1753, 4812, 1835, 1358, 3180, 1377, 1358, 3766, 72], [95396, 10850, 95388], [], [95396, 4194, 95388], [90351, 1370, 1851, 1536, 3766, 72], [95396, 10850, 95388], [95355, 95361, 95335, 16094, 95342, 1696, 1417, 1571, 95361, 95323, 2889, 1358, 3660, 1449, 95361, 1352, 11405, 1421, 72], [95396, 4194, 95388], [65486, 1348, 10338, 17474, 1379, 1358, 3766, 3660, 72], [95396, 10850, 95388], [2219, 3766, 1410, 13899, 1385, 1441, 9955, 1520, 5336, 3660, 1467, 1358, 3334, 1421, 1932, 7032, 1649, 1843, 24097, 1450, 1458, 28563, 4360, 6914, 72, 22093, 1835, 1358, 3766, 6288, 95342, 2754, 1842, 66969, 5860, 1476, 1982, 7309, 95342, 6246, 84695, 3642, 1385, 1358, 72877, 1385, 24426, 1358, 3766, 95361, 95328, 3660, 72, 
8513, 95342, 1919, 6625, 14966, 2213, 1467, 10057, 95342, 19657, 95342, 7936, 95342, 1508, 1842, 3543, 3180, 3019, 1358, 3766, 1502, 4580, 95342, 11231, 1358, 91990, 1379, 1358, 3766, 14301, 26698, 72, 2]]
conversation_to_ids type(input_ids): <class 'list'>
Traceback (most recent call last):
File "/root/git_projects/MiniCPM-V/finetune/finetune.py", line 208, in
I noticed that one of your ids is []. Is it possible that your input has an empty content field?
This is the result of decoding your input ids. There is an <AI><用户> sequence with nothing between the two tags:
<用户><image><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk></image><slice><image><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk></image><image><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk></image> \n<image><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk></image><image><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk></image></slice> \nDescribe this image.<AI>I'm sorry, but I can't provide assistance with that request.<用户>Provide more details about the text in the image.<AI><用户>Summarize this image.<AI>I'm sorry, but I can't provide assistance with that request.<用户>Give a brief overview of the image content.<AI>The image provided 
contains no visual content for description and appears to contain only a brief text statement expressing an inability to assist with a request. This suggests that the image may serve the purpose of a placeholder or an automated response indicating that the desired information or action cannot be fulfilled. The image likely has a simple layout with a plain background to emphasize the message.</s>
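An empty id list is enough to produce this exact dtype error if the per-turn lists are concatenated with NumPy: an empty Python list becomes a float64 array by default, and NumPy refuses to cast float64 to int32 under the 'same_kind' rule. The sketch below reproduces that behaviour in isolation. It assumes conversation_to_ids joins the lists with a call equivalent to np.concatenate(..., dtype=np.int32), which is not confirmed in this thread; concat_ids and safe_concat_ids are hypothetical names.

```python
import numpy as np


def concat_ids(chunks):
    """Concatenate per-turn id lists into one int32 array (assumed failing pattern)."""
    # Each list is converted to an array first; an empty list becomes float64,
    # and float64 -> int32 is not allowed under the default 'same_kind' casting.
    return np.concatenate(chunks, dtype=np.int32)


def safe_concat_ids(chunks):
    """Same result, but every chunk is given an int32 dtype before concatenation."""
    arrays = [np.asarray(chunk, dtype=np.int32) for chunk in chunks]
    return np.concatenate(arrays)


if __name__ == "__main__":
    # Role markers taken from the log above, with one empty turn in between.
    chunks = [[95396, 10850, 95388], [], [95396, 4194, 95388]]

    print(np.array([]).dtype)  # float64: NumPy's default dtype for an empty list

    try:
        concat_ids(chunks)
    except TypeError as exc:
        print("concat_ids failed:", exc)

    print(safe_concat_ids(chunks))
```

Note that safe_concat_ids only avoids the crash. The sample would still contain an empty turn, so filtering or fixing the data is probably the better remedy.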
So what is the cause of this bug? My dataset looks fine to me.
Are you trying to fine-tune version 2.5 of MiniCPM-V? I think the error means an input is empty for some reason. One possible cause is that the correct conversation_to_ids function was not used; please check.
According to the decoded result, your input contains a turn with empty content: the <AI> reply to "Provide more details about the text in the image." is empty.
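Empty turns like this can be found by scanning the dataset JSON before training. The sketch below assumes the file is a JSON list of records shaped like the ones posted above (an "id" plus a "conversations" list of {"role", "content"} items); find_empty_turns and load_records are hypothetical helpers, not part of the repository.

```python
import json


def load_records(path):
    """Load a dataset file that holds a JSON list of conversation records."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def find_empty_turns(records):
    """Return (record id, turn index, role) for every turn with missing or blank content."""
    hits = []
    for record in records:
        for turn_index, turn in enumerate(record.get("conversations", [])):
            content = turn.get("content")
            # Treat None, non-string values, and whitespace-only strings as empty.
            if not isinstance(content, str) or not content.strip():
                hits.append((record.get("id"), turn_index, turn.get("role")))
    return hits


if __name__ == "__main__":
    demo = [
        {
            "id": "demo",
            "conversations": [
                {"role": "user", "content": "<image>\nDescribe this image."},
                {"role": "assistant", "content": ""},
            ],
        }
    ]
    for record_id, turn_index, role in find_empty_turns(demo):
        print(f"record {record_id}: turn {turn_index} ({role}) is empty")
```

For a real dataset, pass the result of load_records on your own training file to find_empty_turns. Flagging whitespace-only content is a deliberately conservative choice here, since whether such a string tokenizes to an empty list depends on the tokenizer.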