YuzaChongyi
YuzaChongyi
你好,感谢建议,minicpm-v-2 在训练的时候 general grounding 的数据比较少,并且 sft 阶段也没有专门加入 grounding 数据来增强模型的定位能力,所以当前开源模型不太能支持输入指令让模型返回目标 bbox,我们会考虑在后续的迭代中加上这个能力。
Thanks for your contribution. There are several questions: - It seems that multi-turn conversation is not implemented, because I noticed that the answers are not added to history msgs? -...
你好,看起来是训练结果存储路径的问题,请核对一下保存地址
你试试 save_steps 设置的大一些呢?
Can you print the `input_ids` or locate the error sample?, it shouldn't be dtype=float64 after `tokenizer.encode` .
I noticed that one of your ids is `[]`, is it possible that your input has an empty content?
This is the result of decoding your input ids. There is a `` > ``` \n \nDescribe this image.I'm sorry, but I can't provide assistance with that request.Provide more details...
According to the decode result, your input has a empty user content.
> 直接从idefics2加载的ve权重? > […](#) > ---- 回复的原邮件 ---- | 发件人 | Hongji ***@***.***> | | 日期 | 2024年05月20日 19:27 | | 收件人 | ***@***.***> | | 抄送至 | ***@***.***>***@***.***> |...
> Same as https://huggingface.co/HuggingFaceM4/siglip-so400m-14-384-flash-attn2 with two changes: > > increase max resolution to 980 x 980 (instead of 384 x 384) by interpolating the position embeddings > implement the strategy...