YuzaChongyi comments

Results 52 comments of


                                            YuzaChongyi

object detection能力

你好，感谢建议，minicpm-v-2 在训练的时候 general grounding 的数据比较少，并且 sft 阶段也没有专门加入 grounding 数据来增强模型的定位能力，所以当前开源模型不太能支持输入指令让模型返回目标 bbox，我们会考虑在后续的迭代中加上这个能力。

Implement chatbot functionality using Streamlit

Thanks for your contribution. There are several questions: - It seems that multi-turn conversation is not implemented, because I noticed that the answers are not added to history msgs? -...

finetune/dataset.py | TypeError: Cannot cast array data from dtype('float64') to dtype('int32') according to the rule 'same_kind'

Can you print the `input_ids` or locate the error sample?, it shouldn't be dtype=float64 after `tokenizer.encode` .

finetune/dataset.py | TypeError: Cannot cast array data from dtype('float64') to dtype('int32') according to the rule 'same_kind'

I noticed that one of your ids is `[]`, is it possible that your input has an empty content?

finetune/dataset.py | TypeError: Cannot cast array data from dtype('float64') to dtype('int32') according to the rule 'same_kind'

This is the result of decoding your input ids. There is a `` > ``` \n \nDescribe this image.I'm sorry, but I can't provide assistance with that request.Provide more details...

finetune/dataset.py | TypeError: Cannot cast array data from dtype('float64') to dtype('int32') according to the rule 'same_kind'

According to the decode result, your input has a empty user content.

VisionEncoder里面的vit，用的是idefics2还是 hf4m那个啊？

> 直接从idefics2加载的ve权重？ > […](#) > ---- 回复的原邮件 ---- | 发件人 | Hongji ***@***.***> | | 日期 | 2024年05月20日 19:27 | | 收件人 | ***@***.***> | | 抄送至 | ***@***.***>***@***.***> |...

VisionEncoder里面的vit，用的是idefics2还是 hf4m那个啊？

> Same as https://huggingface.co/HuggingFaceM4/siglip-so400m-14-384-flash-attn2 with two changes: > > increase max resolution to 980 x 980 (instead of 384 x 384) by interpolating the position embeddings > implement the strategy...

YuzaChongyi

object detection能力

Implement chatbot functionality using Streamlit

微调报错

微调报错

finetune/dataset.py | TypeError: Cannot cast array data from dtype('float64') to dtype('int32') according to the rule 'same_kind'

finetune/dataset.py | TypeError: Cannot cast array data from dtype('float64') to dtype('int32') according to the rule 'same_kind'

finetune/dataset.py | TypeError: Cannot cast array data from dtype('float64') to dtype('int32') according to the rule 'same_kind'

finetune/dataset.py | TypeError: Cannot cast array data from dtype('float64') to dtype('int32') according to the rule 'same_kind'

VisionEncoder里面的vit，用的是idefics2还是 hf4m那个啊？

VisionEncoder里面的vit，用的是idefics2还是 hf4m那个啊？