MiniCPM-V: When will the source code for the real-time multimodal interaction demo be open-sourced? Could you describe the approach?

Open Roki7274 opened this issue 1 year ago • 3 comments

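When do you expect to open-source the code for the demo shown in the presentation? In the meantime, could you describe the framework and the implementation approach?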

Apr 17 '24 08:04 Roki7274

The web demo code is now available, enjoy.

Apr 17 '24 13:04 iceflame89

> The web demo code is now available, enjoy.

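Could you explain the approach behind the real-time multimodal interaction code? 😋 I tried generating a description of each video frame together with its relation to the previous frame, but the results were not very good. What should be fed to the gpt-3.5 API so that it can reason better about the video content?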

Apr 18 '24 02:04 Roki7274

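Thanks for your interest. The real-time interaction demo is a pipeline system: the ChatGPT API merges the descriptions of multiple frames, and the results are indeed not very stable. To reduce that instability as much as we can, the framework we implemented works roughly as follows: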

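Have OmniLMM/MiniCPM-V keep each frame description short, to avoid too much irrelevant content. The prompt we use is "What is happening in the image?".
After each answer, we run a history summarization step to keep the context from growing too long. The prompt we use is: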

Before coming to the next round, please make summarization. You need to first illustrate the overall event. For example, "Event: you are writing code". Then, you need summarize the previous round. For example, "Last Round: you write a python function for string matching".

ChatGPT input: the summarized context + the multi-frame descriptions in the current window + the current question.
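The three steps above could be sketched roughly as follows. This is only a minimal illustration, not the released demo code: `describe_frame` and `ask_llm` are hypothetical callables standing in for a MiniCPM-V captioning call and a ChatGPT API call, and the `window_size` value and the layout of the assembled input are assumptions. Only the two prompts are taken from the description above.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Prompts quoted from the thread.
FRAME_PROMPT = "What is happening in the image?"
SUMMARY_PROMPT = (
    'Before coming to the next round, please make summarization. '
    'You need to first illustrate the overall event. '
    'For example, "Event: you are writing code". '
    'Then, you need summarize the previous round. '
    'For example, "Last Round: you write a python function for string matching".'
)


@dataclass
class RealtimeSession:
    # Hypothetical wrappers: (frame, prompt) -> caption, and text -> reply.
    describe_frame: Callable[[object, str], str]
    ask_llm: Callable[[str], str]
    window_size: int = 4  # assumed number of recent frames kept
    summary: str = ""
    captions: List[str] = field(default_factory=list)

    def add_frame(self, frame: object) -> None:
        """Caption one frame briefly and keep only the current window."""
        caption = self.describe_frame(frame, FRAME_PROMPT)
        self.captions = (self.captions + [caption])[-self.window_size:]

    def build_input(self, question: str) -> str:
        """Summarized context + windowed frame descriptions + question."""
        frames = "\n".join(
            f"Frame {i + 1}: {c}" for i, c in enumerate(self.captions)
        )
        return (
            f"Context summary:\n{self.summary or '(none)'}\n\n"
            f"Frame descriptions:\n{frames}\n\n"
            f"Question: {question}"
        )

    def answer(self, question: str) -> str:
        """Answer the question, then refresh the rolling summary."""
        prompt = self.build_input(question)
        reply = self.ask_llm(prompt)
        # Summarize after each round so the context does not keep growing.
        self.summary = self.ask_llm(
            f"{prompt}\n\nAnswer: {reply}\n\n{SUMMARY_PROMPT}"
        )
        return reply
```

In use, one would call `add_frame` for each sampled video frame and `answer` whenever the user asks something; injecting the two model calls as plain callables keeps the sketch independent of any particular client library.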

We will try to clean up and release the code in May.

Apr 22 '24 16:04 waxnkw
