MiniCPM-V
💡 [REQUEST] - How to get internal embeddings for downstream retrieval tasks like ColPali
Start Date
No response
Implementation PR
No response
Reference Issues
No response
Summary
https://github.com/illuin-tech/colpali
Recently, ColBERT + PaliGemma (ColPali) showed a big improvement on PDF retrieval by using a multimodal model instead of an OCR + LLM pipeline. It would be nice if MiniCPM-V could support ColBERT-like usage for downstream retrieval tasks. Alternatively, how can I fine-tune MiniCPM-V the way ColPali was trained?
Basic Example
NA
Drawbacks
NA
Unresolved questions
No response
Yes, actually we have open-sourced MiniCPM-Visual-Embedding, built on MiniCPM-V-2.0, which can:

- Help you read a long, visually intensive or text-oriented PDF document and find the pages that answer your question.
- Help you build a personal library and retrieve book pages from a large collection of books.

It has only 2.8B parameters, so it has the potential to run on your PC.
We have open-sourced our visual embedding model on Hugging Face: https://huggingface.co/RhapsodyAI/minicpm-visual-embedding-v0
You are welcome to try our demo at https://huggingface.co/spaces/bokesyo/minicpm-visual-embeeding-v0-demo
We will open-source a PDF visual embedding model based on MiniCPM-V-2.6 in about two weeks. If you want to spend some time fine-tuning a visual embedding model yourself, you are welcome to refer to our training framework GitHub repo.
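For context, the page-retrieval setup described above reduces to nearest-neighbor search over one embedding vector per page. A minimal sketch with NumPy only; the random stand-in embeddings and the embedding dimension are assumptions for illustration, not the model's real output:

```python
import numpy as np

def top_k_pages(query_emb: np.ndarray, page_embs: np.ndarray, k: int = 3):
    """Rank pages by cosine similarity to the query (single-vector retrieval)."""
    q = query_emb / np.linalg.norm(query_emb)
    p = page_embs / np.linalg.norm(page_embs, axis=1, keepdims=True)
    scores = p @ q                      # cosine similarity per page
    order = np.argsort(-scores)[:k]     # indices of the k best pages
    return order, scores[order]

# Stand-in embeddings: in practice these would come from the embedding model,
# one vector per PDF page and one per text query (dimension here is hypothetical).
rng = np.random.default_rng(0)
page_embs = rng.normal(size=(100, 2304))                  # 100 pages
query_emb = page_embs[42] + 0.1 * rng.normal(size=2304)   # query "close to" page 42

idx, scores = top_k_pages(query_emb, page_embs)
print(int(idx[0]))  # 42
```

In a real pipeline the `page_embs` matrix would be precomputed once per document collection and optionally stored in an ANN index for large libraries.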
@bokesyo Do you have any plan for multi-vector representations? In the ColPali report, multi-vector showed much better performance than single-vector.
@bokesyo Also curious: are you affiliated with OpenBMB? I saw the embedding model is under a different organization.
> @bokesyo Do you have any plan for multi-vector representations? In the ColPali report, multi-vector showed much better performance than single-vector.
Not yet. Currently we use only one vector to represent each page, which is easier to implement. 😂
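For reference, the multi-vector scoring that the ColPali report uses is ColBERT-style late interaction: each query-token embedding takes its maximum similarity over all page-patch embeddings, and those maxima are summed. A minimal sketch with random stand-in embeddings (token/patch counts and dimension are illustrative assumptions):

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, page_vecs: np.ndarray) -> float:
    """ColBERT-style late interaction: for each query token, take the max
    dot product against any page patch vector, then sum over query tokens."""
    sims = query_vecs @ page_vecs.T      # (n_query_tokens, n_page_patches)
    return float(sims.max(axis=1).sum())

rng = np.random.default_rng(1)
query_vecs = rng.normal(size=(16, 128))  # 16 query tokens, hypothetical dim 128
page_a = rng.normal(size=(1024, 128))    # a page as 1024 random patch vectors
# A page whose patches include the query token vectors themselves:
page_b = np.vstack([query_vecs, rng.normal(size=(1008, 128))])

# The page that actually contains the query content scores higher.
print(maxsim_score(query_vecs, page_b) > maxsim_score(query_vecs, page_a))  # True
```

Single-vector retrieval replaces the whole similarity matrix with one dot product per page, which is why it is cheaper to index and serve but loses the token-level matching that gave ColPali its gains.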
> @bokesyo Also curious: are you affiliated with OpenBMB? I saw the embedding model is under a different organization.
Yes, but we consider this visual embedding technique experimental and a preview version, so we did not put it under OpenBMB. Once the quantitative evaluation results are ready and the final model trained on all the data is ready, we will release it under OpenBMB.
> our training framework github repo

This link is 404. Can you tell me the correct link? https://github.com/RhapsodyAILab/minicpm-visual-embedding-v0
> This link is 404. Can you tell me the correct link?
Yes, check our new repo: https://github.com/RhapsodyAILab/MiniCPM-V-Embedding-v0