MiniCPM-V 💡 [REQUEST] - How to get internal embeddings for downstream retrieval tasks like ColPali

Start Date

No response

Implementation PR

No response

Summary

https://github.com/illuin-tech/colpali

Recently, ColPali (ColBERT + PaliGemma) showed a big improvement on PDF file retrieval by using a multimodal model instead of OCR+LLM. It would be nice if MiniCPM could support ColBERT-like usage for downstream retrieval tasks. Alternatively, how can I fine-tune MiniCPM the way ColPali does?

Basic Example

NA

Drawbacks

NA

Unresolved questions

No response

Aug 06 '24 18:08 VoVAllen

Yes. We have open-sourced MiniCPM-Visual-Embedding, built on MiniCPM-V-2.0. It can:

Help you read a long, visually intensive or text-oriented PDF document and find the pages that answer your question.
Help you build a personal library and retrieve book pages from a large collection of books.
It has only 2.8B parameters and has the potential to run on your PC.

The visual embedding model is available on Hugging Face: https://huggingface.co/RhapsodyAI/minicpm-visual-embedding-v0

You are welcome to try our demo at https://huggingface.co/spaces/bokesyo/minicpm-visual-embeeding-v0-demo

We will open-source a PDF visual embedding model based on MiniCPM-V-2.6 in about two weeks. If you want to spend some time fine-tuning a visual embedding model yourself, you are welcome to refer to our training framework GitHub repo.

Aug 07 '24 07:08 bokesyo
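
For readers following the thread, the page retrieval described in the reply above (one embedding per page, pages ranked against a question) can be sketched roughly as below. This is only an illustration under stated assumptions: the vectors are hand-made stand-ins for model outputs, `cosine` and `rank_pages` are hypothetical helper names, and nothing here is the actual MiniCPM-Visual-Embedding API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_pages(query_vec, page_vecs):
    """Return (page_index, score) pairs, best match first."""
    scores = [(i, cosine(query_vec, v)) for i, v in enumerate(page_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Assumption: toy 3-dimensional embeddings standing in for real
# page and query embeddings produced by an embedding model.
pages = [
    [1.0, 0.0, 0.0],  # page 0
    [0.0, 1.0, 0.0],  # page 1
    [0.7, 0.7, 0.0],  # page 2
]
query = [0.9, 0.1, 0.0]

ranking = rank_pages(query, pages)
for page_index, score in ranking:
    print(f"page {page_index}: {score:.3f}")
```

In a real setup the page vectors would be computed once and stored, so that only the query needs to be embedded at search time.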

@bokesyo Do you have any plans for multi-vector representations? In the ColPali report, multi-vector representations showed much better performance than a single vector.

Aug 07 '24 07:08 VoVAllen

@bokesyo Also, I'm curious: are you affiliated with OpenBMB? I saw that the embedding model is under a different organization.

Aug 07 '24 07:08 VoVAllen

@bokesyo Do you have any plans for multi-vector representations? In the ColPali report, multi-vector representations showed much better performance than a single vector.

Not yet. Currently we use only one vector to represent each page, because that is easier to implement. 😂

Aug 07 '24 08:08 bokesyo
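
To make the single-vector versus multi-vector distinction in this exchange concrete, here is a minimal sketch of ColBERT-style late-interaction scoring (often called MaxSim): each query token vector is matched against its best page patch vector, and the per-token maxima are summed. The function names `dot` and `maxsim_score` and the toy vectors are assumptions for illustration, not ColPali's or MiniCPM-V's actual code.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_tokens, page_patches):
    """Late-interaction score: for every query token vector, take the
    similarity to its best-matching page patch vector, then sum."""
    return sum(
        max(dot(q, p) for p in page_patches)
        for q in query_tokens
    )

# Assumption: toy 2-dimensional vectors standing in for real
# per-token (query) and per-patch (page) embeddings.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]

page_a = [[1.0, 0.0], [0.0, 1.0]]  # has a close match for each query token
page_b = [[0.7, 0.7]]              # one patch that is a partial match for both

score_a = maxsim_score(query_tokens, page_a)
score_b = maxsim_score(query_tokens, page_b)
print(score_a, score_b)
```

The trade-off is storage and compute: a multi-vector index keeps one vector per patch instead of one per page, which is part of why a single vector is the simpler thing to implement.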

@bokesyo Also, I'm curious: are you affiliated with OpenBMB? I saw that the embedding model is under a different organization.

Yes, but we consider this visual embedding technique experimental and a preview version, so we did not put it on OpenBMB. When the quantitative evaluation results are ready and the final model trained on all the training data is ready, we will release it on OpenBMB.

Aug 07 '24 08:08 bokesyo

"our training framework github repo": this link returns a 404. Can you tell me the correct link? https://github.com/RhapsodyAILab/minicpm-visual-embedding-v0

Aug 16 '24 05:08 lijiaoyang

"our training framework github repo": this link returns a 404. Can you tell me the correct link? https://github.com/RhapsodyAILab/minicpm-visual-embedding-v0

Yes, check our new repo: https://github.com/RhapsodyAILab/MiniCPM-V-Embedding-v0

Aug 16 '24 15:08 bokesyo
