
💡 [REQUEST] - How to get internal embeddings for downstream retrieval tasks like ColPali

Open • VoVAllen opened this issue 1 year ago • 5 comments

Start Date

No response

Implementation PR

No response

Reference Issues

No response

Summary

https://github.com/illuin-tech/colpali

Recently, ColPali (ColBERT + PaliGemma) showed a big improvement on PDF retrieval by using a multimodal model instead of an OCR + LLM pipeline. It would be nice if MiniCPM could support ColBERT-like usage for downstream retrieval tasks. Alternatively, how could I fine-tune MiniCPM the way ColPali does?
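ColPali borrows ColBERT's late-interaction scoring: the query and the page each keep many vectors (one per query token, one per image patch), and the relevance score sums, over query tokens, each token's best similarity against the page's patches (MaxSim). A minimal pure-Python sketch, assuming the embeddings have already been produced by some encoder; the function names and toy vectors are hypothetical, and this is not ColPali's or MiniCPM's code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm so that dot products equal cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: Sequence[Vector], page_patches: Sequence[Vector]) -> float:
    """ColBERT-style late interaction (MaxSim): each query token is matched
    against its most similar page patch, and those best scores are summed."""
    q = [normalize(t) for t in query_tokens]
    p = [normalize(t) for t in page_patches]
    return sum(max(dot(qt, pt) for pt in p) for qt in q)


# Toy 2-d embeddings standing in for real encoder outputs (hypothetical values).
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
page_b = [[1.0, 0.1], [1.0, -0.1]]

print(maxsim_score(query, page_a))  # 2.0: both query tokens find an exact match
print(maxsim_score(query, page_b))  # lower: the second query token has no good match
```

Because every query token keeps its own vector, a page can score well by matching different query terms in different regions, which a single pooled vector cannot express directly.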

Basic Example

NA

Drawbacks

NA

Unresolved questions

No response

VoVAllen, Aug 06 '24 18:08

Yes. We have actually open-sourced MiniCPM-Visual-Embedding, built on MiniCPM-V-2.0, which can:

  • Help you read a long visually-intensive or text-oriented PDF document and find the pages that answer your question.

  • Help you build a personal library and retrieve book pages from a large collection of books.

  • Potentially run on your PC, since it has only 2.8B parameters.

Our visual embedding model is open-sourced on Hugging Face: https://huggingface.co/RhapsodyAI/minicpm-visual-embedding-v0

You are welcome to try our demo at https://huggingface.co/spaces/bokesyo/minicpm-visual-embeeding-v0-demo

We will open-source a PDF visual embedding model based on MiniCPM-V-2.6 in about two weeks. If you want to fine-tune the visual embedding model yourself, you are welcome to refer to our training framework GitHub repo.


bokesyo, Aug 07 '24 07:08
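With one vector per page, the page-retrieval use case described in the comment above reduces to nearest-neighbour search: embed the question, score each page by cosine similarity, and keep the top k. A brute-force sketch, assuming query and page vectors come from the embedding model (how they are computed is not shown; `top_k_pages` and the page IDs are hypothetical). A large book collection would more likely use an approximate nearest-neighbour index than a linear scan.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def top_k_pages(
    query_vec: Sequence[float],
    page_index: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Brute-force search: score every page vector against the query, keep the best k."""
    scored = ((page_id, cosine(query_vec, vec)) for page_id, vec in page_index.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Hypothetical page IDs and toy 3-d vectors; real ones would come from the embedding model.
index = {
    "book1_p12": [0.9, 0.1, 0.0],
    "book1_p13": [0.1, 0.9, 0.0],
    "book2_p4": [0.7, 0.7, 0.1],
}
results = top_k_pages([1.0, 0.0, 0.0], index, k=2)
print([page_id for page_id, _ in results])  # ['book1_p12', 'book2_p4']
```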

@bokesyo Do you have any plans for multi-vector representations? In the ColPali report, multi-vector retrieval performed much better than single-vector retrieval.

VoVAllen, Aug 07 '24 07:08

@bokesyo I'm also curious: are you affiliated with OpenBMB? I saw the embedding model is published under a different organization.

VoVAllen, Aug 07 '24 07:08

> @bokesyo Do you have any plans for multi-vector representations? In the ColPali report, multi-vector retrieval performed much better than single-vector retrieval.

Not yet. Currently we use only one vector to represent each page, which is easier to implement. 😂

bokesyo, Aug 07 '24 08:08
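To illustrate the single-vector design discussed in this exchange: a model that produces one hidden state per token or image patch has to collapse them into a single page vector, so the index stores one entry per page instead of one per patch. The sketch assumes mean pooling followed by L2 normalization; the thread does not say which pooling MiniCPM-Visual-Embedding actually uses, and `mean_pool` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def mean_pool(token_vecs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token (or per-patch) vectors into one page vector, then L2-normalize it."""
    if not token_vecs:
        raise ValueError("need at least one token vector")
    dim = len(token_vecs[0])
    mean = [sum(v[i] for v in token_vecs) / len(token_vecs) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm > 0 else mean


# Four hypothetical 2-d patch vectors for one page.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
page_vec = mean_pool(patches)
print(f"{len(patches)} patch vectors -> 1 page vector of dim {len(page_vec)}")
```

The trade-off is that averaging can dilute a match carried by a single patch, whereas ColBERT-style late interaction (MaxSim) scores every query token against its best patch; that is one plausible reason for the gap the ColPali report shows between the two designs.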

> @bokesyo I'm also curious: are you affiliated with OpenBMB? I saw the embedding model is published under a different organization.

Yes, but we consider this visual embedding technique experimental and the model a preview version, so we did not publish it under OpenBMB. Once the quantitative evaluation results are ready and the final model, trained on all the training data, is available, we will release it on OpenBMB.

bokesyo, Aug 07 '24 08:08

> our training framework GitHub repo

This link (https://github.com/RhapsodyAILab/minicpm-visual-embedding-v0) returns a 404. Can you tell me the correct link?

lijiaoyang, Aug 16 '24 05:08

> This link (https://github.com/RhapsodyAILab/minicpm-visual-embedding-v0) returns a 404. Can you tell me the correct link?

Yes, please check our new repo: https://github.com/RhapsodyAILab/MiniCPM-V-Embedding-v0

bokesyo, Aug 16 '24 15:08