MLLM support
Does this project support training and inference of multi-modal retrieval models such as Phi-3-vision? I'd like to reproduce the experiments in the paper https://arxiv.org/abs/2406.11251 using this project.
Thanks for your interest @ChingKwanCheung. I will merge the code and doc this weekend.
Hi @ChingKwanCheung, I have added the code and an initial doc in https://github.com/texttron/tevatron/tree/main/examples/dse
Thank you! This paper is really great work. I have tested the multi-modal retrieval model (https://huggingface.co/Tevatron/dse-phi3-docmatix-v1) you released earlier and found its English retrieval capability excellent. If I want to improve its Chinese retrieval capability, would you recommend continuing training on Chinese data from this checkpoint?
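For reference, here is roughly how I queried the checkpoint. This is a minimal sketch only: the query prompt string and the last-token pooling are my assumptions based on the DSE paper, not confirmed against the model card.

```python
import torch
import torch.nn.functional as F
from transformers import AutoProcessor, AutoModelForCausalLM

model_id = "Tevatron/dse-phi3-docmatix-v1"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
).eval()

# Assumed query format; the actual prompt template may differ.
query = "query: what is document screenshot embedding?</s>"
inputs = processor(query, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True, return_dict=True)

# Last-token pooling (as in the DSE paper): take the final token's
# hidden state from the last layer as the embedding, then L2-normalize.
emb = out.hidden_states[-1][:, -1, :]
emb = F.normalize(emb, p=2, dim=-1)
print(emb.shape)  # (1, hidden_size)
```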
Thanks @ChingKwanCheung. I suspect the Chinese capability largely depends on how strong the underlying LLM is on Chinese and on how well the visual encoder is aligned with the language model. I am not sure Phi-3 handles Chinese well; https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5 might be a better backbone choice for Chinese tasks.
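If you do continue training on Chinese query-document pairs, the objective is the standard in-batch contrastive (InfoNCE) loss. Below is a schematic sketch with placeholder tensors standing in for real query/document embeddings; it illustrates the kind of loss optimized, not the Tevatron training API itself.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(q_emb: torch.Tensor, p_emb: torch.Tensor,
                     temperature: float = 0.02) -> torch.Tensor:
    """q_emb: (B, D) query embeddings; p_emb: (B, D) positive doc embeddings.
    Every other document in the batch serves as an in-batch negative."""
    q_emb = F.normalize(q_emb, dim=-1)
    p_emb = F.normalize(p_emb, dim=-1)
    scores = q_emb @ p_emb.T / temperature            # (B, B) similarity matrix
    labels = torch.arange(q_emb.size(0), device=q_emb.device)
    return F.cross_entropy(scores, labels)            # diagonal = positives

# Example with random tensors in place of real Chinese query/document batches:
loss = contrastive_loss(torch.randn(8, 3072), torch.randn(8, 3072))
print(loss.item())
```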