MLLM support
Does this project support training and inference of multi-modal retrieval models such as Phi-3-vision? I'd like to reproduce the experiments in the paper https://arxiv.org/abs/2406.11251 using this project.
Thanks for your interest @ChingKwanCheung. I will merge the code and doc this weekend.
Hi @ChingKwanCheung, I have added the code and an initial doc in https://github.com/texttron/tevatron/tree/main/examples/dse
Thank you! This paper is really great work. I have tested the multi-modal retrieval model (https://huggingface.co/Tevatron/dse-phi3-docmatix-v1) you released earlier and found its English retrieval capability excellent. If I want to improve its Chinese retrieval capability, would you recommend continuing training on Chinese data from this checkpoint?
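For reference, here is roughly how I queried the checkpoint. This is a minimal sketch only: the query prompt string and the last-token pooling are my assumptions based on the DSE paper, not confirmed against the model card.

```python
import torch
import torch.nn.functional as F
from transformers import AutoProcessor, AutoModelForCausalLM

model_id = "Tevatron/dse-phi3-docmatix-v1"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
).eval()

# Assumed query format; the actual prompt template may differ.
query = "query: what is document screenshot embedding?</s>"
inputs = processor(query, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True, return_dict=True)

# Last-token pooling (as in the DSE paper): take the final token's
# hidden state from the last layer as the embedding, then L2-normalize.
emb = out.hidden_states[-1][:, -1, :]
emb = F.normalize(emb, p=2, dim=-1)
print(emb.shape)  # (1, hidden_size)
```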
Thanks @ChingKwanCheung. I suspect the Chinese capability largely depends on how strong the underlying LLM is on Chinese and on how well the visual encoder is aligned with the language model. I am not sure Phi-3 handles Chinese well; https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5 might be a better backbone choice for Chinese tasks.
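If you do continue training on Chinese query-document pairs, the objective is the standard in-batch contrastive (InfoNCE) loss. Below is a schematic sketch with placeholder tensors standing in for real query/document embeddings; it illustrates the kind of loss optimized, not the Tevatron training API itself.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(q_emb: torch.Tensor, p_emb: torch.Tensor,
                     temperature: float = 0.02) -> torch.Tensor:
    """q_emb: (B, D) query embeddings; p_emb: (B, D) positive doc embeddings.
    Every other document in the batch serves as an in-batch negative."""
    q_emb = F.normalize(q_emb, dim=-1)
    p_emb = F.normalize(p_emb, dim=-1)
    scores = q_emb @ p_emb.T / temperature            # (B, B) similarity matrix
    labels = torch.arange(q_emb.size(0), device=q_emb.device)
    return F.cross_entropy(scores, labels)            # diagonal = positives

# Example with random tensors in place of real Chinese query/document batches:
loss = contrastive_loss(torch.randn(8, 3072), torch.randn(8, 3072))
print(loss.item())
```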