[Docs] For integrating
📚 The doc issue
Is there a tutorial for integrating the vision model with the language model?
Suggest a potential alternative/fix
No response
Hello, are you referring to using a vision model and a language model to build an MLLM?
I'm impressed by InternVL and would like a tutorial or documentation on how you combine these models (vision model + MLP + LLMs), so that they are more accessible to newbies like me. Thanks.
@Hert4 Hi, do you have a quick "getting started" guide that allows one to run InternVL2_5 locally without relying on the Hugging Face implementation?
Please refer to this script, which only requires a language model path and a vision encoder path.
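For newcomers, the overall wiring the question asks about can be sketched in a few lines: a vision encoder turns an image into patch features, a small MLP projector maps those features into the language model's embedding space, and the projected visual tokens are concatenated with the text token embeddings before being fed to the LLM. The following is a toy illustration only, not the actual InternVL code; all dimensions, names, and the two-layer projector are hypothetical stand-ins.

```python
import numpy as np

# Toy sketch (hypothetical shapes, not InternVL's real configuration):
# vision encoder output -> MLP projector -> LLM input sequence.
rng = np.random.default_rng(0)

D_VISION = 32   # vision encoder hidden size (made up)
D_LLM = 64      # language model hidden size (made up)
N_PATCHES = 16  # number of image patches
N_TEXT = 8      # number of text tokens

# Stand-in for the vision encoder's output: one feature per patch.
vision_feats = rng.standard_normal((N_PATCHES, D_VISION))

# Two-layer MLP projector (the "MLP" between vision model and LLM).
W1 = rng.standard_normal((D_VISION, D_LLM)) * 0.02
b1 = np.zeros(D_LLM)
W2 = rng.standard_normal((D_LLM, D_LLM)) * 0.02
b2 = np.zeros(D_LLM)

def project(x):
    """Map vision features into the LLM embedding space."""
    h = np.maximum(x @ W1 + b1, 0.0)  # linear + ReLU
    return h @ W2 + b2                # linear

visual_tokens = project(vision_feats)             # (N_PATCHES, D_LLM)
text_tokens = rng.standard_normal((N_TEXT, D_LLM))  # stand-in embeddings

# The LLM consumes the concatenated sequence of visual and text tokens.
llm_input = np.concatenate([visual_tokens, text_tokens], axis=0)
print(llm_input.shape)  # (24, 64)
```

The key point is that the projector's output dimension must match the LLM's hidden size, which is why the script above only needs a language model path and a vision encoder path: everything else is the glue shown here.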