InternVL [Feature] Integration of Custom Vision Model without altering modeling_internvl_chat.py

Motivation

A task-specific vision model should perform better on its task than a general-purpose one. It would therefore help if we could simply pass our own vision_model to the InternVL model and have it read the configuration and adjust everything dynamically.

It would be great if we could integrate a custom vision model without many changes. For now, can anyone tell me what changes are required to integrate a custom vision model?

Related resources

I am not aware of any directly related resources, but I think Hugging Face provides VisionEncoderDecoder, which integrates a ViT with an LLM.

Additional context

No response

Sep 19 '24 13:09 hamza-dev-12

To replace the vision model in InternVL2, you need to modify the self.vision_model attribute in modeling_internvl_chat.py. After that, the projection layer must be retrained, because it was trained against the original encoder's features. There is currently no automated or simplified way to swap the model.
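
As a rough illustration of the two steps above, here is a minimal PyTorch sketch. It uses a toy stand-in class, not the real InternVLChatModel: ToyChatModel, swap_vision_model, and all dimensions are hypothetical, and the projector attribute name mlp1 is an assumption that should be checked against your copy of modeling_internvl_chat.py. The sketch swaps the encoder, rebuilds the projector for the new feature width, and freezes everything except the projector so that only it would be trained.

```python
import torch
from torch import nn


class ToyChatModel(nn.Module):
    """Hypothetical stand-in with an encoder, a projector, and an LLM."""

    def __init__(self, vit_dim=32, llm_dim=64):
        super().__init__()
        self.vision_model = nn.Linear(48, vit_dim)  # placeholder encoder
        self.mlp1 = nn.Sequential(  # projector: vision width -> LLM width
            nn.LayerNorm(vit_dim),
            nn.Linear(vit_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )
        self.language_model = nn.Linear(llm_dim, llm_dim)  # placeholder LLM


def swap_vision_model(model, new_encoder, new_dim, llm_dim):
    """Replace the encoder, rebuild the projector, train only the projector."""
    model.vision_model = new_encoder
    # The old projector expects the old feature width, so create a fresh one.
    model.mlp1 = nn.Sequential(
        nn.LayerNorm(new_dim),
        nn.Linear(new_dim, llm_dim),
        nn.GELU(),
        nn.Linear(llm_dim, llm_dim),
    )
    # Freeze the encoder and the LLM; leave only projector weights trainable.
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith("mlp1.")
    return [n for n, p in model.named_parameters() if p.requires_grad]


model = ToyChatModel()
custom_encoder = nn.Linear(48, 16)  # hypothetical custom encoder, width 16
trainable = swap_vision_model(model, custom_encoder, new_dim=16, llm_dim=64)

# Shape check: 2 images, 5 patch tokens each, projected to the LLM width.
features = model.mlp1(model.vision_model(torch.randn(2, 5, 48)))
print(trainable)
print(tuple(features.shape))
```

In the real model the projector input width may also depend on a downsampling step applied to the vision features, which this toy omits, so treat the dimensions here as placeholders. The custom encoder must also return features in the shape the rest of the forward pass expects.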

Sep 21 '24 08:09 qishisuren123

Hello, could you please tell me how to retrain the MLP? Which commands and files do I need to run?

Oct 02 '24 03:10 20191864218
