[Feature] Integration Of Custom Vision-Model without altering modeling_internvl_chat.py

Open hamza-dev-12 opened this issue 1 year ago • 2 comments

Motivation

Task-specific vision models should perform better on their target task than a general-purpose vision model. It would therefore help if we could simply pass our own vision_model to the InternVL model and have it extract the configuration and adjust everything dynamically.

It would be great if we could integrate a custom vision model without many changes. For now, can anyone tell me which changes are required to integrate a custom vision model?

Related resources

I am not aware of any at the moment, but I think Hugging Face provides a VisionEncoderDecoder model that integrates a ViT with an LLM.

Additional context

No response

hamza-dev-12 avatar Sep 19 '24 13:09 hamza-dev-12

To replace the vision model in the InternVL2 framework, you need to modify the self.vision_model attribute in modeling_internvl_chat.py. After that change, the projection layer must be retrained. At present, there is no automated or simplified way to perform this substitution.

qishisuren123 avatar Sep 21 '24 08:09 qishisuren123
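The substitution described above can be sketched in plain PyTorch. This is a toy illustration only, not the InternVL code: `ToyChatModel`, `build_projector`, and `swap_vision_model` are hypothetical names, the projector layout is an assumption, and the vision and language models are stand-in linear layers. The one thing it is meant to show is that a new encoder with a different feature width also needs a new projection layer, which then has to be trained.

```python
import torch
from torch import nn


def build_projector(vision_dim: int, llm_dim: int) -> nn.Sequential:
    # Assumed projector layout (hypothetical): norm, linear, activation, linear.
    return nn.Sequential(
        nn.LayerNorm(vision_dim),
        nn.Linear(vision_dim, llm_dim),
        nn.GELU(),
        nn.Linear(llm_dim, llm_dim),
    )


class ToyChatModel(nn.Module):
    """Hypothetical stand-in for a vision-language wrapper, not the real class."""

    def __init__(self, vision_model: nn.Module, vision_dim: int, llm_dim: int):
        super().__init__()
        self.vision_model = vision_model
        self.mlp1 = build_projector(vision_dim, llm_dim)   # projection layer
        self.language_model = nn.Linear(llm_dim, llm_dim)  # placeholder for the LLM

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        feats = self.vision_model(pixel_values)  # (batch, tokens, vision_dim)
        return self.language_model(self.mlp1(feats))


def swap_vision_model(model: ToyChatModel, new_vision_model: nn.Module,
                      new_vision_dim: int, llm_dim: int) -> ToyChatModel:
    # Replace the encoder and rebuild the projector to match its feature width.
    # The new projector is randomly initialised, so it must be retrained.
    model.vision_model = new_vision_model
    model.mlp1 = build_projector(new_vision_dim, llm_dim)
    return model


model = ToyChatModel(nn.Linear(8, 16), vision_dim=16, llm_dim=32)
model = swap_vision_model(model, nn.Linear(8, 24), new_vision_dim=24, llm_dim=32)
output = model(torch.randn(2, 5, 8))  # batch of 2, 5 tokens, 8 raw features each
```

In the real repository the equivalent change would live in modeling_internvl_chat.py, as the comment above says; the attribute names and shapes there should be checked against the actual source rather than taken from this sketch.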

> To replace the vision model in the InternVL2 framework, you need to modify the self.vision_model attribute in modeling_internvl_chat.py. After that change, the projection layer must be retrained. At present, there is no automated or simplified way to perform this substitution.

Hello, could you please tell me how to retrain the MLP? Which commands and files do I need to run?

20191864218 avatar Oct 02 '24 03:10 20191864218
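The thread does not name the commands or files for this, so the following is only a generic PyTorch sketch of what "retraining the projection layer" usually means: freeze every parameter except the projector, then run an ordinary training step. The module names (`vision_model`, `mlp1`, `language_model`), the helper `freeze_all_but`, the tiny layer sizes, and the random data are all assumptions made for illustration, not the project's training script.

```python
import torch
from torch import nn


def freeze_all_but(model: nn.Module, trainable_prefix: str) -> list:
    # Only parameters whose name starts with the prefix stay trainable.
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(trainable_prefix)
    return [p for p in model.parameters() if p.requires_grad]


# Toy stand-ins (hypothetical): a vision encoder, a projector, and an LLM head.
model = nn.ModuleDict({
    "vision_model": nn.Linear(8, 16),
    "mlp1": nn.Sequential(
        nn.LayerNorm(16), nn.Linear(16, 32), nn.GELU(), nn.Linear(32, 32)
    ),
    "language_model": nn.Linear(32, 10),
})

trainable = freeze_all_but(model, "mlp1.")
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

# One training step on random data: 4 samples, 5 tokens each, 10 classes.
inputs = torch.randn(4, 5, 8)
targets = torch.randint(0, 10, (4, 5))

features = model["vision_model"](inputs)
logits = model["language_model"](model["mlp1"](features))
loss = nn.functional.cross_entropy(logits.reshape(-1, 10), targets.reshape(-1))

optimizer.zero_grad()
loss.backward()   # gradients reach only the projector parameters
optimizer.step()
```

With a real model the same idea applies: set `requires_grad` to `False` on the vision and language parts, pass only the projector parameters to the optimizer, and train on image-text data.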