moondream
Why is only the text model being fine-tuned in the Finetuning.ipynb notebook?
I noticed that only the text_model of Moondream is being trained. Is there a reason for this?
What would training the entire model, or only the vision_encoder, result in?
For most tasks we see very little benefit from fine-tuning the vision encoder, and for some tasks it actually hurts performance. Unless the dataset has 100k+ images, I would not recommend unfreezing it.
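The pattern described above can be sketched in PyTorch: freeze the vision encoder's parameters and pass only the text model's parameters to the optimizer. The `ToyVLM` class below is a hypothetical stand-in, not the actual Moondream classes; only the `vision_encoder`/`text_model` submodule structure mirrors the notebook.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a vision-language model with a vision encoder
# and a text model, mirroring the submodule layout discussed above.
class ToyVLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.vision_encoder = nn.Linear(8, 8)  # stand-in for the vision encoder
        self.text_model = nn.Linear(8, 8)      # stand-in for the text model

model = ToyVLM()

# Freeze the vision encoder so only the text model is fine-tuned.
for p in model.vision_encoder.parameters():
    p.requires_grad = False

# Give the optimizer only the parameters that still require gradients.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=3e-5
)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
print(trainable, frozen)
```

With this setup, backpropagation still flows *through* the frozen vision encoder (its activations are used), but its weights receive no updates, which is what "only the text model is being trained" means in practice.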