moondream
Why is only the text model being fine-tuned in the Finetuning.ipynb notebook?
I noticed that only the text_model of Moondream is being trained. Is there a reason for this?
What would training the entire model, or only the vision_encoder, result in?
For most tasks we see very little benefit from fine-tuning the vision encoder, and for some tasks it actually hurts performance. Unless the dataset has 100k+ images, I would not recommend unfreezing it.
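The pattern described above can be sketched in PyTorch: freeze the vision encoder's parameters and pass only the text model's parameters to the optimizer. The `ToyVLM` class below is a hypothetical stand-in, not the actual Moondream classes; only the `vision_encoder`/`text_model` submodule structure mirrors the notebook.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a vision-language model with a vision encoder
# and a text model, mirroring the submodule layout discussed above.
class ToyVLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.vision_encoder = nn.Linear(8, 8)  # stand-in for the vision encoder
        self.text_model = nn.Linear(8, 8)      # stand-in for the text model

model = ToyVLM()

# Freeze the vision encoder so only the text model is fine-tuned.
for p in model.vision_encoder.parameters():
    p.requires_grad = False

# Give the optimizer only the parameters that still require gradients.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=3e-5
)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
print(trainable, frozen)
```

With this setup, backpropagation still flows *through* the frozen vision encoder (its activations are used), but its weights receive no updates, which is what "only the text model is being trained" means in practice.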