moondream
Try an OpenChat LLM instead of Phi-1.5
It's a truly amazing model, but I was disappointed by the instructability of the Phi-1.5 component. For instance, when asked to ignore a particular object in a photo, it doesn't comply; when asked to write a certain number of words, it doesn't do so reliably; when asked to start a new paragraph, it doesn't; and it doesn't respond well to instructions placed later in the prompt.
I recently reviewed a lot of models while looking for one to power a chatbot.
Nous-Hermes-2-SOLAR-10.7B https://huggingface.co/NousResearch/Nous-Hermes-2-SOLAR-10.7B
OpenChat3.5(0106) https://huggingface.co/openchat/openchat-3.5-0106 https://github.com/imoneoi/openchat
are much more instructable, and I urge you to consider them for connecting your visual encoder.
Their larger memory requirements compared to Phi might defeat the purpose, though.
@axrwl The quantised latest openchat takes only ~4 GB: https://huggingface.co/openchat/openchat-3.5-0106. The main problem is that the only working quantisation route for vision LLMs I've seen is bitsandbytes; the transformers library does not seem to support GPTQ in its image-to-text pipeline, despite supporting it for plain LLM inference.
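For what it's worth, the ~4 GB figure is roughly what you'd expect from weight-only 4-bit quantisation of a ~7B-parameter model. A back-of-envelope sketch (`quantized_weight_gb` is a hypothetical helper for illustration, not part of any library; it ignores KV cache, activations, and per-block scale/zero-point overhead, which add a bit on top):

```python
def quantized_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-only memory footprint in GB.

    Ignores KV cache, activations, and quantisation metadata
    (per-block scales/zero points), so real usage is somewhat higher.
    """
    return n_params * bits_per_weight / 8 / 1e9

# openchat-3.5-0106 is a ~7B-parameter model
print(quantized_weight_gb(7e9, 4))   # 3.5 -> 4-bit weights, close to the ~4 GB observed
print(quantized_weight_gb(7e9, 16))  # 14.0 -> fp16 baseline for comparison
```

The gap between 3.5 GB and the observed ~4 GB is plausibly the quantisation metadata plus runtime buffers.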