It won't roast me. Remove censorship.
If I ask it to roast me (using a picture of myself), it just says "No". Even Twitter's Grok allows roasting.
Sounds annoying. What was the prompt? "Say something offensive about the image" worked for me, FWIW.
The prompt was "Roast me". It kept saying "No" even when I tried different face images I found on Google. Interestingly, your prompt does work. It doesn't roast at Grok's level and tends to be somewhat diplomatic, but at least it's not as censored as I thought.
By the way, I'm fairly impressed with how capable Moondream is. I tested some basic memes on MoE LLaVA at https://huggingface.co/spaces/LanguageBind/MoE-LLaVA and it could not read the text as well as Moondream. Moondream also seems to answer the question about the man ironing behind a taxi better than MoE LLaVA does.
If I had to guess, it has probably never seen the phrase "roast me" during training. The text model (phi-1.5) was trained mostly on synthetic data from ChatGPT, so that's where a lot of the cautious behavior comes from, and the phrase is definitely not in the training data I used to add vision capabilities. It would probably work better with a model like StableVLM, but I don't like the license on that one.
And thank you! I put a lot of work into making it actually work rather than benchmark hacking. This kind of feedback is super useful because it tells me what types of data to add in future training runs.