
llava not using correct system prompt and/or settings

Open dribnet opened this issue 1 year ago • 4 comments

When I launch the current llava-v1.5-7b-q4-server.llamafile, I see a system prompt and default settings that differ from what llava uses for training and inference.

[Image: Screenshot 2023-12-04 at 12 50 59 AM]

Specifically, I believe the default prompt for llava-v1.5 is "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions." with a user name of USER and a bot name of ASSISTANT.

Additionally, I noticed that the settings also seem to differ from what the official llava demo uses: for example, temperature is 0.7 instead of 0.2, Top-P is 0.5 instead of 0.7, etc. These are probably not as important as updating the system prompt, but I thought I would mention them as something to check.
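As a rough illustration of the values described above, here is a minimal sketch that assembles a llava-v1.5 style prompt and a request body for llama.cpp's server `/completion` endpoint. The system prompt, the role names, and the 0.2 / 0.7 values are taken from this report; the single-space separator between turns, the `n_predict` value, the stop string, and the helper names `build_prompt` and `build_request` are assumptions for illustration, not confirmed llava or llamafile behavior.

```python
import json

# System prompt and role names as quoted in this issue.
SYSTEM_PROMPT = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
)
USER_NAME = "USER"
BOT_NAME = "ASSISTANT"


def build_prompt(question: str) -> str:
    """Join system prompt, user turn, and an open assistant turn.

    Assumption: turns are separated by a single space; the real
    separator used by llava may differ.
    """
    return f"{SYSTEM_PROMPT} {USER_NAME}: {question} {BOT_NAME}:"


def build_request(question: str) -> dict:
    """Build a JSON-serializable body for a /completion style request."""
    return {
        "prompt": build_prompt(question),
        "temperature": 0.2,  # value this issue attributes to the llava demo
        "top_p": 0.7,        # value this issue attributes to the llava demo
        "n_predict": 256,    # hypothetical cap on generated tokens
        "stop": [f"{USER_NAME}:"],  # assumption: stop when the next user turn begins
    }


if __name__ == "__main__":
    print(json.dumps(build_request("What is shown in this image?"), indent=2))
```

The sketch only builds the payload and does not contact a server, so it can be inspected or adjusted before being sent with any HTTP client.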

dribnet avatar Dec 03 '23 12:12 dribnet

I believe this goes for every model, and it is clearly a bit problematic. That is, the prompt template (and "history template") of the "chat interfaces" evidently differs quite a bit from model to model. There are those with [INST] .. [/INST] and <<SYS>><</SYS>>, with the start and stop tokens. Then there are the simple User:, Assistant: ones - where sometimes those names are important, while for others they are not?
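To make the difference concrete, here is a small sketch that formats the same system prompt and user message in the two styles mentioned above. The first follows the commonly documented Llama-2-chat layout; the second is a plain role-prefixed layout. The function names are hypothetical, and the exact whitespace and special tokens vary between models, so treat both as approximations.

```python
def format_llama2_chat(system: str, user: str) -> str:
    """Llama-2-chat style: the turn is wrapped in [INST] ... [/INST] and the
    system prompt sits inside <<SYS>> ... <</SYS>> at the start of the turn."""
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"


def format_plain_roles(system: str, user: str,
                       user_name: str = "User",
                       bot_name: str = "Assistant") -> str:
    """Plain style: each turn is prefixed with a role name.

    Assumption: a blank line after the system prompt and one turn per line.
    """
    return f"{system}\n\n{user_name}: {user}\n{bot_name}:"


if __name__ == "__main__":
    system = "You are a helpful assistant."
    user = "Describe the image."
    print(format_llama2_chat(system, user))
    print("-----")
    print(format_plain_roles(system, user))
```

Sending the second layout to a model trained on the first (or the reverse) is the kind of mismatch this thread is about.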

That GUI is llama.cpp's "server" GUI, and as such this is not llamafile's fault. But it would have been great if any of those projects managed to get a way where the ~model file itself (the GGUF file!)~ (edit!) the tokenizer could explain its chat structure, so that user interfaces could adhere. (And, yes, that is what's talked about in #65).

Also, those {{prompt}} and {{history}} etc variables in the template HTML fields are explained exactly zero places on the internet, according to my Google skills.
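For what it is worth, here is one plausible reading of how such placeholders get filled in, written as a standalone sketch rather than the server GUI's actual code. The `{{prompt}}` and `{{history}}` names come from the template fields; the `{{char}}`, `{{name}}`, and `{{message}}` placeholders, the default template strings, and the `render` / `build_chat_prompt` helpers are assumptions made for illustration.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render(template: str, values: dict) -> str:
    """Replace each {{key}} with values[key]; unknown keys are left as-is."""
    return PLACEHOLDER.sub(
        lambda m: str(values.get(m.group(1), m.group(0))), template
    )


def build_chat_prompt(
    system_prompt: str,
    turns: list,
    bot_name: str = "Llama",
    template: str = "{{prompt}}\n\n{{history}}\n{{char}}:",
    history_template: str = "{{name}}: {{message}}",
) -> str:
    """Render every (name, message) turn with the history template, join the
    results with newlines, then substitute them into the outer template."""
    history = "\n".join(
        render(history_template, {"name": name, "message": message})
        for name, message in turns
    )
    return render(
        template,
        {"prompt": system_prompt, "history": history, "char": bot_name},
    )


if __name__ == "__main__":
    print(build_chat_prompt(
        "This is a conversation between User and Llama.",
        [("User", "Hello"), ("Llama", "Hi there"), ("User", "What is GGUF?")],
    ))
```

Under this reading, `{{prompt}}` is the system prompt, `{{history}}` is the transcript so far, and the trailing bot name cues the model to answer as the assistant.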

And as @dribnet also points out, the parameters/settings seemingly also have different "defaults" that give good results, which obviously should also have been embedded in the metadata of the GGUF files.

Well, one can dream! This field is moving extremely fast.

stolsvik avatar Dec 29 '23 22:12 stolsvik

Oh, I guess this is exactly what https://github.com/Mozilla-Ocho/llamafile/issues/65 is about.

Pointing to this blogpost: https://huggingface.co/blog/chat-templates

stolsvik avatar Dec 29 '23 22:12 stolsvik

@jart
I have a question for you, since you are running this, just to see how crazy I am: is there anywhere online where one can read what the flying heck the template placeholders do?

> Also, those {{prompt}} and {{history}} etc variables in the template HTML fields are explained exactly zero places on the internet, according to my Google skills.

I have the same issue, couldn't find it even if my life depended on it.

frenchiveruti avatar Jan 07 '24 17:01 frenchiveruti

In newer GGUFs I sometimes see the tokenizer chat template. Can these be used automatically?

woheller69 avatar Apr 19 '24 10:04 woheller69