llava not using correct system prompt and/or settings
When I launch the current llava-v1.5-7b-q4-server.llamafile, I see a system prompt and default settings that differ from what llava uses for training and inference.

Specifically, I believe the default prompt for llava-v1.5 is "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions." with a user name of USER and a bot name of ASSISTANT.
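For concreteness, here is a minimal sketch (in Python) of what the assembled prompt would look like; the exact spacing and separators are my assumption based on the llava reference code:

```python
# Sketch of the llava-v1.5 prompt format as I understand it; the exact
# spacing and separators are an assumption based on the llava reference code.
SYSTEM_PROMPT = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's "
    "questions."
)

def build_prompt(user_message: str) -> str:
    # llava-v1.5 was trained with the role names USER and ASSISTANT,
    # so the names themselves matter here.
    return f"{SYSTEM_PROMPT} USER: {user_message} ASSISTANT:"

print(build_prompt("What is shown in this image?"))
```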
Additionally, I noticed the settings also seem to differ from what the official llava demo uses: for example, temperature is 0.7 instead of 0.2, Top-P is 0.5 instead of 0.7, etc. These are probably not as important as updating the system prompt, but I thought I would mention them as something to check.
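As a workaround in the meantime, the sampling parameters can be overridden per request. A hedged sketch, assuming the llamafile server exposes llama.cpp's /completion endpoint on the default http://localhost:8080:

```python
# Hedged sketch: override the sampling defaults per request instead of
# relying on the GUI sliders. Assumes the llamafile server exposes
# llama.cpp's /completion endpoint on the default http://localhost:8080.
import json
import urllib.request

payload = {
    "prompt": "USER: Hello! ASSISTANT:",
    "temperature": 0.2,  # the llava demo default mentioned above
    "top_p": 0.7,
    "n_predict": 256,
}
req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
```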
I believe this goes for every model, and it is clearly a bit problematic. That is, the prompt templates (and "history templates") that the "chat interfaces" use evidently differ quite a lot from model to model. There are those with [INST] .. [/INST] and <<SYS>> .. <</SYS>> markers, plus start and stop tokens. Then there are the simple User: / Assistant: styles, where sometimes the exact names matter and sometimes they don't.
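To make the mismatch concrete, here is how the same exchange would have to be serialized for a Llama-2-chat style model versus a Vicuna/llava style model (the token spellings follow the respective model cards, but treat this as illustrative, not authoritative):

```python
# The same two-message exchange, serialized two different ways.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# Llama-2-chat style: the special [INST]/<<SYS>> markers are what matter.
llama2_prompt = (
    f"[INST] <<SYS>>\n{messages[0]['content']}\n<</SYS>>\n\n"
    f"{messages[1]['content']} [/INST]"
)

# Vicuna/llava style: plain role names, and the exact names matter instead.
vicuna_prompt = f"{messages[0]['content']} USER: {messages[1]['content']} ASSISTANT:"

print(llama2_prompt)
print(vicuna_prompt)
```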
That GUI is llama.cpp's "server" GUI, and as such this is not llamafile's fault. But it would have been great if either of those projects had a way for ~~the model file itself (the GGUF file!)~~ (edit!) the tokenizer to describe its chat structure, so that user interfaces could adhere to it. (And yes, that is what's talked about in #65.)
Also, those {{prompt}} and {{history}} etc. variables in the template HTML fields are explained in exactly zero places on the internet, according to my Google skills.
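For what it's worth, my best-effort reading of the server GUI is that it just does plain string substitution: {{prompt}} becomes the system prompt field, {{history}} becomes the chat history rendered through the history template (where {{name}} and {{message}} are the speaker and text of each turn), and {{char}}/{{user}} become the bot and user names. A sketch of that guess, not a statement about the actual implementation:

```python
# My guess at what the server GUI does with the template fields: plain
# string substitution. The placeholder names follow the UI labels; none
# of this is confirmed against the actual implementation.
def render(template: str, **values: str) -> str:
    for key, value in values.items():
        template = template.replace("{{" + key + "}}", value)
    return template

history_template = "{{name}}: {{message}}"
turns = [("USER", "Hi there"), ("ASSISTANT", "Hello! How can I help?")]
history = "\n".join(render(history_template, name=n, message=m) for n, m in turns)

prompt_template = "{{prompt}}\n\n{{history}}\n{{char}}:"
print(render(
    prompt_template,
    prompt="A chat between a curious human and an artificial intelligence assistant.",
    history=history,
    char="ASSISTANT",
))
```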
And as @dribnet also points out, the parameters/settings seemingly also have different "defaults" that give good results, which obviously should have been embedded in the metadata of the GGUF files as well.
Well, one can dream! This field is moving extremely fast.
Oh, I guess this is exactly what https://github.com/Mozilla-Ocho/llamafile/issues/65 is about.
Pointing to this blogpost: https://huggingface.co/blog/chat-templates
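In short, the tokenizer ships a Jinja chat template and transformers applies it for you. For example (the model name here is only an illustration, and gated models need an HF token):

```python
# The tokenizer carries a Jinja chat template and applies it for you.
# The model name is only an example; gated models need an HF token.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
messages = [{"role": "user", "content": "Hello!"}]
print(tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
))
```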
@jart
I have a question for you, since you're the one running this, just to see how crazy I am: is there anywhere online where one can read what the flying heck the template placeholders actually do?
> Also, those {{prompt}} and {{history}} etc. variables in the template HTML fields are explained in exactly zero places on the internet, according to my Google skills.
I have the same issue, couldn't find it even if my life depended on it.
In newer GGUFs I sometimes see the tokenizer chat template. Can these be used automatically?
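In principle, yes: the template is stored under the tokenizer.chat_template metadata key and can be read back out. A sketch with the gguf Python package (the field-decoding details are my best guess from the GGUFReader API and may need adjusting):

```python
# Sketch of reading tokenizer.chat_template back out of a GGUF file with
# the `gguf` Python package. The field-decoding details are my best guess
# from the GGUFReader API and may need adjusting.
from gguf import GGUFReader

reader = GGUFReader("model.gguf")  # path is a placeholder
field = reader.fields.get("tokenizer.chat_template")
if field is None:
    print("no chat template embedded in this GGUF")
else:
    template = bytes(field.parts[field.data[0]]).decode("utf-8")
    print(template)  # a Jinja template, as in the HF blog post above
```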