
Get chat_template from a server endpoint.

lastrosade opened this issue 4 months ago · 9 comments

Feature Description

Retrieve the "chat_template" field from the GGUF model in the /props endpoint.

Motivation

Many models incorporate a jinja2 template stored in a field called chat_template. This feature would enable users to generate appropriate templates with their scripts.

This could also probably be used on the web UI to autofill the template text boxes.
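As a sketch of how a client script might consume this feature once implemented: the snippet below parses a hypothetical `/props` JSON response and pulls out a `chat_template` field. The field name and response shape are assumptions here, since the endpoint does not expose this yet.

```python
import json
from typing import Optional

def get_chat_template(props_json: str) -> Optional[str]:
    # Hypothetical: assumes the server exposes the GGUF
    # tokenizer.chat_template metadata under a "chat_template" key.
    props = json.loads(props_json)
    return props.get("chat_template")

# Mocked response body, standing in for GET /props:
sample = json.dumps({"chat_template": "{% for m in messages %}{{ m['content'] }}{% endfor %}"})
print(get_chat_template(sample))
```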

lastrosade avatar Feb 10 '24 23:02 lastrosade

Could be nice to just return all the gguf metadata in one go?

Azeirah avatar Feb 11 '24 20:02 Azeirah

I need this too. Currently, the problem is that we cannot access the metadata outside of llama_model_loader (please correct me if I'm wrong)

ngxson avatar Feb 14 '24 11:02 ngxson

There are functions in the llama.h API to read the metadata. It should work with any non-array metadata.

https://github.com/ggerganov/llama.cpp/blob/8084d554406b767d36b3250b3b787462d5dd626f/llama.h#L357-L367

slaren avatar Feb 14 '24 11:02 slaren

@slaren Perfect, thanks. That's exactly what I was missing in https://github.com/ggerganov/llama.cpp/pull/5425

I'm not sure how we can decode the template inside the cpp code. It would be far too complicated to include some kind of "official" parser.

The idea I have in mind is to hard-code some template patterns to detect which type of template it is. In reality, we will mostly have either the llama2 format ([INST]) or chatml (<|im_start|>)

ngxson avatar Feb 15 '24 18:02 ngxson

> The idea that I'm having in my mind is maybe hard code some template patterns to detect if it's which type of template. In reality, we will mostly have either llama2 format ([INST]) or chatml (<|im_start|>)

Yes, exactly. Some simple heuristic checks to detect the most common templates would be great. Should be something very basic and easy to reuse - no need to over-engineer it.
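The heuristic discussed above could look something like this: match on the distinctive marker of each common format. This is only an illustrative sketch (the function name and labels are made up), not the actual llama.cpp implementation.

```python
def detect_template_type(chat_template: str) -> str:
    # Very basic heuristic: check for the distinctive marker
    # of each of the two most common chat formats.
    if "<|im_start|>" in chat_template:
        return "chatml"
    if "[INST]" in chat_template:
        return "llama2"
    return "unknown"
```

Checking for a single substring per format keeps it trivially cheap and easy to extend with one `if` per new template family.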

ggerganov avatar Feb 16 '24 09:02 ggerganov

Would that work for weirder templates like MiniCPM's

```
<用户>
<AI>
```

?

lastrosade avatar Feb 21 '24 18:02 lastrosade

> Would that work for weirder templates like MiniCPM's
>
> ```
> <用户>
> <AI>
> ```
>
> ?

No, not for now, but we can add support for these templates as long as we can find the jinja version.

I couldn't find this template in tokenizer_config.json of MiniCPM-V. Can you find it somewhere?

For now, we can only support templates that are included in tokenizer_config.json. The benefit is that I can run the python code and then the cpp code to compare whether the cpp implementation is correct.

ngxson avatar Feb 21 '24 19:02 ngxson

> I couldn't find this template in tokenizer_config.json of MiniCPM-V. Can you find it somewhere?

I took mine from here https://github.com/ggerganov/llama.cpp/issues/5447#issuecomment-1957784407

```jinja
{% for message in messages %}{% if message['role'] == 'user' %}{{'<用户>' + message['content'].strip() + '<AI>'}}{% else %}{{message['content'].strip()}}{% endif %}{% endfor %}
```
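For reference, that jinja template is simple enough to re-implement in plain Python, which makes it easy to see what it actually emits (function name is illustrative):

```python
def minicpm_format(messages):
    # Mirrors the quoted jinja template: user turns are wrapped as
    # <用户>...<AI>, all other turns are emitted stripped, with no
    # separators, newlines, or BOS/EOS tokens in between.
    out = []
    for message in messages:
        if message["role"] == "user":
            out.append("<用户>" + message["content"].strip() + "<AI>")
        else:
            out.append(message["content"].strip())
    return "".join(out)

print(minicpm_format([
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "hi"},
]))
# → <用户>hello<AI>hi
```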

lastrosade avatar Feb 22 '24 05:02 lastrosade

@lastrosade Can you give the link to the official docs somewhere? Pay attention, because templates may differ in their use of newlines, spaces, and EOS/BOS tokens, which is quite confusing.

Your template will output something like: `<用户>hello<AI>hi`

But in reality, it may be: `<用户>hello\n<AI>hi`, or `<用户>\nhello\n</s><s><AI>\nhi`, ...

That's why it's always better to have the official template (the one used in the training process)

ngxson avatar Feb 22 '24 09:02 ngxson

I don't know where to find any official docs, but looking at their repo, it seems that they do not use any special tokens in their template.

https://github.com/OpenBMB/MiniCPM/blob/b3358343cb6cc40002d92bc382ab92b98d5b8f3e/model/modeling_minicpm.py#L1326 But I think this only parses text, so idk.

lastrosade avatar Feb 23 '24 19:02 lastrosade

@lastrosade sorry for the late response, but the current blocking point is that the gguf model does not have a template at all, so it's impossible for the server to detect whether it should use the MiniCPM template or not.

Please join the discussion in the linked issue above.

ngxson avatar Mar 07 '24 13:03 ngxson

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Apr 21 '24 01:04 github-actions[bot]