candle
How to use a system prompt with the llama example?
Hi, I'm trying to pass a chat dialog in the Llama 3 format to the llama example via --prompt; the string is as follows:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful AI assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>
Why is the sky blue?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
This seems to confuse the model and, depending on the user prompt, can cause it to generate gibberish characters (see also https://github.com/evilsocket/cake/issues/9):
(I've made a small change to load the prompt from a file when it is passed with @; a rough sketch of that change follows the output below.)
/path/to/compiled/llama3/example --model-id "meta-llama/Meta-Llama-3-8B" --prompt @hf-llama-test/prompt.txt
loading the model weights from meta-llama/Meta-Llama-3-8B
loading prompt from @hf-llama-test/prompt.txt ...
starting the inference loop
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful AI assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>
Why is the sky blue?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
By: David Cope (2022, October 23)
14 tokens generated (16.831015425660595 token/s)
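
For reference, the '@' change mentioned above is roughly the following (a minimal sketch, not the actual patch; resolve_prompt is a hypothetical helper and the example's real argument handling may differ):

use std::fs;
use std::io;

// Hypothetical helper: if the --prompt value starts with '@', treat the rest
// as a path and read the prompt text from that file; otherwise use the value verbatim.
fn resolve_prompt(raw: &str) -> io::Result<String> {
    match raw.strip_prefix('@') {
        Some(path) => {
            println!("loading prompt from @{path} ...");
            fs::read_to_string(path)
        }
        None => Ok(raw.to_string()),
    }
}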
Related to the conversation here https://github.com/huggingface/candle/issues/1177#issuecomment-2067273391 ... it seems to be a problem with the tokenizer: in this loop only resolved tokens are printed, but if you also add a check for None, you'll see a lot of tokens (which can be found in the vocab section of the tokenizer.json file) that should be resolved but aren't.
The idea is that non-resolved tokens are actually accumulated: the decoder (TokenOutputStream) is stateful, since decoding some tokens can only be done once the following tokens are known. So it's expected that None is returned for some tokens, but the corresponding output should still be printed later, once the tokenizer is able to flush it.
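
For illustration, this is roughly how the generation loop is expected to use the stateful decoder (a minimal sketch assuming candle-examples' TokenOutputStream with next_token and decode_rest, and taking a tokenizer plus a slice of generated token ids as inputs):

use candle_examples::token_output_stream::TokenOutputStream;
use tokenizers::Tokenizer;

// Sketch: stream generated tokens through the stateful decoder. next_token may
// return None while the stream waits for more tokens (e.g. a character split
// across several sub-word pieces); the buffered text is printed once it decodes.
fn print_tokens(tokenizer: Tokenizer, generated: &[u32]) -> anyhow::Result<()> {
    let mut tos = TokenOutputStream::new(tokenizer);
    for &token in generated {
        if let Some(text) = tos.next_token(token)? {
            print!("{text}");
        }
        // on None nothing is lost: the token stays buffered inside the stream
    }
    if let Some(rest) = tos.decode_rest()? {
        print!("{rest}"); // flush whatever is still pending at the end
    }
    Ok(())
}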
Thank you @LaurentMazare for the info ... the model is still confused by the structured prompt, though.