
how to use system prompt with the llama example?

Open evilsocket opened this issue 1 year ago • 3 comments

Hi, I'm trying to pass a chat dialog in the Llama 3 format to the llama example via --prompt. The string is as follows:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>
    
You are a helpful AI assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>
    
Why is the sky blue?<|eot_id|><|start_header_id|>assistant<|end_header_id|>


This seems to confuse the model and, depending on the user prompt, can cause it to generate gibberish characters (see also https://github.com/evilsocket/cake/issues/9):

(I've made a small change to load the prompt from a file if it is passed with @; a rough sketch of that change is shown below.)
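For reference, a minimal sketch of what such a change could look like. This is only an illustration, not the actual patch; it assumes the example keeps the prompt string in an `args.prompt` field and that `main` returns a `Result`:

```rust
// Hypothetical sketch: if the --prompt value starts with '@', treat the rest
// as a file path and read the prompt text from that file instead.
let prompt = if let Some(path) = args.prompt.strip_prefix('@') {
    println!("loading prompt from @{path} ...");
    std::fs::read_to_string(path)?
} else {
    args.prompt.clone()
};
```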

/path/to/compiled/llama3/example --model-id "meta-llama/Meta-Llama-3-8B" --prompt @hf-llama-test/prompt.txt


loading the model weights from meta-llama/Meta-Llama-3-8B
loading prompt from @hf-llama-test/prompt.txt ...
starting the inference loop
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
    
You are a helpful AI assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>
    
Why is the sky blue?<|eot_id|><|start_header_id|>assistant<|end_header_id|>



By: David Cope (2022, October 23)


14 tokens generated (16.831015425660595 token/s)

evilsocket · Jul 18 '24 10:07

Related to the conversation here https://github.com/huggingface/candle/issues/1177#issuecomment-2067273391 ... it seems to be a problem with the tokenizer. In fact, in this loop only resolved tokens are printed, but if you also add a check for None, you'll see a lot of tokens (which can be found in the vocab section of the tokenizer.json file) that should be resolved but aren't.
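To make the observation concrete, here is a hedged sketch of what such a check could look like in the generation loop. The `next_token` call and variable names are assumptions based on candle's example code (its TokenOutputStream helper), not a quote from it:

```rust
// Sketch only: print resolved tokens as before, but also log the token ids
// for which the streaming decoder returns None.
match tokenizer.next_token(next_token)? {
    Some(text) => print!("{text}"),
    None => eprintln!("token {next_token} not resolved yet"),
}
```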

evilsocket · Jul 18 '24 12:07

The idea is that non-resolved tokens are actually accumulated: the decoder (TokenOutputStream) is stateful, as decoding some tokens can only be done once the following tokens are known. So it's expected that None is returned for some tokens, but the actual output should be printed later, once the tokenizer is able to flush it.
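In other words, a minimal sketch of the end of the loop, assuming a decode_rest-style flush helper like the one candle's TokenOutputStream exposes:

```rust
// Sketch: after generation finishes, flush whatever the stateful decoder
// has accumulated but could not print earlier.
if let Some(rest) = tokenizer.decode_rest()? {
    print!("{rest}");
}
```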

LaurentMazare · Jul 18 '24 14:07

Thank you @LaurentMazare for the info... the model is still confused by the structured prompt, though.

evilsocket · Jul 18 '24 14:07