llama.cpp
Misc. bug: Problems with official jinja templates (Gemma 2, Llama 3.2, Qwen 2.5)
Name and Version
llama-cli --version
version: 4713 (a4f011e8)
built with MSVC 19.42.34436.0 for x64
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
llama-server
Command line
1. llama-server -ngl 99 -m gemma-2-2b-it-Q8_0.gguf --jinja --chat-template-file gemma2.jinja -c 8192
2. llama-server -ngl 99 -m Llama-3.2-3B-Instruct-Q8_0.gguf --jinja --chat-template-file llama3.2.jinja -c 8192
3. llama-server -ngl 99 -m Qwen2.5-1.5B-Instruct-Q8_0.gguf --jinja --chat-template-file qwen2.5.jinja -c 8192
Problem description & steps to reproduce
I extracted the official chat templates from the chat_template field in tokenizer_config.json (Gemma 2, Llama 3.2, Qwen 2.5) and saved them to files. Using these files with llama-server then results in errors:
- Gemma 2: "parse: error parsing grammar: expecting name at" is logged after each message.
- Llama 3.2: the server doesn't start.
- Qwen 2.5: "parse: error parsing grammar: expecting name at" is logged after each message.
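The extraction step described above can be scripted so that no manual copying or unescaping is involved. This is a minimal sketch, assuming a local copy of the model's tokenizer_config.json; extract_chat_template and save_template are hypothetical helper names, not part of llama.cpp. It also assumes that chat_template is either a plain string or a list of named templates.

```python
import json
from pathlib import Path


def extract_chat_template(config_path, name="default"):
    """Return the decoded chat_template from a tokenizer_config.json file.

    Assumption: chat_template is either a plain string, or a list of
    {"name": ..., "template": ...} entries when a repo ships several templates.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    template = config.get("chat_template")
    if template is None:
        raise KeyError(f"no chat_template field in {config_path}")
    if isinstance(template, list):
        # Pick the named variant, e.g. "default" or "tool_use".
        template = {entry["name"]: entry["template"] for entry in template}[name]
    return template


def save_template(template, out_path):
    # json.loads has already turned the JSON escapes (\n, \") into real characters,
    # so the string is written as-is. newline="" disables newline translation,
    # which would otherwise turn "\n" into "\r\n" on Windows.
    with open(out_path, "w", encoding="utf-8", newline="") as f:
        f.write(template)
```

For example, save_template(extract_chat_template("tokenizer_config.json"), "qwen2.5.jinja") writes a file that can be passed to --chat-template-file. The paths here are placeholders.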
@ochafik Could you look into this? It would be nice to have the Jinja implementation working fully with official templates, at least for the major models.
First Bad Commit
No response
Relevant log output
Thanks for pointing this out. I'm seeing the same error. I didn't use Jinja templates until llama.cpp supported tool calling, so I didn't notice until I switched to tool calling.
Right now I'm trying to locate the commit that introduced the bug.
Hey @MoonRide303 , @henryclw , thanks for reporting this! Are you both experiencing this on Windows?
Could you try fetching the template with ./scripts/get_chat_template.py google/gemma-2-2b-it > gemma2.jinja (or, if you're not running inside a WSL shell, probably with something like py scripts\get_chat_template.py google/gemma-2-2b-it > gemma2.jinja)?
(These templates seem to work on my Mac. Maybe there's a line-ending issue, or the JSON string was unescaped incorrectly when the templates were edited manually?)
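To test the line-ending and unescaping hypotheses, a saved template file can be compared with the decoded chat_template from tokenizer_config.json. This is a minimal sketch, assuming chat_template is a plain string. compare_template_file is a hypothetical helper, and its checks are ad-hoc diagnostics, not anything llama.cpp performs. It also flags UTF-16 output, which older Windows PowerShell versions produce by default when redirecting with ">".

```python
import json
from pathlib import Path


def compare_template_file(jinja_path, config_path):
    """Compare a saved .jinja file with chat_template in tokenizer_config.json.

    Returns a list of human-readable problems; an empty list means the file
    matches the decoded template exactly. Assumes chat_template is a string.
    """
    expected = json.loads(Path(config_path).read_text(encoding="utf-8"))["chat_template"]
    raw = Path(jinja_path).read_bytes()
    if raw[:2] in (b"\xff\xfe", b"\xfe\xff"):
        # Output redirection in some Windows shells produces UTF-16, not UTF-8.
        return ["file is UTF-16 encoded, not UTF-8"]
    problems = []
    if raw.startswith(b"\xef\xbb\xbf"):
        problems.append("file starts with a UTF-8 byte order mark")
        raw = raw[3:]
    text = raw.decode("utf-8")
    if "\r\n" in text and "\r\n" not in expected:
        problems.append("file uses CRLF line endings")
        text = text.replace("\r\n", "\n")
    if text != expected:
        # Locate the first differing character. Leftover JSON escapes, such as a
        # literal backslash followed by "n", usually show up here.
        i = next(
            (k for k, (a, b) in enumerate(zip(text, expected)) if a != b),
            min(len(text), len(expected)),
        )
        problems.append(
            f"content differs at character {i}: {text[i:i + 20]!r} vs {expected[i:i + 20]!r}"
        )
    return problems
```

Note that valid templates often contain backslash sequences inside Jinja string literals (for example '\n' in Qwen 2.5 and Gemma 2). For that reason, this sketch compares the file against the config instead of searching for backslashes.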
@ochafik
My finding:
Since 4a2b196d, running llama.cpp with --jinja and not providing tools in the API call produces these error logs:
2025-02-14T22:43:34.104015028Z parse: error parsing grammar: expecting name at
2025-02-14T22:43:34.104022587Z
2025-02-14T22:43:34.104025971Z
2025-02-14T22:43:34.104161292Z slot launch_slot_: id 0 | task 2 | processing task
2025-02-14T22:43:34.104171597Z que start_loop: update slots
2025-02-14T22:43:34.104174354Z srv update_slots: posting NEXT_RESPONSE
However, if you run llama.cpp with --jinja and provide tools in the API call (even an empty tools list), no errors are logged:
curl http://localhost:8080/v1/chat/completions -d '{
"model": "gpt-3.5-turbo",
"tools": [ ],
"messages": [
{
"role": "user",
"content": "Print a hello world message with python."
}
]
}'
I'm not sure whether the --jinja option must be used together with tools. What is the expected behavior and usage?
I hope this finding helps. If you need any help, please feel free to reply.
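Until the server-side behavior is clarified, the workaround above can be applied on the client side by always sending a tools field. This is a minimal standard-library sketch. build_chat_payload and post_chat are hypothetical helpers, and the URL and model name are copied from the curl examples in this thread. Sending an empty tools list is an observed workaround, not a confirmed fix.

```python
import json
import urllib.request


def build_chat_payload(messages, tools=None, force_tools_field=True):
    """Build an OpenAI-style /v1/chat/completions request body.

    With force_tools_field=True a "tools" key is always present (an empty list
    when no tools are given), mirroring the workaround reported above.
    """
    payload = {"model": "gpt-3.5-turbo", "messages": messages}
    if tools:
        payload["tools"] = tools
    elif force_tools_field:
        payload["tools"] = []
    return payload


def post_chat(payload, url="http://localhost:8080/v1/chat/completions"):
    # Needs a running llama-server; returns the decoded JSON response.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling post_chat(build_chat_payload([{"role": "user", "content": "Where is Vancouver?"}])) sends the same request as the curl example below, plus "tools": [].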
More detailed logs:
Without tools:
curl http://localhost:8080/v1/chat/completions -d '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "Where is Vancouver?"
}
]
}'
Logs for API call without tools
2025-02-14T22:56:16.236296080Z }
2025-02-14T22:56:16.236298714Z [common_chat_params_init] has_tools=false
2025-02-14T22:56:16.236301194Z Prompt: <|im_start|>system
2025-02-14T22:56:16.236304013Z You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
2025-02-14T22:56:16.236306627Z <|im_start|>user
2025-02-14T22:56:16.236309096Z Where is Vancouver?<|im_end|>
2025-02-14T22:56:16.236311576Z <|im_start|>assistant
2025-02-14T22:56:16.236314004Z
2025-02-14T22:56:16.238224159Z Grammar:
2025-02-14T22:56:16.238263195Z Grammar lazy: false
2025-02-14T22:56:16.238270758Z Chat format: Content-only
2025-02-14T22:56:16.238274102Z srv add_waiting_: add task 51 to waiting list. current waiting = 0 (before add)
2025-02-14T22:56:16.238276653Z que post: new task, id = 51/1, front = 0
2025-02-14T22:56:16.238279030Z que start_loop: processing new tasks
2025-02-14T22:56:16.238281376Z que start_loop: processing task, id = 51
2025-02-14T22:56:16.238283763Z slot get_availabl: id 0 | task 6 | selected slot by lru, t_last = 238580703706
2025-02-14T22:56:16.238286253Z slot reset: id 0 | task 6 |
2025-02-14T22:56:16.238433085Z slot launch_slot_: id 0 | task 51 | launching slot : {"id":0,"id_task":51,"n_ctx":4096,"speculative":false,"is_processing":false,"non_causal":false,"params":{"n_predict":-1,"seed":4294967295,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"dry_multiplier":0.0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":4096,"dry_sequence_breakers":["\n",":","\"","*"],"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"stop":[],"max_tokens":-1,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"\u0001","grammar_trigger_tokens":[],"samplers":["penalties","dry","top_k","typ_p","top_p","min_p","xtc","temperature"],"speculative.n_max":16,"speculative.n_min":5,"speculative.p_min":0.8999999761581421,"timings_per_token":false,"post_sampling_probs":false,"lora":[]},"prompt":"<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhere is Vancouver?<|im_end|>\n<|im_start|>assistant\n","next_token":{"has_next_token":false,"has_new_line":false,"n_remain":-1,"n_decoded":44,"stopping_word":""}}
2025-02-14T22:56:16.238463211Z parse: error parsing grammar: expecting name at
2025-02-14T22:56:16.238466751Z
2025-02-14T22:56:16.238469210Z
2025-02-14T22:56:16.238471587Z slot launch_slot_: id 0 | task 51 | processing task
2025-02-14T22:56:16.238474262Z que start_loop: update slots
2025-02-14T22:56:16.238476628Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.238478943Z que post: new task, id = 52, front = 0
2025-02-14T22:56:16.238481320Z slot update_slots: id 0 | task 51 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 33
2025-02-14T22:56:16.238483707Z slot update_slots: id 0 | task 51 | prompt token 0: 151644 '<|im_start|>'
2025-02-14T22:56:16.238486145Z slot update_slots: id 0 | task 51 | prompt token 1: 8948 'system'
2025-02-14T22:56:16.238488471Z slot update_slots: id 0 | task 51 | prompt token 2: 198 '
2025-02-14T22:56:16.238490817Z '
2025-02-14T22:56:16.238493101Z slot update_slots: id 0 | task 51 | prompt token 3: 2610 'You'
2025-02-14T22:56:16.238496424Z slot update_slots: id 0 | task 51 | prompt token 4: 525 ' are'
2025-02-14T22:56:16.238499048Z slot update_slots: id 0 | task 51 | prompt token 5: 1207 ' Q'
2025-02-14T22:56:16.238501394Z slot update_slots: id 0 | task 51 | prompt token 6: 16948 'wen'
2025-02-14T22:56:16.238503956Z slot update_slots: id 0 | task 51 | prompt token 7: 11 ','
2025-02-14T22:56:16.238519502Z slot update_slots: id 0 | task 51 | prompt token 8: 3465 ' created'
2025-02-14T22:56:16.238526653Z slot update_slots: id 0 | task 51 | prompt token 9: 553 ' by'
2025-02-14T22:56:16.238529565Z slot update_slots: id 0 | task 51 | prompt token 10: 54364 ' Alibaba'
2025-02-14T22:56:16.238532024Z slot update_slots: id 0 | task 51 | prompt token 11: 14817 ' Cloud'
2025-02-14T22:56:16.238534534Z slot update_slots: id 0 | task 51 | prompt token 12: 13 '.'
2025-02-14T22:56:16.238536891Z slot update_slots: id 0 | task 51 | prompt token 13: 1446 ' You'
2025-02-14T22:56:16.238539216Z slot update_slots: id 0 | task 51 | prompt token 14: 525 ' are'
2025-02-14T22:56:16.238541551Z slot update_slots: id 0 | task 51 | prompt token 15: 264 ' a'
2025-02-14T22:56:16.238543897Z slot update_slots: id 0 | task 51 | need to evaluate at least 1 token to generate logits, n_past = 33, n_prompt_tokens = 33
2025-02-14T22:56:16.238546295Z slot update_slots: id 0 | task 51 | kv cache rm [32, end)
2025-02-14T22:56:16.238557376Z slot update_slots: id 0 | task 51 | prompt processing progress, n_past = 33, n_tokens = 1, progress = 0.030303
2025-02-14T22:56:16.238559856Z slot update_slots: id 0 | task 51 | prompt done, n_past = 33, n_tokens = 1
2025-02-14T22:56:16.238562387Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.252891531Z slot process_toke: id 0 | task 51 | n_decoded = 1, n_remaining = -1, next token: 53 'V'
2025-02-14T22:56:16.252931504Z srv update_slots: run slots completed
2025-02-14T22:56:16.252940383Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.252944540Z que start_loop: processing new tasks
2025-02-14T22:56:16.252947709Z que start_loop: processing task, id = 52
2025-02-14T22:56:16.252950785Z que start_loop: update slots
2025-02-14T22:56:16.252954047Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.252990346Z que post: new task, id = 53, front = 0
2025-02-14T22:56:16.252993978Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 34, n_cache_tokens = 34, truncated = 0
2025-02-14T22:56:16.252997230Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.264186466Z slot process_toke: id 0 | task 51 | n_decoded = 2, n_remaining = -1, next token: 20471 'ancouver'
2025-02-14T22:56:16.264237211Z srv update_slots: run slots completed
2025-02-14T22:56:16.264245411Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.264248735Z que start_loop: processing new tasks
2025-02-14T22:56:16.264251328Z que start_loop: processing task, id = 53
2025-02-14T22:56:16.264253838Z que start_loop: update slots
2025-02-14T22:56:16.264256225Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.264258643Z que post: new task, id = 54, front = 0
2025-02-14T22:56:16.264261061Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 35, n_cache_tokens = 35, truncated = 0
2025-02-14T22:56:16.264263633Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.275229823Z slot process_toke: id 0 | task 51 | n_decoded = 3, n_remaining = -1, next token: 374 ' is'
2025-02-14T22:56:16.275266184Z srv update_slots: run slots completed
2025-02-14T22:56:16.275273428Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.275276597Z que start_loop: processing new tasks
2025-02-14T22:56:16.275279190Z que start_loop: processing task, id = 54
2025-02-14T22:56:16.275281803Z que start_loop: update slots
2025-02-14T22:56:16.275284169Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.275286515Z que post: new task, id = 55, front = 0
2025-02-14T22:56:16.275289046Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 36, n_cache_tokens = 36, truncated = 0
2025-02-14T22:56:16.275301547Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.286878187Z slot process_toke: id 0 | task 51 | n_decoded = 4, n_remaining = -1, next token: 264 ' a'
2025-02-14T22:56:16.286928315Z srv update_slots: run slots completed
2025-02-14T22:56:16.286936319Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.286940260Z que start_loop: processing new tasks
2025-02-14T22:56:16.286943758Z que start_loop: processing task, id = 55
2025-02-14T22:56:16.286947627Z que start_loop: update slots
2025-02-14T22:56:16.286951012Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.286954243Z que post: new task, id = 56, front = 0
2025-02-14T22:56:16.286957648Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 37, n_cache_tokens = 37, truncated = 0
2025-02-14T22:56:16.286961034Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.297429910Z slot process_toke: id 0 | task 51 | n_decoded = 5, n_remaining = -1, next token: 3598 ' major'
2025-02-14T22:56:16.297465963Z srv update_slots: run slots completed
2025-02-14T22:56:16.297473278Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.297476581Z que start_loop: processing new tasks
2025-02-14T22:56:16.297479102Z que start_loop: processing task, id = 56
2025-02-14T22:56:16.297481530Z que start_loop: update slots
2025-02-14T22:56:16.297483948Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.297486345Z que post: new task, id = 57, front = 0
2025-02-14T22:56:16.297489226Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 38, n_cache_tokens = 38, truncated = 0
2025-02-14T22:56:16.297491798Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.308420791Z slot process_toke: id 0 | task 51 | n_decoded = 6, n_remaining = -1, next token: 3283 ' city'
2025-02-14T22:56:16.308463532Z srv update_slots: run slots completed
2025-02-14T22:56:16.308484573Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.308490540Z que start_loop: processing new tasks
2025-02-14T22:56:16.308493576Z que start_loop: processing task, id = 57
2025-02-14T22:56:16.308496127Z que start_loop: update slots
2025-02-14T22:56:16.308498545Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.308501220Z que post: new task, id = 58, front = 0
2025-02-14T22:56:16.308503679Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 39, n_cache_tokens = 39, truncated = 0
2025-02-14T22:56:16.308506365Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.319715997Z slot process_toke: id 0 | task 51 | n_decoded = 7, n_remaining = -1, next token: 7407 ' located'
2025-02-14T22:56:16.319800933Z srv update_slots: run slots completed
2025-02-14T22:56:16.319811047Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.319815039Z que start_loop: processing new tasks
2025-02-14T22:56:16.319818455Z que start_loop: processing task, id = 58
2025-02-14T22:56:16.319821490Z que start_loop: update slots
2025-02-14T22:56:16.319824536Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.319827571Z que post: new task, id = 59, front = 0
2025-02-14T22:56:16.319830637Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 40, n_cache_tokens = 40, truncated = 0
2025-02-14T22:56:16.319833837Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.331440241Z slot process_toke: id 0 | task 51 | n_decoded = 8, n_remaining = -1, next token: 389 ' on'
2025-02-14T22:56:16.331479792Z srv update_slots: run slots completed
2025-02-14T22:56:16.331487406Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.331490668Z que start_loop: processing new tasks
2025-02-14T22:56:16.331493281Z que start_loop: processing task, id = 59
2025-02-14T22:56:16.331495678Z que start_loop: update slots
2025-02-14T22:56:16.331498055Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.331500463Z que post: new task, id = 60, front = 0
2025-02-14T22:56:16.331502881Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 41, n_cache_tokens = 41, truncated = 0
2025-02-14T22:56:16.331505473Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.342951902Z slot process_toke: id 0 | task 51 | n_decoded = 9, n_remaining = -1, next token: 279 ' the'
2025-02-14T22:56:16.342994869Z srv update_slots: run slots completed
2025-02-14T22:56:16.343002256Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.343005487Z que start_loop: processing new tasks
2025-02-14T22:56:16.343008018Z que start_loop: processing task, id = 60
2025-02-14T22:56:16.343010436Z que start_loop: update slots
2025-02-14T22:56:16.343012977Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.343015375Z que post: new task, id = 61, front = 0
2025-02-14T22:56:16.343017741Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 42, n_cache_tokens = 42, truncated = 0
2025-02-14T22:56:16.343020241Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.354398013Z slot process_toke: id 0 | task 51 | n_decoded = 10, n_remaining = -1, next token: 9710 ' west'
2025-02-14T22:56:16.354446978Z srv update_slots: run slots completed
2025-02-14T22:56:16.354455178Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.354473163Z que start_loop: processing new tasks
2025-02-14T22:56:16.354475962Z que start_loop: processing task, id = 61
2025-02-14T22:56:16.354478349Z que start_loop: update slots
2025-02-14T22:56:16.354481075Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.354483534Z que post: new task, id = 62, front = 0
2025-02-14T22:56:16.354485993Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 43, n_cache_tokens = 43, truncated = 0
2025-02-14T22:56:16.354488514Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.365622027Z slot process_toke: id 0 | task 51 | n_decoded = 11, n_remaining = -1, next token: 13648 ' coast'
2025-02-14T22:56:16.365663780Z srv update_slots: run slots completed
2025-02-14T22:56:16.365671064Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.365674717Z que start_loop: processing new tasks
2025-02-14T22:56:16.365677207Z que start_loop: processing task, id = 62
2025-02-14T22:56:16.365679522Z que start_loop: update slots
2025-02-14T22:56:16.365681868Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.365684203Z que post: new task, id = 63, front = 0
2025-02-14T22:56:16.365686673Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 44, n_cache_tokens = 44, truncated = 0
2025-02-14T22:56:16.365689152Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.376510832Z slot process_toke: id 0 | task 51 | n_decoded = 12, n_remaining = -1, next token: 315 ' of'
2025-02-14T22:56:16.376565446Z srv update_slots: run slots completed
2025-02-14T22:56:16.376574696Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.376578873Z que start_loop: processing new tasks
2025-02-14T22:56:16.376582351Z que start_loop: processing task, id = 63
2025-02-14T22:56:16.376585695Z que start_loop: update slots
2025-02-14T22:56:16.376588658Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.376591642Z que post: new task, id = 64, front = 0
2025-02-14T22:56:16.376594677Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 45, n_cache_tokens = 45, truncated = 0
2025-02-14T22:56:16.376597826Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.387883679Z slot process_toke: id 0 | task 51 | n_decoded = 13, n_remaining = -1, next token: 6864 ' Canada'
2025-02-14T22:56:16.387925719Z srv update_slots: run slots completed
2025-02-14T22:56:16.387933292Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.387936461Z que start_loop: processing new tasks
2025-02-14T22:56:16.387938920Z que start_loop: processing task, id = 64
2025-02-14T22:56:16.387941235Z que start_loop: update slots
2025-02-14T22:56:16.387953911Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.387956350Z que post: new task, id = 65, front = 0
2025-02-14T22:56:16.387958706Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 46, n_cache_tokens = 46, truncated = 0
2025-02-14T22:56:16.387961083Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.398513621Z slot process_toke: id 0 | task 51 | n_decoded = 14, n_remaining = -1, next token: 13 '.'
2025-02-14T22:56:16.398559479Z srv update_slots: run slots completed
2025-02-14T22:56:16.398566774Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.398569912Z que start_loop: processing new tasks
2025-02-14T22:56:16.398572392Z que start_loop: processing task, id = 65
2025-02-14T22:56:16.398574892Z que start_loop: update slots
2025-02-14T22:56:16.398577228Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.398579841Z que post: new task, id = 66, front = 0
2025-02-14T22:56:16.398582280Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 47, n_cache_tokens = 47, truncated = 0
2025-02-14T22:56:16.398584934Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.409952873Z slot process_toke: id 0 | task 51 | n_decoded = 15, n_remaining = -1, next token: 1084 ' It'
2025-02-14T22:56:16.409996262Z srv update_slots: run slots completed
2025-02-14T22:56:16.410003670Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.410006757Z que start_loop: processing new tasks
2025-02-14T22:56:16.410009319Z que start_loop: processing task, id = 66
2025-02-14T22:56:16.410011603Z que start_loop: update slots
2025-02-14T22:56:16.410013918Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.410016233Z que post: new task, id = 67, front = 0
2025-02-14T22:56:16.410018558Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 48, n_cache_tokens = 48, truncated = 0
2025-02-14T22:56:16.410021141Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.421285060Z slot process_toke: id 0 | task 51 | n_decoded = 16, n_remaining = -1, next token: 374 ' is'
2025-02-14T22:56:16.421335620Z srv update_slots: run slots completed
2025-02-14T22:56:16.421344489Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.421348615Z que start_loop: processing new tasks
2025-02-14T22:56:16.421351969Z que start_loop: processing task, id = 67
2025-02-14T22:56:16.421355036Z que start_loop: update slots
2025-02-14T22:56:16.421358122Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.421361085Z que post: new task, id = 68, front = 0
2025-02-14T22:56:16.421375747Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 49, n_cache_tokens = 49, truncated = 0
2025-02-14T22:56:16.421379348Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.432129889Z slot process_toke: id 0 | task 51 | n_decoded = 17, n_remaining = -1, next token: 279 ' the'
2025-02-14T22:56:16.432172876Z srv update_slots: run slots completed
2025-02-14T22:56:16.432183042Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.432186468Z que start_loop: processing new tasks
2025-02-14T22:56:16.432189133Z que start_loop: processing task, id = 68
2025-02-14T22:56:16.432191643Z que start_loop: update slots
2025-02-14T22:56:16.432194339Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.432196860Z que post: new task, id = 69, front = 0
2025-02-14T22:56:16.432199401Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 50, n_cache_tokens = 50, truncated = 0
2025-02-14T22:56:16.432202035Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.443518146Z slot process_toke: id 0 | task 51 | n_decoded = 18, n_remaining = -1, next token: 7772 ' largest'
2025-02-14T22:56:16.443570692Z srv update_slots: run slots completed
2025-02-14T22:56:16.443578697Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.443582463Z que start_loop: processing new tasks
2025-02-14T22:56:16.443585241Z que start_loop: processing task, id = 69
2025-02-14T22:56:16.443587905Z que start_loop: update slots
2025-02-14T22:56:16.443590323Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.443592659Z que post: new task, id = 70, front = 0
2025-02-14T22:56:16.443595128Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 51, n_cache_tokens = 51, truncated = 0
2025-02-14T22:56:16.443597618Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.454617224Z slot process_toke: id 0 | task 51 | n_decoded = 19, n_remaining = -1, next token: 3283 ' city'
2025-02-14T22:56:16.454640477Z srv update_slots: run slots completed
2025-02-14T22:56:16.454643759Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.454646280Z que start_loop: processing new tasks
2025-02-14T22:56:16.454648800Z que start_loop: processing task, id = 70
2025-02-14T22:56:16.454651301Z que start_loop: update slots
2025-02-14T22:56:16.454653688Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.454656106Z que post: new task, id = 71, front = 0
2025-02-14T22:56:16.454658616Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 52, n_cache_tokens = 52, truncated = 0
2025-02-14T22:56:16.454661055Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.466666853Z slot process_toke: id 0 | task 51 | n_decoded = 20, n_remaining = -1, next token: 304 ' in'
2025-02-14T22:56:16.466707340Z srv update_slots: run slots completed
2025-02-14T22:56:16.466714882Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.466718061Z que start_loop: processing new tasks
2025-02-14T22:56:16.466720489Z que start_loop: processing task, id = 71
2025-02-14T22:56:16.466723041Z que start_loop: update slots
2025-02-14T22:56:16.466725325Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.466727578Z que post: new task, id = 72, front = 0
2025-02-14T22:56:16.466729883Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 53, n_cache_tokens = 53, truncated = 0
2025-02-14T22:56:16.466732250Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.477458318Z slot process_toke: id 0 | task 51 | n_decoded = 21, n_remaining = -1, next token: 279 ' the'
2025-02-14T22:56:16.477500821Z srv update_slots: run slots completed
2025-02-14T22:56:16.477509742Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.477513785Z que start_loop: processing new tasks
2025-02-14T22:56:16.477516563Z que start_loop: processing task, id = 72
2025-02-14T22:56:16.477519084Z que start_loop: update slots
2025-02-14T22:56:16.477521667Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.477524095Z que post: new task, id = 73, front = 0
2025-02-14T22:56:16.477526472Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 54, n_cache_tokens = 54, truncated = 0
2025-02-14T22:56:16.477528972Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.488350701Z slot process_toke: id 0 | task 51 | n_decoded = 22, n_remaining = -1, next token: 16847 ' province'
2025-02-14T22:56:16.488391126Z srv update_slots: run slots completed
2025-02-14T22:56:16.488398473Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.488401395Z que start_loop: processing new tasks
2025-02-14T22:56:16.488403813Z que start_loop: processing task, id = 73
2025-02-14T22:56:16.488406189Z que start_loop: update slots
2025-02-14T22:56:16.488408597Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.488411457Z que post: new task, id = 74, front = 0
2025-02-14T22:56:16.488414060Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 55, n_cache_tokens = 55, truncated = 0
2025-02-14T22:56:16.488416591Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.500276104Z slot process_toke: id 0 | task 51 | n_decoded = 23, n_remaining = -1, next token: 315 ' of'
2025-02-14T22:56:16.500311343Z srv update_slots: run slots completed
2025-02-14T22:56:16.500338794Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.500342395Z que start_loop: processing new tasks
2025-02-14T22:56:16.500345019Z que start_loop: processing task, id = 74
2025-02-14T22:56:16.500347313Z que start_loop: update slots
2025-02-14T22:56:16.500349649Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.500351882Z que post: new task, id = 75, front = 0
2025-02-14T22:56:16.500354228Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 56, n_cache_tokens = 56, truncated = 0
2025-02-14T22:56:16.500357757Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.511841560Z slot process_toke: id 0 | task 51 | n_decoded = 24, n_remaining = -1, next token: 7855 ' British'
2025-02-14T22:56:16.511885731Z srv update_slots: run slots completed
2025-02-14T22:56:16.511894250Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.511898047Z que start_loop: processing new tasks
2025-02-14T22:56:16.511901030Z que start_loop: processing task, id = 75
2025-02-14T22:56:16.511903870Z que start_loop: update slots
2025-02-14T22:56:16.511906658Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.511909416Z que post: new task, id = 76, front = 0
2025-02-14T22:56:16.511912266Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 57, n_cache_tokens = 57, truncated = 0
2025-02-14T22:56:16.511915281Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.522323024Z slot process_toke: id 0 | task 51 | n_decoded = 25, n_remaining = -1, next token: 18796 ' Columbia'
2025-02-14T22:56:16.522365065Z srv update_slots: run slots completed
2025-02-14T22:56:16.522373162Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.522376208Z que start_loop: processing new tasks
2025-02-14T22:56:16.522378677Z que start_loop: processing task, id = 76
2025-02-14T22:56:16.522380972Z que start_loop: update slots
2025-02-14T22:56:16.522383420Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.522385725Z que post: new task, id = 77, front = 0
2025-02-14T22:56:16.522388308Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 58, n_cache_tokens = 58, truncated = 0
2025-02-14T22:56:16.522390746Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.534017895Z slot process_toke: id 0 | task 51 | n_decoded = 26, n_remaining = -1, next token: 323 ' and'
2025-02-14T22:56:16.534042023Z srv update_slots: run slots completed
2025-02-14T22:56:16.534045336Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.534047960Z que start_loop: processing new tasks
2025-02-14T22:56:16.534050419Z que start_loop: processing task, id = 77
2025-02-14T22:56:16.534064720Z que start_loop: update slots
2025-02-14T22:56:16.534067190Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.534069618Z que post: new task, id = 78, front = 0
2025-02-14T22:56:16.534072025Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 59, n_cache_tokens = 59, truncated = 0
2025-02-14T22:56:16.534074567Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.545669409Z slot process_toke: id 0 | task 51 | n_decoded = 27, n_remaining = -1, next token: 374 ' is'
2025-02-14T22:56:16.545714536Z srv update_slots: run slots completed
2025-02-14T22:56:16.545721955Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.545724969Z que start_loop: processing new tasks
2025-02-14T22:56:16.545727439Z que start_loop: processing task, id = 78
2025-02-14T22:56:16.545729846Z que start_loop: update slots
2025-02-14T22:56:16.545732254Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.545734713Z que post: new task, id = 79, front = 0
2025-02-14T22:56:16.545737172Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 60, n_cache_tokens = 60, truncated = 0
2025-02-14T22:56:16.545739590Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.557683053Z slot process_toke: id 0 | task 51 | n_decoded = 28, n_remaining = -1, next token: 3881 ' known'
2025-02-14T22:56:16.557726884Z srv update_slots: run slots completed
2025-02-14T22:56:16.557734498Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.557737862Z que start_loop: processing new tasks
2025-02-14T22:56:16.557740486Z que start_loop: processing task, id = 79
2025-02-14T22:56:16.557743120Z que start_loop: update slots
2025-02-14T22:56:16.557745610Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.557748069Z que post: new task, id = 80, front = 0
2025-02-14T22:56:16.557750754Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 61, n_cache_tokens = 61, truncated = 0
2025-02-14T22:56:16.557753347Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.568554322Z slot process_toke: id 0 | task 51 | n_decoded = 29, n_remaining = -1, next token: 369 ' for'
2025-02-14T22:56:16.568593132Z srv update_slots: run slots completed
2025-02-14T22:56:16.568600602Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.568603822Z que start_loop: processing new tasks
2025-02-14T22:56:16.568606322Z que start_loop: processing task, id = 80
2025-02-14T22:56:16.568608678Z que start_loop: update slots
2025-02-14T22:56:16.568611065Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.568613380Z que post: new task, id = 81, front = 0
2025-02-14T22:56:16.568625048Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 62, n_cache_tokens = 62, truncated = 0
2025-02-14T22:56:16.568627785Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.581090778Z slot process_toke: id 0 | task 51 | n_decoded = 30, n_remaining = -1, next token: 1181 ' its'
2025-02-14T22:56:16.581140535Z srv update_slots: run slots completed
2025-02-14T22:56:16.581149847Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.581153170Z que start_loop: processing new tasks
2025-02-14T22:56:16.581156133Z que start_loop: processing task, id = 81
2025-02-14T22:56:16.581158788Z que start_loop: update slots
2025-02-14T22:56:16.581161319Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.581164118Z que post: new task, id = 82, front = 0
2025-02-14T22:56:16.581166597Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 63, n_cache_tokens = 63, truncated = 0
2025-02-14T22:56:16.581169180Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.592508134Z slot process_toke: id 0 | task 51 | n_decoded = 31, n_remaining = -1, next token: 5810 ' natural'
2025-02-14T22:56:16.592554599Z srv update_slots: run slots completed
2025-02-14T22:56:16.592562089Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.592565783Z que start_loop: processing new tasks
2025-02-14T22:56:16.592568417Z que start_loop: processing task, id = 82
2025-02-14T22:56:16.592570948Z que start_loop: update slots
2025-02-14T22:56:16.592573325Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.592575660Z que post: new task, id = 83, front = 0
2025-02-14T22:56:16.592578027Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 64, n_cache_tokens = 64, truncated = 0
2025-02-14T22:56:16.592580537Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.603589752Z slot process_toke: id 0 | task 51 | n_decoded = 32, n_remaining = -1, next token: 13143 ' beauty'
2025-02-14T22:56:16.603621412Z srv update_slots: run slots completed
2025-02-14T22:56:16.603624900Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.603627503Z que start_loop: processing new tasks
2025-02-14T22:56:16.603629951Z que start_loop: processing task, id = 83
2025-02-14T22:56:16.603632482Z que start_loop: update slots
2025-02-14T22:56:16.603634931Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.603637370Z que post: new task, id = 84, front = 0
2025-02-14T22:56:16.603640065Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 65, n_cache_tokens = 65, truncated = 0
2025-02-14T22:56:16.603652217Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.616362996Z slot process_toke: id 0 | task 51 | n_decoded = 33, n_remaining = -1, next token: 11 ','
2025-02-14T22:56:16.616416334Z srv update_slots: run slots completed
2025-02-14T22:56:16.616424205Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.616427487Z que start_loop: processing new tasks
2025-02-14T22:56:16.616430234Z que start_loop: processing task, id = 84
2025-02-14T22:56:16.616432806Z que start_loop: update slots
2025-02-14T22:56:16.616435399Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.616437766Z que post: new task, id = 85, front = 0
2025-02-14T22:56:16.616440173Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 66, n_cache_tokens = 66, truncated = 0
2025-02-14T22:56:16.616442704Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.628510006Z slot process_toke: id 0 | task 51 | n_decoded = 34, n_remaining = -1, next token: 23034 ' mild'
2025-02-14T22:56:16.628539422Z srv update_slots: run slots completed
2025-02-14T22:56:16.628542714Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.628545492Z que start_loop: processing new tasks
2025-02-14T22:56:16.628548198Z que start_loop: processing task, id = 85
2025-02-14T22:56:16.628550863Z que start_loop: update slots
2025-02-14T22:56:16.628553291Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.628555658Z que post: new task, id = 86, front = 0
2025-02-14T22:56:16.628558065Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 67, n_cache_tokens = 67, truncated = 0
2025-02-14T22:56:16.628560545Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.640030686Z slot process_toke: id 0 | task 51 | n_decoded = 35, n_remaining = -1, next token: 9977 ' climate'
2025-02-14T22:56:16.640081667Z srv update_slots: run slots completed
2025-02-14T22:56:16.640089333Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.640092646Z que start_loop: processing new tasks
2025-02-14T22:56:16.640095475Z que start_loop: processing task, id = 86
2025-02-14T22:56:16.640097924Z que start_loop: update slots
2025-02-14T22:56:16.640100537Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.640103099Z que post: new task, id = 87, front = 0
2025-02-14T22:56:16.640105774Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 68, n_cache_tokens = 68, truncated = 0
2025-02-14T22:56:16.640108306Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.651740288Z slot process_toke: id 0 | task 51 | n_decoded = 36, n_remaining = -1, next token: 11 ','
2025-02-14T22:56:16.651783142Z srv update_slots: run slots completed
2025-02-14T22:56:16.651802073Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.651805695Z que start_loop: processing new tasks
2025-02-14T22:56:16.651808555Z que start_loop: processing task, id = 87
2025-02-14T22:56:16.651811550Z que start_loop: update slots
2025-02-14T22:56:16.651814132Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.651817301Z que post: new task, id = 88, front = 0
2025-02-14T22:56:16.651819945Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 69, n_cache_tokens = 69, truncated = 0
2025-02-14T22:56:16.651822692Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.662228893Z slot process_toke: id 0 | task 51 | n_decoded = 37, n_remaining = -1, next token: 323 ' and'
2025-02-14T22:56:16.662269884Z srv update_slots: run slots completed
2025-02-14T22:56:16.662277621Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.662280934Z que start_loop: processing new tasks
2025-02-14T22:56:16.662283507Z que start_loop: processing task, id = 88
2025-02-14T22:56:16.662286285Z que start_loop: update slots
2025-02-14T22:56:16.662288682Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.662291059Z que post: new task, id = 89, front = 0
2025-02-14T22:56:16.662293487Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 70, n_cache_tokens = 70, truncated = 0
2025-02-14T22:56:16.662296028Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.673867911Z slot process_toke: id 0 | task 51 | n_decoded = 38, n_remaining = -1, next token: 16807 ' diverse'
2025-02-14T22:56:16.673915538Z srv update_slots: run slots completed
2025-02-14T22:56:16.673924531Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.673928626Z que start_loop: processing new tasks
2025-02-14T22:56:16.673931826Z que start_loop: processing task, id = 89
2025-02-14T22:56:16.673935108Z que start_loop: update slots
2025-02-14T22:56:16.673938400Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.673941899Z que post: new task, id = 90, front = 0
2025-02-14T22:56:16.673947959Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 71, n_cache_tokens = 71, truncated = 0
2025-02-14T22:56:16.673951282Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.684979679Z slot process_toke: id 0 | task 51 | n_decoded = 39, n_remaining = -1, next token: 7042 ' population'
2025-02-14T22:56:16.685023181Z srv update_slots: run slots completed
2025-02-14T22:56:16.685031176Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.685034663Z que start_loop: processing new tasks
2025-02-14T22:56:16.685046578Z que start_loop: processing task, id = 90
2025-02-14T22:56:16.685064172Z que start_loop: update slots
2025-02-14T22:56:16.685069533Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.685072331Z que post: new task, id = 91, front = 0
2025-02-14T22:56:16.685074996Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 72, n_cache_tokens = 72, truncated = 0
2025-02-14T22:56:16.685077764Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.698334257Z slot process_toke: id 0 | task 51 | n_decoded = 40, n_remaining = -1, next token: 13 '.'
2025-02-14T22:56:16.698371832Z srv update_slots: run slots completed
2025-02-14T22:56:16.698376308Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.698379497Z que start_loop: processing new tasks
2025-02-14T22:56:16.698382450Z que start_loop: processing task, id = 91
2025-02-14T22:56:16.698385444Z que start_loop: update slots
2025-02-14T22:56:16.698388325Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.698391247Z que post: new task, id = 92, front = 0
2025-02-14T22:56:16.698394221Z slot update_slots: id 0 | task 51 | slot decode token, n_ctx = 4096, n_past = 73, n_cache_tokens = 73, truncated = 0
2025-02-14T22:56:16.698397359Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.709826386Z slot process_toke: id 0 | task 51 | stopped by EOS
2025-02-14T22:56:16.709870372Z slot process_toke: id 0 | task 51 | n_decoded = 41, n_remaining = -1, next token: 151645 ''
2025-02-14T22:56:16.709878829Z slot release: id 0 | task 51 | stop processing: n_past = 73, truncated = 0
2025-02-14T22:56:16.709882369Z slot print_timing: id 0 | task 51 |
2025-02-14T22:56:16.709884982Z prompt eval time = 14.21 ms / 1 tokens ( 14.21 ms per token, 70.40 tokens per second)
2025-02-14T22:56:16.709887688Z eval time = 456.99 ms / 41 tokens ( 11.15 ms per token, 89.72 tokens per second)
2025-02-14T22:56:16.709891022Z total time = 471.19 ms / 42 tokens
2025-02-14T22:56:16.709893481Z srv send: sending result for task id = 51
2025-02-14T22:56:16.709896135Z srv send: task id = 51 pushed to result queue
2025-02-14T22:56:16.709898708Z srv update_slots: run slots completed
2025-02-14T22:56:16.709901115Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.709903502Z que start_loop: processing new tasks
2025-02-14T22:56:16.709905858Z que start_loop: processing task, id = 92
2025-02-14T22:56:16.709908215Z que start_loop: update slots
2025-02-14T22:56:16.709910715Z srv update_slots: all slots are idle
2025-02-14T22:56:16.709913092Z que start_loop: waiting for new tasks
2025-02-14T22:56:16.709927249Z srv remove_waiti: remove task 51 from waiting list. current waiting = 1 (before remove)
2025-02-14T22:56:16.710111310Z request: POST /v1/chat/completions 172.17.0.1 200
2025-02-14T22:56:16.710144502Z }
2025-02-14T22:56:16.710150789Z response: {"choices":[{"finish_reason":"stop","index":0,"message":{"content":"Vancouver is a major city located on the west coast of Canada. It is the largest city in the province of British Columbia and is known for its natural beauty, mild climate, and diverse population.","tool_calls":null,"role":"assistant"}}],"created":1739573776,"model":"gpt-3.5-turbo","system_fingerprint":"b4603-4a2b196d","object":"chat.completion","usage":{"completion_tokens":41,"prompt_tokens":33,"total_tokens":74},"id":"chatcmpl-sr0c0yuDP6Ha8Q91sVp8YgLdu0iKHZjq","__verbose":{"index":0,"content":"Vancouver is a major city located on the west coast of Canada. It is the largest city in the province of British Columbia and is known for its natural beauty, mild climate, and diverse population.","tokens":[],"id_slot":0,"stop":true,"model":"gpt-3.5-turbo","tokens_predicted":41,"tokens_evaluated":33,"generation_settings":{"n_predict":-1,"seed":4294967295,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"dry_multiplier":0.0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":4096,"dry_sequence_breakers":["\n",":","\"","*"],"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"stop":[],"max_tokens":-1,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"\u0001","grammar_trigger_tokens":[],"samplers":["penalties","dry","top_k","typ_p","top_p","min_p","xtc","temperature"],"speculative.n_max":16,"speculative.n_min":5,"speculative.p_min":0.8999999761581421,"timings_per_token":false,"post_sampling_probs":false,"lora":[]},"prompt":"<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhere is Vancouver?<|im_end|>\n<|im_start|>assistant\n","has_new_line":false,"truncated":false,"stop_type":"eos","stopping_word":"","tokens_cached":73,"timings":{"prompt_n":1,"prompt_ms":14.205,"prompt_per_token_ms":14.205,"prompt_per_second":70.39774727208729,"predicted_n":41,"predicted_ms":456.989,"predicted_per_token_ms":11.146073170731707,"predicted_per_second":89.7176956119294}},"timings":{"prompt_n":1,"prompt_ms":14.205,"prompt_per_token_ms":14.205,"prompt_per_second":70.39774727208729,"predicted_n":41,"predicted_ms":456.989,"predicted_per_token_ms":11.146073170731707,"predicted_per_second":89.7176956119294}}
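As a quick sanity check on the print_timing block above, the reported rates are simply token counts divided by elapsed time. The sketch below is illustrative Python; the helper names parse_timing, tokens_per_second and ms_per_token are hypothetical and not part of llama.cpp, and the regex assumes the "eval time = X ms / N tokens" layout seen in this log. Plugging in the response's timings (predicted_n = 41, predicted_ms = 456.989) reproduces the logged 89.72 tokens per second and 11.15 ms per token.

```python
import re

# Matches both "prompt eval time = ..." and "eval time = ..." lines,
# assuming the print_timing layout shown in the log above.
_TIMING_RE = re.compile(r"eval time\s*=\s*([\d.]+)\s*ms\s*/\s*(\d+)\s*tokens")


def parse_timing(line: str) -> tuple[float, int]:
    """Return (elapsed_ms, n_tokens) from a print_timing log line."""
    m = _TIMING_RE.search(line)
    if m is None:
        raise ValueError(f"no timing found in: {line!r}")
    return float(m.group(1)), int(m.group(2))


def tokens_per_second(n_tokens: int, elapsed_ms: float) -> float:
    """Throughput: tokens divided by elapsed seconds."""
    return n_tokens / (elapsed_ms / 1000.0)


def ms_per_token(n_tokens: int, elapsed_ms: float) -> float:
    """Average time per token in milliseconds."""
    return elapsed_ms / n_tokens


if __name__ == "__main__":
    line = "eval time = 456.99 ms / 41 tokens ( 11.15 ms per token, 89.72 tokens per second)"
    ms, n = parse_timing(line)
    print(f"{ms_per_token(n, ms):.2f} ms/token, {tokens_per_second(n, ms):.2f} tok/s")
```

The same arithmetic applies to the prompt line: 1 token in 14.205 ms gives about 70.40 tokens per second.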
With tools (an empty "tools" array):
curl http://localhost:8080/v1/chat/completions -d '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "Where is Vancouver?"
    }
  ],
  "tools": []
}'
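The same request can be sent from Python. In the logs that follow, even an empty "tools": [] array makes the server report has_tools=true and emit a grammar whose tool-call rule is empty. As a client-side workaround (an untested assumption, not a confirmed fix), the sketch below only includes "tools" when the list is non-empty. build_chat_payload and post_chat are hypothetical helper names; the URL and model name are copied from the curl example.

```python
import json
import urllib.request

# Endpoint and model name copied from the curl example above.
DEFAULT_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_payload(messages, tools=None, model="gpt-3.5-turbo"):
    """Build an OpenAI-style chat request body.

    Assumption: leaving out an empty tools list avoids the has_tools=true
    path seen in the server log, so only non-empty lists are sent.
    """
    payload = {"model": model, "messages": messages}
    if tools:  # skips both None and []
        payload["tools"] = tools
    return payload


def post_chat(payload, url=DEFAULT_URL):
    """POST the payload as JSON and return the decoded response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    messages = [{"role": "user", "content": "Where is Vancouver?"}]
    body = build_chat_payload(messages, tools=[])
    print(json.dumps(body))  # the empty tools list is dropped
    # print(post_chat(body))  # requires a running llama-server
```

Whether the server should instead treat an empty array the same as no tools is a question for the llama.cpp side; this only sidesteps it from the client.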
Logs for the API call with tools:
2025-02-14T23:00:02.681558344Z }
2025-02-14T23:00:02.681560773Z [common_chat_params_init] has_tools=true
2025-02-14T23:00:02.681563551Z Prompt: <|im_start|>system
2025-02-14T23:00:02.681566330Z You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
2025-02-14T23:00:02.681569047Z <|im_start|>user
2025-02-14T23:00:02.681571579Z Where is Vancouver?<|im_end|>
2025-02-14T23:00:02.681574388Z <|im_start|>assistant
2025-02-14T23:00:02.681576940Z
2025-02-14T23:00:02.683414441Z Grammar: root ::= "<tool_call>" space tool-call "</tool_call>" space
2025-02-14T23:00:02.683444646Z space ::= | " " | "\n" [ \t]{0,20}
2025-02-14T23:00:02.683452066Z tool-call ::=
2025-02-14T23:00:02.683455411Z
2025-02-14T23:00:02.683457912Z Grammar lazy: true
2025-02-14T23:00:02.683460412Z Chat format: Hermes 2 Pro
2025-02-14T23:00:02.683462841Z Grammar trigger token: 151657 (`<tool_call>`)
2025-02-14T23:00:02.683465620Z Grammar trigger token: 151658 (`</tool_call>`)
2025-02-14T23:00:02.683468223Z srv add_waiting_: add task 136 to waiting list. current waiting = 0 (before add)
2025-02-14T23:00:02.683470724Z que post: new task, id = 136/1, front = 0
2025-02-14T23:00:02.683473091Z que start_loop: processing new tasks
2025-02-14T23:00:02.683475520Z que start_loop: processing task, id = 136
2025-02-14T23:00:02.683477897Z slot get_availabl: id 0 | task 93 | selected slot by lru, t_last = 238812907679
2025-02-14T23:00:02.683480326Z slot reset: id 0 | task 93 |
2025-02-14T23:00:02.683579400Z slot launch_slot_: id 0 | task 136 | launching slot : {"id":0,"id_task":136,"n_ctx":4096,"speculative":false,"is_processing":false,"non_causal":false,"params":{"n_predict":-1,"seed":4294967295,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"dry_multiplier":0.0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":4096,"dry_sequence_breakers":["\n",":","\"","*"],"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"stop":[],"max_tokens":-1,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"root ::= \"<tool_call>\" space tool-call \"</tool_call>\" space\nspace ::= | \" \" | \"\\n\" [ \\t]{0,20}\ntool-call ::= \n","grammar_trigger_tokens":[151657,151658],"samplers":["penalties","dry","top_k","typ_p","top_p","min_p","xtc","temperature"],"speculative.n_max":16,"speculative.n_min":5,"speculative.p_min":0.8999999761581421,"timings_per_token":false,"post_sampling_probs":false,"lora":[]},"prompt":"<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhere is Vancouver?<|im_end|>\n<|im_start|>assistant\n","next_token":{"has_next_token":false,"has_new_line":false,"n_remain":-1,"n_decoded":42,"stopping_word":""}}
2025-02-14T23:00:02.683619896Z slot launch_slot_: id 0 | task 136 | processing task
2025-02-14T23:00:02.683656677Z que start_loop: update slots
2025-02-14T23:00:02.683660269Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.683662986Z que post: new task, id = 137, front = 0
2025-02-14T23:00:02.683665590Z slot update_slots: id 0 | task 136 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 33
2025-02-14T23:00:02.683668440Z slot update_slots: id 0 | task 136 | prompt token 0: 151644 '<|im_start|>'
2025-02-14T23:00:02.683671188Z slot update_slots: id 0 | task 136 | prompt token 1: 8948 'system'
2025-02-14T23:00:02.683674111Z slot update_slots: id 0 | task 136 | prompt token 2: 198 '
2025-02-14T23:00:02.683677003Z '
2025-02-14T23:00:02.683680121Z slot update_slots: id 0 | task 136 | prompt token 3: 2610 'You'
2025-02-14T23:00:02.683682714Z slot update_slots: id 0 | task 136 | prompt token 4: 525 ' are'
2025-02-14T23:00:02.683685297Z slot update_slots: id 0 | task 136 | prompt token 5: 1207 ' Q'
2025-02-14T23:00:02.683687911Z slot update_slots: id 0 | task 136 | prompt token 6: 16948 'wen'
2025-02-14T23:00:02.683690433Z slot update_slots: id 0 | task 136 | prompt token 7: 11 ','
2025-02-14T23:00:02.683692923Z slot update_slots: id 0 | task 136 | prompt token 8: 3465 ' created'
2025-02-14T23:00:02.683695455Z slot update_slots: id 0 | task 136 | prompt token 9: 553 ' by'
2025-02-14T23:00:02.683697945Z slot update_slots: id 0 | task 136 | prompt token 10: 54364 ' Alibaba'
2025-02-14T23:00:02.683700457Z slot update_slots: id 0 | task 136 | prompt token 11: 14817 ' Cloud'
2025-02-14T23:00:02.683706693Z slot update_slots: id 0 | task 136 | prompt token 12: 13 '.'
2025-02-14T23:00:02.683709348Z slot update_slots: id 0 | task 136 | prompt token 13: 1446 ' You'
2025-02-14T23:00:02.683712003Z slot update_slots: id 0 | task 136 | prompt token 14: 525 ' are'
2025-02-14T23:00:02.683714494Z slot update_slots: id 0 | task 136 | prompt token 15: 264 ' a'
2025-02-14T23:00:02.683717190Z slot update_slots: id 0 | task 136 | need to evaluate at least 1 token to generate logits, n_past = 33, n_prompt_tokens = 33
2025-02-14T23:00:02.683719804Z slot update_slots: id 0 | task 136 | kv cache rm [32, end)
2025-02-14T23:00:02.683730857Z slot update_slots: id 0 | task 136 | prompt processing progress, n_past = 33, n_tokens = 1, progress = 0.030303
2025-02-14T23:00:02.683735591Z slot update_slots: id 0 | task 136 | prompt done, n_past = 33, n_tokens = 1
2025-02-14T23:00:02.683740047Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.696865908Z Grammar still awaiting trigger after token 53 (`V`) (buffer: `V`)
2025-02-14T23:00:02.696907650Z slot process_toke: id 0 | task 136 | n_decoded = 1, n_remaining = -1, next token: 53 'V'
2025-02-14T23:00:02.696915265Z srv update_slots: run slots completed
2025-02-14T23:00:02.696918497Z que start_loop: waiting for new tasks
2025-02-14T23:00:02.696921069Z que start_loop: processing new tasks
2025-02-14T23:00:02.696923745Z que start_loop: processing task, id = 137
2025-02-14T23:00:02.696926267Z que start_loop: update slots
2025-02-14T23:00:02.696928736Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.696931628Z que post: new task, id = 138, front = 0
2025-02-14T23:00:02.696934170Z slot update_slots: id 0 | task 136 | slot decode token, n_ctx = 4096, n_past = 34, n_cache_tokens = 34, truncated = 0
2025-02-14T23:00:02.696936702Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.707383282Z Grammar still awaiting trigger after token 20471 (`ancouver`) (buffer: `Vancouver`)
2025-02-14T23:00:02.707428553Z slot process_toke: id 0 | task 136 | n_decoded = 2, n_remaining = -1, next token: 20471 'ancouver'
2025-02-14T23:00:02.707436488Z srv update_slots: run slots completed
2025-02-14T23:00:02.707439894Z que start_loop: waiting for new tasks
2025-02-14T23:00:02.707442807Z que start_loop: processing new tasks
2025-02-14T23:00:02.707445215Z que start_loop: processing task, id = 138
2025-02-14T23:00:02.707447726Z que start_loop: update slots
2025-02-14T23:00:02.707450124Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.707452532Z que post: new task, id = 139, front = 0
2025-02-14T23:00:02.707454971Z slot update_slots: id 0 | task 136 | slot decode token, n_ctx = 4096, n_past = 35, n_cache_tokens = 35, truncated = 0
2025-02-14T23:00:02.707457441Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.718343464Z Grammar still awaiting trigger after token 374 (` is`) (buffer: `Vancouver is`)
2025-02-14T23:00:02.718391792Z slot process_toke: id 0 | task 136 | n_decoded = 3, n_remaining = -1, next token: 374 ' is'
2025-02-14T23:00:02.718399500Z srv update_slots: run slots completed
2025-02-14T23:00:02.718402762Z que start_loop: waiting for new tasks
2025-02-14T23:00:02.718405284Z que start_loop: processing new tasks
2025-02-14T23:00:02.718407795Z que start_loop: processing task, id = 139
2025-02-14T23:00:02.718410265Z que start_loop: update slots
2025-02-14T23:00:02.718426895Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.718429509Z que post: new task, id = 140, front = 0
2025-02-14T23:00:02.718432185Z slot update_slots: id 0 | task 136 | slot decode token, n_ctx = 4096, n_past = 36, n_cache_tokens = 36, truncated = 0
2025-02-14T23:00:02.718434727Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.729332159Z Grammar still awaiting trigger after token 264 (` a`) (buffer: `Vancouver is a`)
2025-02-14T23:00:02.729363352Z slot process_toke: id 0 | task 136 | n_decoded = 4, n_remaining = -1, next token: 264 ' a'
2025-02-14T23:00:02.729366748Z srv update_slots: run slots completed
2025-02-14T23:00:02.729369660Z que start_loop: waiting for new tasks
2025-02-14T23:00:02.729372161Z que start_loop: processing new tasks
2025-02-14T23:00:02.729374611Z que start_loop: processing task, id = 140
2025-02-14T23:00:02.729377080Z que start_loop: update slots
2025-02-14T23:00:02.729379499Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.729381897Z que post: new task, id = 141, front = 0
2025-02-14T23:00:02.729384315Z slot update_slots: id 0 | task 136 | slot decode token, n_ctx = 4096, n_past = 37, n_cache_tokens = 37, truncated = 0
2025-02-14T23:00:02.729386826Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.740639890Z Grammar still awaiting trigger after token 3598 (` major`) (buffer: `Vancouver is a major`)
2025-02-14T23:00:02.740691192Z slot process_toke: id 0 | task 136 | n_decoded = 5, n_remaining = -1, next token: 3598 ' major'
2025-02-14T23:00:02.740700588Z srv update_slots: run slots completed
2025-02-14T23:00:02.740704828Z que start_loop: waiting for new tasks
2025-02-14T23:00:02.740708060Z que start_loop: processing new tasks
2025-02-14T23:00:02.740711198Z que start_loop: processing task, id = 141
2025-02-14T23:00:02.740714173Z que start_loop: update slots
2025-02-14T23:00:02.740717384Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.740720347Z que post: new task, id = 142, front = 0
2025-02-14T23:00:02.740723404Z slot update_slots: id 0 | task 136 | slot decode token, n_ctx = 4096, n_past = 38, n_cache_tokens = 38, truncated = 0
2025-02-14T23:00:02.740729424Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.753889945Z Grammar still awaiting trigger after token 3283 (` city`) (buffer: `Vancouver is a major city`)
2025-02-14T23:00:02.753926345Z slot process_toke: id 0 | task 136 | n_decoded = 6, n_remaining = -1, next token: 3283 ' city'
2025-02-14T23:00:02.753932674Z srv update_slots: run slots completed
2025-02-14T23:00:02.753936564Z que start_loop: waiting for new tasks
2025-02-14T23:00:02.753939930Z que start_loop: processing new tasks
2025-02-14T23:00:02.753964804Z que start_loop: processing task, id = 142
2025-02-14T23:00:02.753970608Z que start_loop: update slots
2025-02-14T23:00:02.753973881Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.753977143Z que post: new task, id = 143, front = 0
2025-02-14T23:00:02.753980426Z slot update_slots: id 0 | task 136 | slot decode token, n_ctx = 4096, n_past = 39, n_cache_tokens = 39, truncated = 0
2025-02-14T23:00:02.753984265Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.765268589Z Grammar still awaiting trigger after token 7407 (` located`) (buffer: `Vancouver is a major city located`)
2025-02-14T23:00:02.765293998Z slot process_toke: id 0 | task 136 | n_decoded = 7, n_remaining = -1, next token: 7407 ' located'
2025-02-14T23:00:02.765297343Z srv update_slots: run slots completed
2025-02-14T23:00:02.765299915Z que start_loop: waiting for new tasks
2025-02-14T23:00:02.765302375Z que start_loop: processing new tasks
2025-02-14T23:00:02.765304896Z que start_loop: processing task, id = 143
2025-02-14T23:00:02.765307366Z que start_loop: update slots
2025-02-14T23:00:02.765311956Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.765314570Z que post: new task, id = 144, front = 0
2025-02-14T23:00:02.765317050Z slot update_slots: id 0 | task 136 | slot decode token, n_ctx = 4096, n_past = 40, n_cache_tokens = 40, truncated = 0
2025-02-14T23:00:02.765319788Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.777740005Z Grammar still awaiting trigger after token 304 (` in`) (buffer: `Vancouver is a major city located in`)
2025-02-14T23:00:02.777783352Z slot process_toke: id 0 | task 136 | n_decoded = 8, n_remaining = -1, next token: 304 ' in'
2025-02-14T23:00:02.777791965Z srv update_slots: run slots completed
2025-02-14T23:00:02.777795259Z que start_loop: waiting for new tasks
2025-02-14T23:00:02.777797832Z que start_loop: processing new tasks
2025-02-14T23:00:02.777800219Z que start_loop: processing task, id = 144
2025-02-14T23:00:02.777802751Z que start_loop: update slots
2025-02-14T23:00:02.777805138Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.777821327Z que post: new task, id = 145, front = 0
2025-02-14T23:00:02.777826966Z slot update_slots: id 0 | task 136 | slot decode token, n_ctx = 4096, n_past = 41, n_cache_tokens = 41, truncated = 0
2025-02-14T23:00:02.777829570Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.790010862Z Grammar still awaiting trigger after token 279 (` the`) (buffer: `Vancouver is a major city located in the`)
2025-02-14T23:00:02.790050452Z slot process_toke: id 0 | task 136 | n_decoded = 9, n_remaining = -1, next token: 279 ' the'
2025-02-14T23:00:02.790058860Z srv update_slots: run slots completed
2025-02-14T23:00:02.790072033Z que start_loop: waiting for new tasks
2025-02-14T23:00:02.790074740Z que start_loop: processing new tasks
2025-02-14T23:00:02.790077138Z que start_loop: processing task, id = 145
2025-02-14T23:00:02.790079587Z que start_loop: update slots
2025-02-14T23:00:02.790082057Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.790084475Z que post: new task, id = 146, front = 0
2025-02-14T23:00:02.790086904Z slot update_slots: id 0 | task 136 | slot decode token, n_ctx = 4096, n_past = 42, n_cache_tokens = 42, truncated = 0
2025-02-14T23:00:02.790089384Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.800634097Z Grammar still awaiting trigger after token 16847 (` province`) (buffer: `Vancouver is a major city located in the province`)
2025-02-14T23:00:02.800673430Z slot process_toke: id 0 | task 136 | n_decoded = 10, n_remaining = -1, next token: 16847 ' province'
2025-02-14T23:00:02.800684051Z srv update_slots: run slots completed
2025-02-14T23:00:02.800687622Z que start_loop: waiting for new tasks
2025-02-14T23:00:02.800690287Z que start_loop: processing new tasks
2025-02-14T23:00:02.800692788Z que start_loop: processing task, id = 146
2025-02-14T23:00:02.800695402Z que start_loop: update slots
2025-02-14T23:00:02.800697954Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.800700414Z que post: new task, id = 147, front = 0
2025-02-14T23:00:02.800702915Z slot update_slots: id 0 | task 136 | slot decode token, n_ctx = 4096, n_past = 43, n_cache_tokens = 43, truncated = 0
2025-02-14T23:00:02.800705467Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.811484797Z Grammar still awaiting trigger after token 315 (` of`) (buffer: `Vancouver is a major city located in the province of`)
2025-02-14T23:00:02.811530254Z slot process_toke: id 0 | task 136 | n_decoded = 11, n_remaining = -1, next token: 315 ' of'
2025-02-14T23:00:02.811539506Z srv update_slots: run slots completed
2025-02-14T23:00:02.811543520Z que start_loop: waiting for new tasks
2025-02-14T23:00:02.811546751Z que start_loop: processing new tasks
2025-02-14T23:00:02.811550199Z que start_loop: processing task, id = 147
2025-02-14T23:00:02.811553214Z que start_loop: update slots
2025-02-14T23:00:02.811556507Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.811559502Z que post: new task, id = 148, front = 0
2025-02-14T23:00:02.811562558Z slot update_slots: id 0 | task 136 | slot decode token, n_ctx = 4096, n_past = 44, n_cache_tokens = 44, truncated = 0
2025-02-14T23:00:02.811565636Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.821846980Z Grammar still awaiting trigger after token 7855 (` British`) (buffer: `Vancouver is a major city located in the province of British`)
2025-02-14T23:00:02.821900011Z slot process_toke: id 0 | task 136 | n_decoded = 12, n_remaining = -1, next token: 7855 ' British'
2025-02-14T23:00:02.821908646Z srv update_slots: run slots completed
2025-02-14T23:00:02.821911939Z que start_loop: waiting for new tasks
2025-02-14T23:00:02.821914563Z que start_loop: processing new tasks
2025-02-14T23:00:02.821917023Z que start_loop: processing task, id = 148
2025-02-14T23:00:02.821919472Z que start_loop: update slots
2025-02-14T23:00:02.821921921Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.821924329Z que post: new task, id = 149, front = 0
2025-02-14T23:00:02.821926799Z slot update_slots: id 0 | task 136 | slot decode token, n_ctx = 4096, n_past = 45, n_cache_tokens = 45, truncated = 0
2025-02-14T23:00:02.821929547Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.832880250Z Grammar still awaiting trigger after token 18796 (` Columbia`) (buffer: `Vancouver is a major city located in the province of British Columbia`)
2025-02-14T23:00:02.832923000Z slot process_toke: id 0 | task 136 | n_decoded = 13, n_remaining = -1, next token: 18796 ' Columbia'
2025-02-14T23:00:02.832930966Z srv update_slots: run slots completed
2025-02-14T23:00:02.832934269Z que start_loop: waiting for new tasks
2025-02-14T23:00:02.832938097Z que start_loop: processing new tasks
2025-02-14T23:00:02.832942451Z que start_loop: processing task, id = 149
2025-02-14T23:00:02.832946382Z que start_loop: update slots
2025-02-14T23:00:02.832950468Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.832954131Z que post: new task, id = 150, front = 0
2025-02-14T23:00:02.832957990Z slot update_slots: id 0 | task 136 | slot decode token, n_ctx = 4096, n_past = 46, n_cache_tokens = 46, truncated = 0
2025-02-14T23:00:02.832963043Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.843609184Z Grammar still awaiting trigger after token 11 (`,`) (buffer: `Vancouver is a major city located in the province of British Columbia,`)
2025-02-14T23:00:02.843651265Z slot process_toke: id 0 | task 136 | n_decoded = 14, n_remaining = -1, next token: 11 ','
2025-02-14T23:00:02.843658932Z srv update_slots: run slots completed
2025-02-14T23:00:02.843662513Z que start_loop: waiting for new tasks
2025-02-14T23:00:02.843665086Z que start_loop: processing new tasks
2025-02-14T23:00:02.843667422Z que start_loop: processing task, id = 150
2025-02-14T23:00:02.843681367Z que start_loop: update slots
2025-02-14T23:00:02.843687233Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.843689713Z que post: new task, id = 151, front = 0
2025-02-14T23:00:02.843692317Z slot update_slots: id 0 | task 136 | slot decode token, n_ctx = 4096, n_past = 47, n_cache_tokens = 47, truncated = 0
2025-02-14T23:00:02.843704388Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.854534949Z Grammar still awaiting trigger after token 389 (` on`) (buffer: `Vancouver is a major city located in the province of British Columbia, on`)
2025-02-14T23:00:02.854574231Z slot process_toke: id 0 | task 136 | n_decoded = 15, n_remaining = -1, next token: 389 ' on'
2025-02-14T23:00:02.854582773Z srv update_slots: run slots completed
2025-02-14T23:00:02.854586282Z que start_loop: waiting for new tasks
2025-02-14T23:00:02.854588886Z que start_loop: processing new tasks
2025-02-14T23:00:02.854591417Z que start_loop: processing task, id = 151
2025-02-14T23:00:02.854593928Z que start_loop: update slots
2025-02-14T23:00:02.854596563Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.854599084Z que post: new task, id = 152, front = 0
2025-02-14T23:00:02.854601626Z slot update_slots: id 0 | task 136 | slot decode token, n_ctx = 4096, n_past = 48, n_cache_tokens = 48, truncated = 0
2025-02-14T23:00:02.854604178Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.865522453Z Grammar still awaiting trigger after token 279 (` the`) (buffer: `Vancouver is a major city located in the province of British Columbia, on the`)
2025-02-14T23:00:02.865561519Z slot process_toke: id 0 | task 136 | n_decoded = 16, n_remaining = -1, next token: 279 ' the'
2025-02-14T23:00:02.865569000Z srv update_slots: run slots completed
2025-02-14T23:00:02.865572376Z que start_loop: waiting for new tasks
2025-02-14T23:00:02.865574908Z que start_loop: processing new tasks
2025-02-14T23:00:02.865577305Z que start_loop: processing task, id = 152
2025-02-14T23:00:02.865580166Z que start_loop: update slots
2025-02-14T23:00:02.865582780Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.865585178Z que post: new task, id = 153, front = 0
2025-02-14T23:00:02.865587607Z slot update_slots: id 0 | task 136 | slot decode token, n_ctx = 4096, n_past = 49, n_cache_tokens = 49, truncated = 0
2025-02-14T23:00:02.865590077Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.876130230Z Grammar still awaiting trigger after token 9710 (` west`) (buffer: `Vancouver is a major city located in the province of British Columbia, on the west`)
2025-02-14T23:00:02.876167752Z slot process_toke: id 0 | task 136 | n_decoded = 17, n_remaining = -1, next token: 9710 ' west'
2025-02-14T23:00:02.876175501Z srv update_slots: run slots completed
2025-02-14T23:00:02.876178877Z que start_loop: waiting for new tasks
2025-02-14T23:00:02.876181552Z que start_loop: processing new tasks
2025-02-14T23:00:02.876184125Z que start_loop: processing task, id = 153
2025-02-14T23:00:02.876186842Z que start_loop: update slots
2025-02-14T23:00:02.876201476Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.876204255Z que post: new task, id = 154, front = 0
2025-02-14T23:00:02.876208968Z slot update_slots: id 0 | task 136 | slot decode token, n_ctx = 4096, n_past = 50, n_cache_tokens = 50, truncated = 0
2025-02-14T23:00:02.876211654Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.886711308Z Grammar still awaiting trigger after token 13648 (` coast`) (buffer: `Vancouver is a major city located in the province of British Columbia, on the west coast`)
2025-02-14T23:00:02.886750467Z slot process_toke: id 0 | task 136 | n_decoded = 18, n_remaining = -1, next token: 13648 ' coast'
2025-02-14T23:00:02.886758216Z srv update_slots: run slots completed
2025-02-14T23:00:02.886761756Z que start_loop: waiting for new tasks
2025-02-14T23:00:02.886764391Z que start_loop: processing new tasks
2025-02-14T23:00:02.886766820Z que start_loop: processing task, id = 154
2025-02-14T23:00:02.886769197Z que start_loop: update slots
2025-02-14T23:00:02.886771636Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.886773972Z que post: new task, id = 155, front = 0
2025-02-14T23:00:02.886776812Z slot update_slots: id 0 | task 136 | slot decode token, n_ctx = 4096, n_past = 51, n_cache_tokens = 51, truncated = 0
2025-02-14T23:00:02.886779313Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.897015634Z Grammar still awaiting trigger after token 315 (` of`) (buffer: `Vancouver is a major city located in the province of British Columbia, on the west coast of`)
2025-02-14T23:00:02.897051047Z slot process_toke: id 0 | task 136 | n_decoded = 19, n_remaining = -1, next token: 315 ' of'
2025-02-14T23:00:02.897058189Z srv update_slots: run slots completed
2025-02-14T23:00:02.897061534Z que start_loop: waiting for new tasks
2025-02-14T23:00:02.897063952Z que start_loop: processing new tasks
2025-02-14T23:00:02.897066206Z que start_loop: processing task, id = 155
2025-02-14T23:00:02.897068511Z que start_loop: update slots
2025-02-14T23:00:02.897071660Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.897073976Z que post: new task, id = 156, front = 0
2025-02-14T23:00:02.897076322Z slot update_slots: id 0 | task 136 | slot decode token, n_ctx = 4096, n_past = 52, n_cache_tokens = 52, truncated = 0
2025-02-14T23:00:02.897078720Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.907501791Z Grammar still awaiting trigger after token 6864 (` Canada`) (buffer: `Vancouver is a major city located in the province of British Columbia, on the west coast of Canada`)
2025-02-14T23:00:02.907573552Z slot process_toke: id 0 | task 136 | n_decoded = 20, n_remaining = -1, next token: 6864 ' Canada'
2025-02-14T23:00:02.907582495Z srv update_slots: run slots completed
2025-02-14T23:00:02.907601194Z que start_loop: waiting for new tasks
2025-02-14T23:00:02.907605084Z que start_loop: processing new tasks
2025-02-14T23:00:02.907608480Z que start_loop: processing task, id = 156
2025-02-14T23:00:02.907611465Z que start_loop: update slots
2025-02-14T23:00:02.907614357Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.907617362Z que post: new task, id = 157, front = 0
2025-02-14T23:00:02.907620326Z slot update_slots: id 0 | task 136 | slot decode token, n_ctx = 4096, n_past = 53, n_cache_tokens = 53, truncated = 0
2025-02-14T23:00:02.907623331Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.919535304Z Grammar still awaiting trigger after token 13 (`.`) (buffer: `Vancouver is a major city located in the province of British Columbia, on the west coast of Canada.`)
2025-02-14T23:00:02.919576665Z slot process_toke: id 0 | task 136 | n_decoded = 21, n_remaining = -1, next token: 13 '.'
2025-02-14T23:00:02.919584034Z srv update_slots: run slots completed
2025-02-14T23:00:02.919587193Z que start_loop: waiting for new tasks
2025-02-14T23:00:02.919589550Z que start_loop: processing new tasks
2025-02-14T23:00:02.919591937Z que start_loop: processing task, id = 157
2025-02-14T23:00:02.919594335Z que start_loop: update slots
2025-02-14T23:00:02.919596795Z srv update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.919599028Z que post: new task, id = 158, front = 0
2025-02-14T23:00:02.919601302Z slot update_slots: id 0 | task 136 | slot decode token, n_ctx = 4096, n_past = 54, n_cache_tokens = 54, truncated = 0
2025-02-14T23:00:02.919603659Z srv update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.931837424Z Grammar still awaiting trigger after token 151645 (`<|im_end|>`) (buffer: `Vancouver is a major city located in the province of British Columbia, on the west coast of Canada.<|im_end|>`)
2025-02-14T23:00:02.931880092Z slot process_toke: id 0 | task 136 | stopped by EOS
2025-02-14T23:00:02.931888181Z slot process_toke: id 0 | task 136 | n_decoded = 22, n_remaining = -1, next token: 151645 ''
2025-02-14T23:00:02.931891649Z slot release: id 0 | task 136 | stop processing: n_past = 54, truncated = 0
2025-02-14T23:00:02.931894325Z slot print_timing: id 0 | task 136 |
2025-02-14T23:00:02.931896733Z prompt eval time = 13.15 ms / 1 tokens ( 13.15 ms per token, 76.06 tokens per second)
2025-02-14T23:00:02.931899254Z eval time = 234.89 ms / 22 tokens ( 10.68 ms per token, 93.66 tokens per second)
2025-02-14T23:00:02.931901714Z total time = 248.04 ms / 23 tokens
2025-02-14T23:00:02.931904174Z srv send: sending result for task id = 136
2025-02-14T23:00:02.931906561Z srv send: task id = 136 pushed to result queue
2025-02-14T23:00:02.931918468Z srv update_slots: run slots completed
2025-02-14T23:00:02.931921566Z que start_loop: waiting for new tasks
2025-02-14T23:00:02.931924180Z que start_loop: processing new tasks
2025-02-14T23:00:02.931926609Z que start_loop: processing task, id = 158
2025-02-14T23:00:02.931929038Z que start_loop: update slots
2025-02-14T23:00:02.931931425Z srv update_slots: all slots are idle
2025-02-14T23:00:02.931933772Z que start_loop: waiting for new tasks
2025-02-14T23:00:02.931960014Z srv remove_waiti: remove task 136 from waiting list. current waiting = 1 (before remove)
2025-02-14T23:00:02.932242069Z request: POST /v1/chat/completions 172.17.0.1 200
2025-02-14T23:00:02.932315961Z }
2025-02-14T23:00:02.932320973Z response: {"choices":[{"finish_reason":"stop","index":0,"message":{"content":"Vancouver is a major city located in the province of British Columbia, on the west coast of Canada.","tool_calls":null,"role":"assistant"}}],"created":1739574002,"model":"gpt-3.5-turbo","system_fingerprint":"b4603-4a2b196d","object":"chat.completion","usage":{"completion_tokens":22,"prompt_tokens":33,"total_tokens":55},"id":"chatcmpl-c2GN65ce1SggQmg8P20BxtaWGPQqwVDu","__verbose":{"index":0,"content":"Vancouver is a major city located in the province of British Columbia, on the west coast of Canada.","tokens":[],"id_slot":0,"stop":true,"model":"gpt-3.5-turbo","tokens_predicted":22,"tokens_evaluated":33,"generation_settings":{"n_predict":-1,"seed":4294967295,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"dry_multiplier":0.0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":4096,"dry_sequence_breakers":["\n",":","\"","*"],"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"stop":[],"max_tokens":-1,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"root ::= \"<tool_call>\" space tool-call \"</tool_call>\" space\nspace ::= | \" \" | \"\\n\" [ \\t]{0,20}\ntool-call ::= \n","grammar_trigger_tokens":[151657,151658],"samplers":["penalties","dry","top_k","typ_p","top_p","min_p","xtc","temperature"],"speculative.n_max":16,"speculative.n_min":5,"speculative.p_min":0.8999999761581421,"timings_per_token":false,"post_sampling_probs":false,"lora":[]},"prompt":"<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. 
You are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhere is Vancouver?<|im_end|>\n<|im_start|>assistant\n","has_new_line":false,"truncated":false,"stop_type":"eos","stopping_word":"","tokens_cached":54,"timings":{"prompt_n":1,"prompt_ms":13.148,"prompt_per_token_ms":13.148,"prompt_per_second":76.05719501064802,"predicted_n":22,"predicted_ms":234.893,"predicted_per_token_ms":10.676954545454546,"predicted_per_second":93.65966631615245}},"timings":{"prompt_n":1,"prompt_ms":13.148,"prompt_per_token_ms":13.148,"prompt_per_second":76.05719501064802,"predicted_n":22,"predicted_ms":234.893,"predicted_per_token_ms":10.676954545454546,"predicted_per_second":93.65966631615245}}
I think the problem might be that, with a jinja template but no tools and no grammar in the request, the grammar is not set correctly?
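One way to check that hypothesis against a verbose response like the one logged above is to pull out `__verbose.generation_settings.grammar` and list rules whose right-hand side is empty. In the grammar logged here, `tool-call ::=` has no alternatives, which fits a tool-call grammar being built even though no tools were passed. This is a hypothetical diagnostic with made-up helper names. The thread doesn't establish whether that empty rule is what later produces `expecting name at`. The rule splitting is also deliberately naive: it only skips whole-line comments and does not parse string literals.

```python
import json
import re
from typing import Dict, List

# A GBNF rule head looks like "name ::= body".
RULE_HEAD = re.compile(r"^([A-Za-z0-9-]+)\s*::=(.*)$")


def empty_rules(gbnf: str) -> List[str]:
    """Return the names of GBNF rules whose right-hand side is empty."""
    bodies: Dict[str, str] = {}
    order: List[str] = []
    current = None
    for line in gbnf.splitlines():
        if line.lstrip().startswith("#"):
            continue  # whole-line comment; inline comments are not handled
        m = RULE_HEAD.match(line)
        if m:
            current = m.group(1)
            order.append(current)
            bodies[current] = m.group(2)
        elif current is not None:
            bodies[current] += " " + line  # continuation line of the previous rule
    return [name for name in order if not bodies[name].strip()]


def grammar_from_response(response_text: str) -> str:
    # Field path as it appears in the verbose /v1/chat/completions response above.
    return json.loads(response_text)["__verbose"]["generation_settings"]["grammar"]
```

Fed the `response: {...}` JSON from the log, `empty_rules(grammar_from_response(text))` should single out `tool-call`.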
I'm running the llama.cpp server in Docker. How should I run the command you mentioned inside the container to fetch the template? ./scripts/get_chat_template.py google/gemma-2-2b-it > gemma2.jinja
@henryclw Thanks for the extra repro details! I was able to reproduce this ~~when building with -DLLAMA_LLGUIDANCE=1~~.
Looks like I left a typo in chat.cpp / common_chat_params_init_without_tools, will send a fix (edit: https://github.com/ggerganov/llama.cpp/pull/11880 )
Thank you for the quick reply. I just compiled your fix branch locally and it solved the problem.
@ochafik This script doesn't work for me:
python scripts\get_chat_template.py google/gemma-2-2b-it > gemma2.jinja
Traceback (most recent call last):
  File "D:\repos-git\llama.cpp\scripts\get_chat_template.py", line 76, in <module>
    main(sys.argv[1:])
  File "D:\repos-git\llama.cpp\scripts\get_chat_template.py", line 71, in main
    template = get_chat_template(model_id, variant)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\repos-git\llama.cpp\scripts\get_chat_template.py", line 25, in get_chat_template
    config_str = f.read()
                 ^^^^^^^^
  File "D:\anaconda3\Lib\encodings\cp1250.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 24973: character maps to <undefined>
I just directly copied the content of the chat_template field from the tokenizer_config.json files I linked in the first post; attaching them here (as .txt, since GitHub blocks .jinja).
@henryclw That content is JSON-escaped, so it is not valid Jinja. To use it, you can paste the chat_template string into a JavaScript console, wrap it in a console.log call and copy the result into your jinja file, or try to fix the get_chat_template.py script (it looks like an explicit encoding might be required on Windows). I wasn't able to test on Windows today; could you confirm whether the following edit works for you?
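from huggingface_hub import hf_hub_download
# Without an explicit encoding, open() uses a platform-dependent default (cp1250 in the traceback above).
with open(hf_hub_download(repo_id=model_id, filename="tokenizer_config.json"), "r", encoding="utf-8") as f:
    config_str = f.read()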
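Why the explicit encoding matters: a bare `open(path, "r")` decodes with a platform-dependent default, which was cp1250 in the traceback above, and 0x81 is one of the bytes cp1250 leaves undefined. The sketch below reproduces that failure with a temporary file. U+2581 is only an example of a character whose UTF-8 encoding (E2 96 81) contains 0x81; it is not necessarily the character at position 24973 of the real tokenizer_config.json.

```python
import os
import tempfile
from typing import Tuple


def read_text(path: str, encoding: str) -> str:
    # Same call shape as the suggested fix, with the encoding passed explicitly.
    with open(path, "r", encoding=encoding) as f:
        return f.read()


def cp1250_vs_utf8() -> Tuple[bool, str]:
    # Example payload: U+2581 is E2 96 81 in UTF-8, and cp1250 has no mapping for 0x81.
    data = '{"chat_template": "\u2581"}'.encode("utf-8")
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        try:
            read_text(path, "cp1250")  # what a cp1250 locale default would do
            cp1250_ok = True
        except UnicodeDecodeError:
            cp1250_ok = False
        return cp1250_ok, read_text(path, "utf-8")
    finally:
        os.remove(path)
```

`cp1250_vs_utf8()` returns `(False, '{"chat_template": "▁"}')`: the cp1250 read fails the same way as in the traceback, while the UTF-8 read succeeds regardless of the machine's locale.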
@ochafik It was me who attached those files. And... you're absolutely right: JSON escaping was causing all the trouble here. I've made a simpler, working version of the script for acquiring chat templates (as an alternative to the broken scripts/get_chat_template.py; maybe it should be added to the repo scripts?), and with proper JSON decoding the official templates now seem to work (attaching corrected versions of them).
get_hf_template.py.txt gemma-2-2b-it.jinja.txt Llama-3.2-3B-Instruct.jinja.txt Qwen2.5-1.5B-Instruct.jinja.txt
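The JavaScript console trick and the "proper JSON decoding" above both come down to one step: parse tokenizer_config.json as JSON and write out the decoded chat_template string. Below is a minimal stdlib sketch of that step. It is not the attached get_hf_template.py, whose contents aren't reproduced here, and the function names are hypothetical. It assumes the config text was already read with `encoding="utf-8"` (downloading it, e.g. with `huggingface_hub.hf_hub_download`, is left out). It also assumes that repos shipping several templates store `chat_template` as a list of `{"name", "template"}` objects, picking `"default"` unless a variant is given.

```python
import json
from typing import Optional


def extract_chat_template(config_text: str, variant: Optional[str] = None) -> str:
    """Return the decoded chat template from the text of a tokenizer_config.json."""
    config = json.loads(config_text)  # undoes the \n, \" and \\ escaping
    template = config["chat_template"]
    if isinstance(template, list):
        # Several named templates: [{"name": ..., "template": ...}, ...]
        by_name = {t["name"]: t["template"] for t in template}
        return by_name[variant or "default"]
    return template


def write_template(path: str, template: str) -> None:
    # newline="\n" stops Python from translating "\n" to "\r\n" on Windows.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(template)
```

For example, read the downloaded config with `open(..., encoding="utf-8")`, pass its text to `extract_chat_template`, and hand the result to `write_template("qwen2.5.jinja", ...)`. The resulting file contains real newlines and quotes instead of `\n` and `\"`.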
Could you add some kind of error when the template is not valid Jinja? That would make it easier to avoid this kind of mistake in the future.
@MoonRide303 It should already print quite a lengthy error message (if you scroll right, you'll see the ^ pointing at the first offending character, a \ escape). Doesn't it show for you?
common_chat_templates_from_model: failed to parse chat template: Expected value expression at row 1, column 269:
{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n {%- endif %}\n {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n {%- else %}\n {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\\n' + message.content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\\n<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif 
%}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n
^
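llama-server already rejects such a file with the parse error above. As a hypothetical pre-flight check before passing a file to --chat-template-file, the heuristic below flags text that still looks like a JSON string body. A decoded template normally contains raw newlines or bare double quotes, so wrapping it in quotes and feeding it to `json.loads` fails. An escaped copy, like the one at row 1 above, decodes cleanly into something different. The function name is made up for illustration. A one-line template with no double quotes but with `\n` escapes inside its string literals would be a false positive.

```python
import json


def looks_json_escaped(text: str) -> bool:
    """Heuristic: True if `text` looks like a chat_template copied out of JSON undecoded."""
    body = text.strip()
    # Tolerate a copy that kept the surrounding JSON quotes.
    if len(body) >= 2 and body[0] == body[-1] == '"':
        body = body[1:-1]
    if "\\" not in body:
        return False  # no escape sequences, so decoding would change nothing
    try:
        # json.loads is strict by default: raw newlines or bare '"' make it fail,
        # and a properly decoded template usually contains one or the other.
        decoded = json.loads('"' + body + '"')
    except json.JSONDecodeError:
        return False
    return decoded != body
```

The design choice is to let the JSON decoder answer the question ("is this still a valid JSON string body?") rather than pattern-matching on `\n`, because correctly decoded templates such as Qwen 2.5's legitimately contain `\n` inside their Jinja string literals.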
@ochafik When I try to launch it with that earlier (broken) llama3.2.jinja, it just silently quits after printing device info:
PS E:\ML-models\Llama-3.2-3B-Instruct-GGUF> E:\llama.cpp-b4734\llama-server.exe -v -ngl 99 -m Llama-3.2-3B-Instruct-Q8_0.gguf --jinja --chat-template-file llama3.2.jinja -c 8192
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4080, compute capability 8.9, VMM: yes
PS E:\ML-models\Llama-3.2-3B-Instruct-GGUF>
I get the same output from both my local build and the official binaries (llama-b4734-bin-win-cuda-cu12.4-x64.zip).
(hopefully all fixed by https://github.com/ggml-org/llama.cpp/pull/11907, please lemme know if not)