Charro Gruver
Here are some logs that I captured earlier.

Working example - No GPU:

```
request: {"messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"hello"}],"stream":true,"cache_prompt":true,"samplers":"edkypmxt","temperature":0.8,"dynatemp_range":0,"dynatemp_exponent":1,"top_k":40,"top_p":0.95,"min_p":0.05,"typical_p":1,"xtc_probability":0,"xtc_threshold":0.1,"repeat_last_n":64,"repeat_penalty":1,"presence_penalty":0,"frequency_penalty":0,"dry_multiplier":0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":-1,"max_tokens":-1,"timings_per_token":false}
srv params_from_: Grammar:
srv params_from_: Grammar lazy: false
srv params_from_: ...
```
In the working example, you can see the conversation logged correctly. In the broken example, the prompt "hello" is not logged.
Working -

```
slot process_toke: id 0 | task 0 | n_decoded = 1, n_remaining = -1, next token: 8279 'Hello'
srv update_slots: run slots completed
que start_loop: waiting for...
```
Broken -

```
slot process_toke: id 0 | task 0 | n_decoded = 1, n_remaining = -1, next token: 203 ' '
srv update_slots: run slots completed
que start_loop: waiting...
```
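For anyone trying to reproduce this, here is a minimal sketch that replays the same request against a local llama-server instance. The port, the endpoint path, and the trimmed-down parameter set are assumptions on my part; adjust them to match your setup.

```python
# Minimal repro sketch: replays the captured request against a local
# llama-server. Port 8080 and the /v1/chat/completions path are
# assumptions -- adjust to your own launch settings.
import requests

payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "hello"},
    ],
    "stream": True,
    "cache_prompt": True,
    # Sampling parameters taken from the captured request above;
    # the rest are left at server defaults for brevity.
    "temperature": 0.8,
    "top_k": 40,
    "top_p": 0.95,
    "min_p": 0.05,
    "max_tokens": -1,
}

with requests.post(
    "http://localhost:8080/v1/chat/completions",
    json=payload,
    stream=True,
    timeout=60,
) as resp:
    resp.raise_for_status()
    # Print the raw streamed lines so the tokens can be compared
    # against what the server logs (e.g. 'Hello' vs ' ' above).
    for line in resp.iter_lines():
        if line:
            print(line.decode("utf-8"))
```

Running the server with verbose logging while sending this should show whether the prompt makes it into the log, as in the captures above.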
Yes, from Ollama: https://ollama.com/library/granite3.1-moe (granite3.1-moe:3b)
FWIW, the granite3.2 models seem to work fine, so I'm going to close this. We can reopen it if someone else runs into the same issue.