woheller69
I have tried several models and do not get garbage. I am on llama-cpp-python 0.2.74, updated yesterday.
Trying to save messages using `self.llama_cpp_agent.chat_history.message_store.save_to_json("msg.txt")` gives `TypeError: Object of type Roles is not JSON serializable`.
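For what it's worth, this looks like the standard behaviour of Python's `json` module when it hits an `Enum`. A minimal sketch of a workaround, assuming `Roles` is an `Enum` (I have not checked the actual class in the library), would be a custom encoder:

```python
import json
from enum import Enum

class Roles(Enum):
    # stand-in for the library's Roles enum (assumption)
    user = "user"
    assistant = "assistant"

class EnumEncoder(json.JSONEncoder):
    """Serialize Enum members via their value so json.dump does not choke."""
    def default(self, obj):
        if isinstance(obj, Enum):
            return obj.value
        return super().default(obj)

messages = [{"role": Roles.assistant, "content": "Hello"}]
print(json.dumps(messages, cls=EnumEncoder))  # role becomes the plain string "assistant"
```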
Saving messages now works, but when using it I find that adding a message does not work anymore. When interrupting inference manually, see #47, I am adding the partial message...
I found I can add it with `self.llama_cpp_agent.chat_history.get_message_store().add_assistant_message(self.model_reply)`. But will it be used in the follow-up conversation then?
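To make the interrupt case concrete, here is roughly what I do (only a sketch; `token_stream` and `stop_requested` are placeholders for whatever streaming iterator and cancel flag the app uses, only the `add_assistant_message` call is taken from above):

```python
def stream_with_interrupt(agent, token_stream, stop_requested):
    """Accumulate streamed chunks; on interrupt, store the partial reply.

    `agent` is assumed to be a LlamaCppAgent instance; `token_stream` and
    `stop_requested` stand in for the real streaming iterator / cancel flag.
    """
    reply = ""
    for chunk in token_stream:
        if stop_requested():
            break
        reply += chunk
    # The call quoted above: add the (possibly partial) answer to the store.
    agent.chat_history.get_message_store().add_assistant_message(reply)
    return reply
```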
Another thing: the ```prompt_suffix``` works nicely, but it is not stored as part of the assistant's message. I think it should be. E.g. using "Sure thing!" as prompt_suffix...
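Until that changes, a possible workaround (just a sketch, reusing the message store call from above; the names are placeholders) is to prepend the suffix yourself before storing the reply:

```python
def store_reply_with_suffix(agent, prompt_suffix, model_reply):
    """Sketch: store prompt_suffix together with the generated text,
    so the saved assistant message matches what the model actually 'said'."""
    full_reply = prompt_suffix + model_reply
    agent.chat_history.get_message_store().add_assistant_message(full_reply)
    return full_reply
```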
It seems that the model always needs to evaluate its own previous answer as part of the prompt. In the following examples my own new prompt was quite short every...
Maybe related to this? https://github.com/abetlen/llama-cpp-python/issues/893#issuecomment-1868070256 My guess is that the chat template differs from that used in the model response (maybe just a \n or whatever) and it therefore...
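To illustrate the guess: if the prompt rebuilt for the next turn differs from what was actually sent last time, even by a single \n, the shared token prefix ends at that point and everything after it has to be evaluated again. A toy sketch of the idea (plain Python, no library calls):

```python
def common_prefix_len(cached_tokens, new_prompt_tokens):
    """Length of the shared prefix that a KV cache could reuse."""
    n = 0
    for a, b in zip(cached_tokens, new_prompt_tokens):
        if a != b:
            break
        n += 1
    return n

# Toy example: the rebuilt history uses "\n\n" where the model produced "\n".
cached  = ["<start>", "Hello", "\n",   "World"]
rebuilt = ["<start>", "Hello", "\n\n", "World"]
print(common_prefix_len(cached, rebuilt))  # 2 -> everything after token 2 is re-evaluated
```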
I provided fixes for the chat templates in #73. But it seems the model's answer, e.g. in the case of GEMMA_2, contains the right number of "\n\n". Are you stripping these...
It is not about a keyword. If a long text is generated and it goes in the wrong direction, I want to stop it without losing the context by killing the...
I need this for a local model, just in case this makes a difference.