NanoLLM
How to know when the LLM is done with a reply?
In e.g. web_chat.py, you have the following callback:
def on_llm_reply(self, text):
    """
    Update the web chat history when the latest LLM response arrives.
    """
    self.send_chat_history()
From what I can tell, this is called each time the LLM generates a token as part of a response to a prompt. How can you tell when the LLM has finished generating tokens for a given prompt? Or should I just use a simple timeout (e.g., "if no tokens have been generated in 0.5 s, send a 'done' signal")?
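For reference, here is a minimal sketch of the timeout fallback I described, in case the library doesn't expose an explicit end-of-stream signal. The `InactivityMonitor` class is a hypothetical helper of my own, not part of the NanoLLM API; it resets a timer on every token and fires a callback once no tokens have arrived within the timeout window:

```python
import threading

class InactivityMonitor:
    """Fires `on_done` once no tokens arrive for `timeout` seconds.

    Hypothetical helper, NOT part of the NanoLLM API -- just a sketch
    of the "no tokens in 0.5 sec => done" idea from the question.
    """
    def __init__(self, timeout, on_done):
        self.timeout = timeout
        self.on_done = on_done
        self._timer = None
        self._lock = threading.Lock()

    def token_received(self):
        # Cancel the pending timer (if any) and restart the countdown,
        # so `on_done` only fires after `timeout` seconds of silence.
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.timeout, self.on_done)
            self._timer.daemon = True
            self._timer.start()
```

One would then call `monitor.token_received()` inside `on_llm_reply()`, and treat the `on_done` callback as the "reply finished" signal. The obvious downside is the added latency (the full timeout elapses after the last token), which is why an explicit end-of-generation signal from the library would be preferable if one exists.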