NanoLLM
How to know when the LLM is done with a reply?
In e.g. web_chat.py, you have the following callback:
def on_llm_reply(self, text):
    """
    Update the web chat history when the latest LLM response arrives.
    """
    self.send_chat_history()
From what I can tell, this is called each time the LLM generates a token as part of a response to a prompt. How can you tell when the LLM has finished generating tokens for a given prompt? Or should I just use a simple timeout (e.g., "if no tokens have been generated in 0.5 s, send a 'done' signal")?
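For reference, here is a minimal sketch of the timeout fallback I described, in case the library doesn't expose an explicit end-of-stream signal. The `InactivityMonitor` class is a hypothetical helper of my own, not part of the NanoLLM API; it resets a timer on every token and fires a callback once no tokens have arrived within the timeout window:

```python
import threading

class InactivityMonitor:
    """Fires `on_done` once no tokens arrive for `timeout` seconds.

    Hypothetical helper, NOT part of the NanoLLM API -- just a sketch
    of the "no tokens in 0.5 sec => done" idea from the question.
    """
    def __init__(self, timeout, on_done):
        self.timeout = timeout
        self.on_done = on_done
        self._timer = None
        self._lock = threading.Lock()

    def token_received(self):
        # Cancel the pending timer (if any) and restart the countdown,
        # so `on_done` only fires after `timeout` seconds of silence.
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.timeout, self.on_done)
            self._timer.daemon = True
            self._timer.start()
```

One would then call `monitor.token_received()` inside `on_llm_reply()`, and treat the `on_done` callback as the "reply finished" signal. The obvious downside is the added latency (the full timeout elapses after the last token), which is why an explicit end-of-generation signal from the library would be preferable if one exists.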