Add cancel() method to interrupt a stream
Fixes #599.
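To illustrate the idea behind the patch (names here are illustrative, not the exact diff): the stream is wrapped so that a `cancel()` call sets a flag which is checked between tokens.

```python
import threading

class CancellableStream:
    """Illustrative sketch: wraps a token stream and adds cancel().

    `stream` is any generator of chunks, e.g. llm(prompt, stream=True)
    from llama-cpp-python. Cancellation takes effect at the next token
    boundary, since the flag is only checked between yielded chunks.
    """

    def __init__(self, stream):
        self._stream = stream
        self._cancelled = threading.Event()

    def cancel(self):
        # Safe to call from another thread (e.g. a UI handler).
        self._cancelled.set()

    def __iter__(self):
        for chunk in self._stream:
            if self._cancelled.is_set():
                self._stream.close()  # stop the underlying generator
                break
            yield chunk
```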
Thanks for all your work on this project!
Please accept this PR, @abetlen.
Actually, I found an issue with this method: it only cancels after a token has been generated, so if the LLM is slow or gets stuck processing the prompt, it doesn't cancel at all.
We need a better method.
I'm coming back to this because I need to figure out a better method to interrupt the generation programmatically.
For a console-based scenario it's pretty easy in Python: all I have to do is wrap the code in a try/except KeyboardInterrupt block, and then I can press Ctrl+C at any point to gracefully interrupt the LLM.
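For reference, a minimal sketch of that console pattern, assuming the llama-cpp-python streaming API (model path and prompt are placeholders):

```python
from llama_cpp import Llama

llm = Llama(model_path="./model.bin")  # placeholder path

try:
    for chunk in llm("Write a long story.", stream=True, max_tokens=512):
        print(chunk["choices"][0]["text"], end="", flush=True)
except KeyboardInterrupt:
    # Ctrl+C raises KeyboardInterrupt, which unwinds out of the loop;
    # the partial output printed so far is kept.
    print("\n[generation interrupted]")
```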
But if I'm using a front-end user interface, I haven't managed to make it work properly, say with a "Stop generating" button that calls a Python function, because of the issue I mentioned in the previous post.
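For context, this is roughly the pattern I tried (`stop_event` and `send_to_ui` are hypothetical names for the button's flag and the UI callback); it only takes effect between tokens, which is exactly the problem:

```python
import threading
from llama_cpp import Llama

llm = Llama(model_path="./model.bin")  # placeholder path
stop_event = threading.Event()         # set by the "Stop generating" button

def send_to_ui(text: str):
    print(text, end="", flush=True)    # stand-in for a real UI callback

def generate(prompt: str):
    for chunk in llm(prompt, stream=True, max_tokens=512):
        if stop_event.is_set():
            break  # only reached after a token is produced, not during prompt eval
        send_to_ui(chunk["choices"][0]["text"])

worker = threading.Thread(target=generate, args=("Hello,",))
worker.start()
# Later, from the UI thread:
# stop_event.set()  # no effect while prompt processing is still blocking
```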
@abetlen, sorry to bother you again, but do you have any suggestions or ideas on how to accomplish this?
Why not add it now and improve it later if a better solution turns up? For now this would work in most cases.
Has anyone found a reasonable solution for this? Or am I the only one unwilling to wait for the model to finish, short of killing the job and losing context?
Any chance this gets merged for now?
It indeed blocks until the first token is produced, but cancelling after that is trivial. The other, similar issue is cancelling a model that is still loading.
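For example, once tokens are streaming, simply breaking out of the loop (or closing the generator) stops generation, since tokens are produced lazily; a sketch:

```python
from llama_cpp import Llama

llm = Llama(model_path="./model.bin")  # placeholder path

stream = llm("Explain quicksort.", stream=True, max_tokens=256)
for i, chunk in enumerate(stream):
    print(chunk["choices"][0]["text"], end="", flush=True)
    if i >= 10:          # stand-in for any cancellation condition
        stream.close()   # no further tokens are computed after this
        break
```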
The gpt4all Python bindings offer a similar mechanism, which allows stopping generation at the next token.
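If I remember the gpt4all bindings correctly, `generate()` accepts a per-token callback, and returning False from it stops generation at the next token (model name below is illustrative):

```python
from gpt4all import GPT4All

model = GPT4All("orca-mini-3b.ggmlv3.q4_0.bin")  # illustrative model name

stop_requested = False  # would be set by a "Stop generating" handler

def on_token(token_id: int, response: str) -> bool:
    print(response, end="", flush=True)
    return not stop_requested  # False stops generation at the next token

model.generate("Tell me a story.", max_tokens=200, callback=on_token)
```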