langchain
Interrupting completion stream
Is there a way to interrupt the generation stream? It seems technically possible, but I haven't found any mention of it in the docs.
This would be useful for user-facing frontends, where a user may want to abort the assistant's answer midway and rephrase the task.
OpenAI forum: https://community.openai.com/t/interrupting-completion-stream-in-python/30628
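The commonly suggested approach in that thread is simply to break out of the loop that iterates over the streamed chunks, which closes the underlying connection. A minimal sketch of the pattern is below; `fake_stream` is a stand-in generator for the real API stream (a real call needs network access and an API key), and `consume_until_interrupted` is a hypothetical helper name, not part of any library:

```python
def fake_stream():
    # Stand-in for the token stream returned by a streaming completion call.
    for token in ["The", " GPT", " model", " is", " a", " transformer"]:
        yield token

def consume_until_interrupted(stream, should_stop):
    """Collect tokens until the caller-supplied predicate says to stop.

    Breaking out of the loop abandons the iterator; with a real API
    stream this is what ends the generation on the client side.
    """
    received = []
    for token in stream:
        received.append(token)
        if should_stop(received):
            break  # interrupt: stop pulling tokens from the stream
    return "".join(received)

# Example: a frontend "stop" button could flip a flag checked by the predicate;
# here we stop after three tokens.
text = consume_until_interrupted(fake_stream(), lambda r: len(r) >= 3)
```

The same loop-and-break shape applies whether the chunks come from the OpenAI client or a LangChain streaming wrapper, since both expose the stream as an iterable.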
Ashton1998 Jun 2023 I ran a simple test of @thehunmonkgroup's solution.
I made a call to the gpt-3.5-turbo model with the input:
Please introduce the GPT model structure in as much detail as possible, and let the API print all the tokens. The statistics from the OpenAI usage page are (I am a new user and am not allowed to post media, so I can only copy the result): 17 prompt + 441 completion = 458 tokens
After that, I stopped the generation when the number of tokens received was 9; the result was: 17 prompt + 27 completion = 44 tokens
So roughly 18 extra tokens were generated after I stopped the generation.
Then I stopped the generation at 100 tokens; the result was: 17 prompt + 111 completion = 128 tokens
So I think the solution works well, but costs an extra 10–20 tokens each time.
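The experiment above amounts to counting tokens as they arrive and breaking at a threshold. A sketch of that measurement loop, with a generator standing in for the API stream (the function name `stop_after_n` is hypothetical):

```python
def stop_after_n(stream, n):
    """Consume at most n tokens from the stream, then close it."""
    out = []
    for token in stream:
        out.append(token)
        if len(out) >= n:
            # Explicitly close the generator; with a real streamed response
            # this is where the HTTP connection would be released. Note the
            # usage page may still bill a few tokens the server generated
            # ahead of what the client received.
            stream.close()
            break
    return out

# Simulate a 50-token completion and stop after 9 received tokens,
# as in the first measurement above.
stream = (t for t in ["tok"] * 50)
received = stop_after_n(stream, 9)
```

Comparing `len(received)` against the billed completion count on the usage page gives the 10–20 token overhead reported above.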