
Interrupting completion stream

Open ksanderer opened this issue 8 months ago • 4 comments

Is there a way to interrupt the generation stream? It's technically possible, but I haven't found any mention in the docs.

This would be useful for user-facing frontends, where a user may want to abort the assistant's answer mid-generation and rephrase the task.

OpenAI forum: https://community.openai.com/t/interrupting-completion-stream-in-python/30628
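
To make the ask concrete, here is a minimal sketch of the pattern (assuming the langchain-openai package; user_requested_abort() is a hypothetical hook the frontend would provide). The open question is whether breaking out of the stream like this actually stops the server-side generation and billing:

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo")

def user_requested_abort() -> bool:
    # Hypothetical hook: return True when the user clicks "stop" in the frontend.
    return False

stream = llm.stream("Explain the transformer architecture in as much detail as possible.")
chunks = []
for chunk in stream:
    chunks.append(chunk.content)
    if user_requested_abort():
        break

# .stream() returns a generator, so close() runs its cleanup; ideally this also
# closes the underlying HTTP connection so the model stops generating.
stream.close()

partial_answer = "".join(chunks)
```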

Ashton1998 (Jun 2023): I ran a simple test of @thehunmonkgroup 's solution.

I made a call to the gpt-3.5-turbo model with the input:

"Please introduce GPT model structure as detail as possible"

and let the API print all the tokens. The statistics from the OpenAI usage page were (I am a new user and am not allowed to post media, so I only copy the result): 17 prompt + 441 completion = 568 tokens

After that, I stopped the generation once 9 tokens had been received; the result was: 17 prompt + 27 completion = 44 tokens

So roughly 10 extra tokens seem to be generated after I stop the generation.

Then I stopped the generation after 100 tokens had been received; the result was: 17 prompt + 111 completion = 128 tokens

So I think the solution works well, but with an extra 10~20 tokens billed each time.
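
For anyone who wants to reproduce the test above, a rough equivalent with the current openai Python SDK (v1.x) is sketched below; the 2023 comment used the older 0.x API, the counter here counts streamed chunks rather than exact tokens, and closing the response is the step that is supposed to stop generation:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user",
               "content": "Please introduce GPT model structure as detail as possible"}],
    stream=True,
)

received = 0
for chunk in stream:
    piece = chunk.choices[0].delta.content or ""
    print(piece, end="", flush=True)
    received += 1
    if received >= 9:  # stop early, as in the test above
        break

# Close the underlying HTTP response so the server stops generating; the usage
# page may still show the 10~20 extra completion tokens noted above.
stream.response.close()
```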

ksanderer · May 30 '24 11:05