feat: stream responses support

Open slavarazum opened this issue 2 years ago • 15 comments

Brings stream support to return partial progress with server-sent events.

⚠️ Requests with the best_of option can't be streamed.

You can think of it as watching the progress of the AI typing.

The main advantage of using streams is that you can return a response to the user much earlier. See the examples below to compare the results.

Request completion model

https://user-images.githubusercontent.com/5820718/208759519-aa37f144-b7a4-41c2-a6be-f25a07d6f8af.mp4

Stream completion model

https://user-images.githubusercontent.com/5820718/208759543-dc865c87-966c-429a-bf4e-e9d8be310116.mp4

Currently implemented for the completions model.

Usage

Pass the stream parameter and get a generator object back. You can iterate over it until the stream ends.

$client = OpenAI::client('YOUR_API_KEY');

$stream = $client->completions()->create([
    'model' => 'davinci',
    'prompt' => 'PHP is',
    'stream' => true,
]);

$fullText = '';

foreach ($stream as $item) {
    $fullText .= $item['choices'][0]['text'];
}

Each iteration of the stream returns the same object as a non-streamed response, except for the usage and finishReason parameters, which are not present in partial streamed responses.
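For illustration, a single decoded stream item might look like the sketch below (the field values are examples only, assuming the completions endpoint):

$item = [
    'id' => 'cmpl-xxxxx',
    'object' => 'text_completion',
    'created' => 1671655000,
    'model' => 'davinci',
    'choices' => [
        [
            'text' => ' a popular', // only the newly generated fragment
            'index' => 0,
            'logprobs' => null,
            'finish_reason' => null, // null (or absent) until the final chunk
        ],
    ],
    // no 'usage' key in partial streamed responses
];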

Stream text results to the client with a Laravel response:

response()->stream(
    function () use ($stream) {
        foreach ($stream as $item) {
            echo $item['choices'][0]['text'];
            ob_flush();
            flush();
        }
    },
    200,
    [
        'X-Accel-Buffering' => 'no',
    ]
);

You can create your own event stream (server-sent events) to send partial data to the client.

Example with Laravel response:

response()->stream(
    function () use ($stream) {
        foreach ($stream as $item) {
            echo 'data: ' . json_encode($item) . PHP_EOL . PHP_EOL;
            ob_flush();
            flush();
        }
    },
    200,
    [
        'Content-Type' => 'text/event-stream',
        'Connection' => 'keep-alive',
        'Cache-Control' => 'no-cache',
        'X-Accel-Buffering' => 'no',
    ]
);

The current draft tries to create an implementation with minimal API changes.

  • completions model now returns array|Generator
  • usage parameter marked as optional in CreateResponse
  • finishReason changed to nullable
  • HttpTransporter client dependency changed to GuzzleHttp\ClientInterface
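For illustration, the adjusted completions method could look roughly like the sketch below (a sketch only; the helper names requestStream() and requestObject() are hypothetical, not the PR's actual internals):

public function create(array $parameters): array|Generator
{
    if ($parameters['stream'] ?? false) {
        return $this->requestStream($parameters); // Generator yielding partial chunks
    }

    return $this->requestObject($parameters); // full decoded response
}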

I would be happy to discuss this and bring the PR to a stable version.

slavarazum avatar Dec 21 '22 19:12 slavarazum

In the new commit, I replaced the custom Stream object with a Generator to stay closer to native PHP.

You can iterate directly through returned generator object:

- foreach ($stream->read() as $item) {
+ foreach ($stream as $item) {
    $fullText .= $item['choices'][0]['text'];
}

slavarazum avatar Dec 22 '22 01:12 slavarazum

@nunomaduro @gehrisandro Any thoughts or suggestions? 💭

slavarazum avatar Dec 28 '22 13:12 slavarazum

@slavarazum Currently a little bit busy - I will check this as soon as possible.

nunomaduro avatar Dec 29 '22 14:12 nunomaduro

@slavarazum Is there any reason why this pull request does not make the client use streams always?

nunomaduro avatar Jan 05 '23 01:01 nunomaduro

@nunomaduro The main reason is that the stream parameter is optional and false by default in the OpenAI API. A stream response requires handling an iterable type, which might be confusing if we set it to true by default.

However, in my opinion, stream responses are absolutely necessary for a better user experience whenever the client side expects a response as soon as possible.

slavarazum avatar Jan 08 '23 04:01 slavarazum

How can I make it work now? Is it possible without changes to the core code?

ijjimem avatar Jan 08 '23 13:01 ijjimem

@ijjimem Let's wait for this PR to be finished. I'm continuing to work on it.

slavarazum avatar Jan 09 '23 12:01 slavarazum

Hi @slavarazum

Thank you very much for your work so far. And sorry for my delayed response.

I had a look into your implementation and I can see some good starting points, but nevertheless I would like to take a step back and first talk about the use cases and what the usage should look like.

Mainly I can see two use cases or goals to be achieved:

  1. Return the completion retrieved so far as quickly as possible. For example, as a stream response in Laravel: return response()->stream(...)
  2. Do something with the completion retrieved so far and, when done, fetch again the completion retrieved so far (not sure if this one is clear; maybe the code example below helps to clarify).

To support these two different use cases, the user needs a way to consume the response in different ways. Therefore I think it's better to have a dedicated CreateStreamResponse class, separate from CreateResponse.

Example for use case 1:

$stream = $client->completions()->createStreamed([
    'model' => 'text-davinci-003',
    'prompt' => 'PHP is ',
    'max_tokens' => 100,
]); // CreateStreamResponse

return response()->stream(function () use ($stream) {
    foreach($stream->iterator() as $newPart){
        echo $newPart; // CompletionPartial
    }
});

Example for use case 2:

$stream = $client->completions()->createStreamed([
    'model' => 'text-davinci-003',
    'prompt' => 'PHP is ',
    'max_tokens' => 100,
]); // CreateStreamResponse

while(!$stream->finished()){
    $response = $stream->response(); // CreateResponse object with the full completion received so far
    sleep(1); // do some work with the response
}

Some explanations how I would structure the code:

CreateStreamResponse would implement a new interface StreamResponse with three methods:

  • iterator() returns a class which implements a new interface StreamResponseIterator (see below)
  • finished() returns a boolean indicating whether the stream (completion) has finished (maybe completed() would be a better name, but that is a bit weird in the context of "completions")
  • response() returns a Response object (in this case a CreateResponse) with the full completion retrieved so far

The StreamResponseIterator interface would extend the Iterator interface, yielding objects of a new interface StreamResponsePartial.

The StreamResponsePartial interface represents a newly received part of the stream response. In the case of completions this would be an object holding a simple string. When listing fine-tune events as a stream (the only other endpoint which supports streaming so far) this would be an object holding a new FineTune event.

CompletionPartial would implement StreamResponsePartial and hold only the new part of the completion as a string (and implement __toString()).
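Put together, the proposed contracts could look roughly like this (a sketch only; the method signatures are illustrative):

interface StreamResponse
{
    public function iterator(): StreamResponseIterator;

    public function finished(): bool;

    public function response(): CreateResponse; // in general, the endpoint's response type
}

interface StreamResponseIterator extends Iterator
{
    public function current(): StreamResponsePartial; // the newly received part
}

interface StreamResponsePartial
{
    public function __toString(): string;
}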

@nunomaduro and @slavarazum: Can you come up with different use cases and what do you think about having a dedicated method (createStreamed()) and response (CreateStreamResponse)?

gehrisandro avatar Jan 17 '23 22:01 gehrisandro

Hi @gehrisandro 🙌

In my vision, the stream option should be passed along with the other options to the create method, for two reasons:

  1. Stay consistent. Developers might expect that all options from the OpenAI API docs can be passed into the create method.
  2. A stream response is not guaranteed even if we pass the stream parameter, for example when the best_of option is present too.

A method that returns the full content of a stream response might make sense.

Let's define the final API.

  • Should the create method return a single object for both cases, or separate ones?
  • To iterate through the stream, should we call some method, or would it be better to iterate over the returned object directly?

A single object with an isStream method and the ability to iterate over it may provide a simpler API:

$response = $client->completions()->create([
    // ...
]);

if ($response->isStream()) {
    foreach ($response as $part) {
        // ...
    }
}

vs

$response = $client->completions()->create([
    // ...
]);

if ($response instanceof CreateStreamResponse) {
    foreach ($response->iterator() as $part) {
        // ...
    }
}

slavarazum avatar Jan 18 '23 23:01 slavarazum

Can I have examples of how other OpenAI API clients (in other languages) solved this problem?

ps: @slavarazum really super sorry that this issue is taking forever to decide, but currently I am so busy that it's been difficult.

nunomaduro avatar Jan 18 '23 23:01 nunomaduro

@nunomaduro NP 🤝 I too have some availability troubles in Ukraine 😅 It's ok not to rush this, so we can create a truly good implementation. As far as I can see at first glance, it is currently only partially solved in other languages. So we can serve as a good example.

I had a look around the libraries listed in the docs and some others.

Examples:

Other:

slavarazum avatar Jan 19 '23 00:01 slavarazum

I tried various combinations of the stream and best_of parameters, because the OpenAI documentation is not very clear about what happens if you provide both.

As far as I was able to see in my tests, if you provide both parameters together it still returns a stream response, but it waits to send the stream until the completion is done. That means it is not faster than a request without the stream option. But technically it is still a stream, which contains the full completion within a single event. And in consequence it does not include the usage.

So I would still prefer to have two different methods, but I understand your concern that developers will probably try the stream = true parameter on the normal create() method. To mitigate this issue we could throw an exception: "stream is not supported here, please use createStreamed() instead".

Conversely, we could throw an exception if best_of is passed to the createStreamed method as well, even though it technically works, because there is no benefit and it may lead to some confusion.
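Such guards could look roughly like this (a sketch; the exception class is only an example):

public function create(array $parameters): CreateResponse
{
    if ($parameters['stream'] ?? false) {
        throw new InvalidArgumentException('Stream is not supported here, please use createStreamed() instead.');
    }

    // ...
}

public function createStreamed(array $parameters): CreateStreamResponse
{
    if (isset($parameters['best_of'])) {
        throw new InvalidArgumentException('best_of brings no benefit to streamed requests and is not allowed here.');
    }

    $parameters['stream'] = true;

    // ...
}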

~~Additionally I found more combinations where the API has, at least in my opinion, weird or unpredictable behaviour. For example, if you pass n = 2 and stream = true the API returns the expected stream response but with only one completion instead of two. In other words, the n parameter is completely ignored.~~ Update: I didn't test carefully enough. It actually returns two completions. Sorry about that.

@slavarazum Do you still think that having a single method is more convenient? Personally I do not like the if statements needed to determine how to handle the response. Furthermore, in most use cases developers will not reach for the stream option, and therefore I think we should keep the "normal" create() method and response as simple as possible.

gehrisandro avatar Jan 19 '23 22:01 gehrisandro

Looks like a separate createStreamed method with appropriate exceptions might make sense.

At the moment I see more use cases for streamed responses than for conventional ones. Because of the longer response time, normal requests are more suitable for background operations, where the user is not waiting for a response as soon as possible. Perhaps I'm missing something, since the industry is just emerging.

slavarazum avatar Jan 24 '23 20:01 slavarazum

What is the latest on this? Would LOVE to be able to start using this for streaming, but I can't find a reasonable way of doing it anywhere... hoping it's soon!

jhull avatar Feb 16 '23 06:02 jhull

@nunomaduro How do you feel about a separate createStreamed method? Any thoughts about the implementation in general?

slavarazum avatar Feb 17 '23 00:02 slavarazum

Any update on this? Would love to see it working :-) I really like the implementation and agree on the single-method approach.

genesiscz avatar Mar 12 '23 14:03 genesiscz

Any update on this PR? 🙏

CaReS0107 avatar Mar 13 '23 14:03 CaReS0107

Working on refactoring with chat endpoint support. Trying to simplify the implementation.

slavarazum avatar Mar 13 '23 23:03 slavarazum

Does anyone here deal with this in Laravel with Inertia.js and Vue 3? I would love to see your implementation!

Pierquinto avatar Mar 21 '23 18:03 Pierquinto

Why not support stream responses yet? It is so nice for the user: it shows that the AI is working. If they have to wait a long time, users may think the AI is not working.

oppsDayly avatar Mar 23 '23 14:03 oppsDayly

I also hope stream support lands as soon as possible.

huangdijia avatar Mar 23 '23 14:03 huangdijia

Working on it right now. Will update the PR with a new Chat completions draft as soon as possible. Stay tuned 😉

@Pierquinto When it is finished I will share examples including the client-side part.

slavarazum avatar Mar 23 '23 15:03 slavarazum

Let's continue here - https://github.com/openai-php/client/pull/84

slavarazum avatar Mar 23 '23 22:03 slavarazum

@Pierquinto A simple stream-reading implementation with fetch:

async function askAi() {
  const response = await fetch("/ask-ai");

  if (!response.body) {
    return; // streaming is not available in this environment
  }

  const reader = response.body.pipeThrough(new TextDecoderStream()).getReader();

  let delta = await reader.read();

  while (!delta.done) {
    // do something with the decoded text chunk in delta.value

    delta = await reader.read();
  }
}
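For completeness, the server side behind that fetch("/ask-ai") call could reuse the plain-text streaming route shown earlier in this thread (a sketch; the route path simply matches the example above):

Route::get('/ask-ai', function () {
    $client = OpenAI::client('YOUR_API_KEY');

    $stream = $client->completions()->create([
        'model' => 'davinci',
        'prompt' => 'PHP is',
        'stream' => true,
    ]);

    return response()->stream(function () use ($stream) {
        foreach ($stream as $item) {
            echo $item['choices'][0]['text'];
            ob_flush();
            flush();
        }
    }, 200, ['X-Accel-Buffering' => 'no']);
});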

slavarazum avatar Apr 24 '23 23:04 slavarazum