feat: stream responses support
Brings stream support to return partial progress with server-sent events.
⚠️ Requests with the `best_of` option can't be streamed.
You can think of it as the progress of the AI typing.
The main advantage of using streams is that you can return a response to the user much earlier. See the examples below to compare the results.
Request completion model
https://user-images.githubusercontent.com/5820718/208759519-aa37f144-b7a4-41c2-a6be-f25a07d6f8af.mp4
Stream completion model
https://user-images.githubusercontent.com/5820718/208759543-dc865c87-966c-429a-bf4e-e9d8be310116.mp4
Currently implemented for the `completions` model only.
Usage
Pass the `stream` parameter and get a generator object. You can iterate over it until the stream ends.
$client = OpenAI::client('YOUR_API_KEY');

$stream = $client->completions()->create([
    'model' => 'davinci',
    'prompt' => 'PHP is',
    'stream' => true,
]);

$fullText = '';
foreach ($stream as $item) {
    $fullText .= $item['choices'][0]['text']; // string concatenation in PHP uses .=, not +=
}
Each iteration of the stream read returns the same object as a non-streamed response, except for the `usage` and `finishReason` parameters, which are not present in partial streamed responses.
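For illustration, a partial could be inspected defensively like this (a sketch; I'm assuming the raw `finish_reason` array key here, matching the API's wire format rather than the `finishReason` property):

```php
foreach ($stream as $item) {
    // Always present in a partial: the newly generated text fragment.
    echo $item['choices'][0]['text'];

    // May be absent until the final event, so guard before reading it.
    $finishReason = $item['choices'][0]['finish_reason'] ?? null;
    if ($finishReason !== null) {
        echo "Finished: {$finishReason}";
    }
}
```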
Stream text results to the client with a Laravel response:
response()->stream(
    function () use ($stream) {
        foreach ($stream as $item) {
            echo $item['choices'][0]['text'];
            ob_flush();
            flush();
        }
    },
    200,
    [
        'X-Accel-Buffering' => 'no', // disable proxy buffering (e.g. nginx) so chunks reach the client immediately
    ]
);
You can create your own event stream (server-sent events) to send partial data to the client.
Example with a Laravel response:
response()->stream(
    function () use ($stream) {
        foreach ($stream as $item) {
            // SSE format: each event is a "data:" line followed by a blank line
            echo 'data: ' . json_encode($item) . PHP_EOL . PHP_EOL;
            ob_flush();
            flush();
        }
    },
    200,
    [
        'Content-Type' => 'text/event-stream',
        'Connection' => 'keep-alive',
        'Cache-Control' => 'no-cache',
        'X-Accel-Buffering' => 'no',
    ]
);
The current draft tries to create an implementation with minimal API changes.
- The `completions` model now returns `array|Generator` (sketched below)
- The `usage` parameter is marked as optional in `CreateResponse`
- `finishReason` changed to nullable
- The `HttpTransporter` client dependency changed to `GuzzleHttp\ClientInterface`
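For illustration, the first bullet would make the `create` signature look roughly like this (a sketch; the docblock wording is mine):

```php
/**
 * Returns a plain array for regular requests, or a Generator
 * yielding partial responses when 'stream' => true is passed.
 */
public function create(array $parameters): array|\Generator;
```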
I would be happy to discuss this and bring the PR to a stable version.
In the new commit, I replaced the custom `Stream` object with a `Generator` to stay more native. You can iterate directly over the returned generator object:
- foreach ($stream->read() as $item) {
+ foreach ($stream as $item) {
      $fullText .= $item['choices'][0]['text'];
  }
@nunomaduro @gehrisandro Any thoughts or suggestions? 💭
@slavarazum Currently a little bit busy - I will check this as soon as possible.
@slavarazum Is there any reason why this pull request doesn't make the client use streams always?
@nunomaduro The main reason is that the `stream` parameter is optional and false by default in the OpenAI API.
A stream response requires handling an iterable type, so it might be confusing if we set it to true by default.
However, in my opinion, stream responses are absolutely necessary to provide a better user experience when the client expects a response as soon as possible.
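To illustrate the handling burden, here is a sketch of what callers would face with the `array|Generator` return type (the branching shown is illustrative, not part of the PR):

```php
$response = $client->completions()->create([
    'model' => 'davinci',
    'prompt' => 'PHP is',
]);

if ($response instanceof \Generator) {
    // Streamed: consume partial responses as they arrive.
    foreach ($response as $item) {
        echo $item['choices'][0]['text'];
    }
} else {
    // Non-streamed: the full completion is already available.
    echo $response['choices'][0]['text'];
}
```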
How can I make it work now? Is it possible without changes to the core code?
@ijjimem Let's wait for the implementation of this PR. I'm continuing to work on it.
Hi @slavarazum
Thank you very much for your work so far. And sorry for my delayed response.
I had a look into your implementation and I can see some good starting points, but nevertheless I would like to take a step back and talk first about the use cases and how the usage should look.
Mainly I can see two use cases or goals to be achieved:
- Return the completion retrieved so far as quickly as possible. For example, as a stream response in Laravel: `return response()->stream(...)`
- Do something with the completion retrieved so far, and when done, fetch again the completion retrieved so far (not sure if this one is clear; maybe the code example below helps to clarify).

To achieve these two different use cases, the user needs a way to use the response in different ways. Therefore I think it's better to have a dedicated `CreateStreamResponse` class, separate from the `CreateResponse`.
Example for use case 1:
$stream = $client->completions()->createStreamed([
    'model' => 'text-davinci-003',
    'prompt' => 'PHP is ',
    'max_tokens' => 100,
]); // CreateStreamResponse

return response()->stream(function () use ($stream) {
    foreach ($stream->iterator() as $newPart) {
        echo $newPart; // CompletionPartial
    }
});
Example for use case 2:
$stream = $client->completions()->createStreamed([
    'model' => 'text-davinci-003',
    'prompt' => 'PHP is ',
    'max_tokens' => 100,
]); // CreateStreamResponse

while (!$stream->finished()) {
    $response = $stream->response(); // CreateResponse object with the full completion received so far
    sleep(1); // do some work with the response
}
Some explanations of how I would structure the code:

`CreateStreamResponse` would implement a new interface `StreamResponse` with three methods:
- `iterator()` returns a class which implements a new interface `StreamResponseIterator` (see below)
- `finished()` returns a boolean indicating whether the stream (completion) has finished (maybe `completed()` would be a better name, but it's a bit weird in the context of "completions")
- `response()` returns a `Response` object (in this case a `CreateResponse`) with the full completion retrieved so far

The `StreamResponseIterator` interface would extend the `Iterator` interface, yielding objects of a new interface `StreamResponsePartial`.

`StreamResponsePartial` is the interface for a newly received part of the stream response. In the case of completions this would be an object holding a simple string. When listing fine-tune events as a stream (the only other endpoint which supports streaming so far) this would be an object holding a new FineTune event.

`CompletionPartial` would implement `StreamResponsePartial` and hold only the new part of the completion as a string (and implement `__toString()`).
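A minimal sketch of these interfaces, using the names from the proposal above (the exact signatures are my assumptions):

```php
interface StreamResponse
{
    /** Iterate over newly received parts as they arrive. */
    public function iterator(): StreamResponseIterator;

    /** Whether the stream (completion) has finished. */
    public function finished(): bool;

    /** Full response assembled from everything received so far. */
    public function response(): CreateResponse;
}

/** Yields StreamResponsePartial objects while iterating. */
interface StreamResponseIterator extends \Iterator
{
}

interface StreamResponsePartial
{
    public function __toString(): string;
}

final class CompletionPartial implements StreamResponsePartial
{
    public function __construct(private string $text) {}

    public function __toString(): string
    {
        return $this->text;
    }
}
```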
@nunomaduro and @slavarazum: Can you come up with different use cases, and what do you think about having a dedicated method (`createStreamed()`) and response (`CreateStreamResponse`)?
Hi @gehrisandro 🙌
In my vision, the `stream` option should be passed with the other options to the `create` method, for 2 reasons:
- Stay consistent. Developers might expect that all options from the OpenAI API docs can be passed into the `create` method.
- A stream response is not guaranteed even if we pass the `stream` parameter, for example when the `best_of` option is present too.
A method that returns the full content of a stream response might make sense.
Let's define the final API:
- Should the `create` method return a single object for both cases, or separate ones?
- To iterate through the stream, should we call some method, or would it be better to iterate over the returned object?

A single object with an `isStream` method and the ability to iterate over it may provide a simpler API:
$response = $client->completions()->create([
    // ...
]);

if ($response->isStream()) {
    foreach ($response as $part) {
        // ...
    }
}
vs
$response = $client->completions()->create([
    // ...
]);

if ($response instanceof CreateStreamResponse) {
    foreach ($response->iterator() as $part) {
        // ...
    }
}
Can I have examples of how other OpenAI API clients (in other languages) solved this problem?
PS: @slavarazum really super sorry that this issue is taking forever to decide, but currently I am so busy that it's been difficult.
@nunomaduro NP 🤝 I also have some availability troubles in Ukraine 😅 It's ok not to rush this so we can create a truly good implementation. As far as I can see at first glance, it's currently only partially solved in other languages, so we can serve as a good example.
I had a look around the libraries listed in the docs and some others.
Examples:
- implementation in Crystal library
- example from another PHP library
- example in unofficial Rust library
Other:
- not supported in official Node.js library
- I'm not familiar with Python, but this is what I found in the library
- haven't found anything for completions in the .NET library
- opened issue in Go Library
- nothing in Java Library
- in TODO section of Ruby library
I tried various combinations with the `stream` and `best_of` parameters, because the OpenAI documentation is not very clear about what happens if you provide both.
As far as I was able to see in my tests, if you provide both parameters together, it still returns a stream response, but it waits to send the stream until the completion is done. This means it is not faster than a request without the stream option.
But technically it is still a stream which contains the full completion within a single event. And in consequence it does not include the `usage`.
So I would still prefer to have two different methods, but I understand your concern that developers are probably going to try the `stream = true` parameter on the normal `create()` method. To mitigate this issue we could throw an exception: "`stream` is not supported here, please use `createStreamed()` instead."
Conversely, we could throw an exception if `best_of` is passed to the `createStreamed` method as well; even though it technically works, there is no benefit and it may lead to some confusion.
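A sketch of what such guards could look like (hypothetical code; the exception type and messages are my assumptions):

```php
final class Completions
{
    public function create(array $parameters): CreateResponse
    {
        // Reject streaming on the normal method, per the proposal above.
        if (($parameters['stream'] ?? false) === true) {
            throw new \InvalidArgumentException(
                'stream is not supported here, please use createStreamed() instead.'
            );
        }

        // ... regular, non-streamed request handling
    }

    public function createStreamed(array $parameters): CreateStreamResponse
    {
        // Mirror-image guard: best_of brings no benefit when streaming.
        if (isset($parameters['best_of'])) {
            throw new \InvalidArgumentException(
                'best_of cannot be used with createStreamed().'
            );
        }

        // ... streamed request handling
    }
}
```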
~~Additionally I found more combinations where the API has, at least in my opinion, a weird or unpredictable behaviour. For example, if you pass `n = 2` and `stream = true`, the API returns the expected stream response but only with one completion instead of two. In other words, the `n` parameter is completely ignored.~~
Update: I didn't test carefully enough. It actually returns two completions. Sorry about that.
@slavarazum Do you still think that having a single method is more convenient?
Personally I do not like the necessary `if` statements to determine how to handle the response.
Furthermore, in most use cases developers will not reach for the `stream` option, and therefore I think we should keep the "normal" `create()` method and response as simple as possible.
Looks like a separate `createStreamed` method with appropriate exceptions might make sense.
At the moment I see more use cases for streamed responses than for conventional ones. Because of the longer response time, normal requests are more likely to be suitable for background operations, when the user is not waiting for a response as soon as possible. Perhaps I'm missing something, since the industry is just emerging.
What is the latest on this? Would LOVE to be able to start using this for streaming, but can't find a reasonable way of doing it anywhere... hoping it's soon!
@nunomaduro How do you feel about a separate `createStreamed` method? Any thoughts about the implementation in general?
Any update on this? Would love to see it working :-) I really like the implementation and agree on the single-method approach.
Any update on this PR ? 🙏
Working on refactoring with `chat` endpoint support. Trying to simplify the implementation.
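Purely as a preview, usage with the `chat` endpoint could end up looking something like this (the method name and response shape are assumptions, not the final API):

```php
// Hypothetical; 'createStreamed' and the 'delta' shape are assumptions.
$stream = $client->chat()->createStreamed([
    'model' => 'gpt-3.5-turbo',
    'messages' => [
        ['role' => 'user', 'content' => 'Hello!'],
    ],
]);

foreach ($stream as $item) {
    // Each event carries only the newly generated fragment ("delta").
    echo $item['choices'][0]['delta']['content'] ?? '';
}
```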
Has anyone of you dealt with this in Laravel with Inertia.js and Vue 3? I would love to see your implementation!
Oh, why not support stream responses? It is so nice for the user: it shows that the AI is working. If the wait is long, the user may think the AI is not working.
I also hope stream will be supported as soon as possible.
Working on it right now. Will update the PR with a new Chat completions draft as soon as possible. Stay tuned 😉
@Pierquinto When it is finished I will share examples with the client-side part.
Let's continue here - https://github.com/openai-php/client/pull/84
@Pierquinto A simple stream-reading implementation with `fetch`:
async function askAi() {
    const response = await fetch("/ask-ai");

    // Decode the byte stream into text chunks as they arrive.
    const reader = response.body
        ?.pipeThrough(new TextDecoderStream())
        .getReader();

    if (!reader) {
        return; // no body to stream (e.g. error response)
    }

    let delta = await reader.read();

    while (!delta.done) {
        // do something with the chunk in delta.value
        delta = await reader.read();
    }
}
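Note that this snippet pairs with the plain-text Laravel `response()->stream` example earlier in the thread; for the server-sent events variant you would parse the `data:` lines out of each chunk instead.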