
How to use streaming to return messages as they are generated instead of waiting until everything has been processed

Open pslxx opened this issue 1 year ago • 10 comments

How to use streaming to return messages as they are generated instead of waiting until everything has been processed?

pslxx avatar May 07 '24 09:05 pslxx

Hey @pslxx, there are dedicated methods in the ChatInterface, like generateStreamOfText. Does that help you?
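For reference, a minimal sketch of what that call can look like. generateStreamOfText and its PSR-7 StreamInterface return type are from this thread; the no-argument OpenAIChat constructor reading OPENAI_API_KEY from the environment is an assumption here, so adapt it to your own configuration:

    use LLPhant\Chat\OpenAIChat;

    // Assumption: OPENAI_API_KEY is set in the environment; pass an explicit
    // config object to the constructor if your setup requires it.
    $chat = new OpenAIChat();

    // generateStreamOfText() returns a Psr\Http\Message\StreamInterface,
    // not an iterator of tokens.
    $stream = $chat->generateStreamOfText('Write a short poem about streaming.');

    // Read the body as it arrives instead of waiting for the full response.
    while (!$stream->eof()) {
        echo $stream->read(32);
    }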

MaximeThoonsen avatar May 07 '24 09:05 MaximeThoonsen

@MaximeThoonsen the generateStreamOfText function uses a Guzzle request (not an async request), so theoretically the method waits for the whole response from the Ollama API.

Is there any way to read the streamed response from Ollama directly?

messi89 avatar Aug 22 '24 12:08 messi89

+1 Looking to iterate through each stream chunk, but the stream methods return a StreamInterface that doesn't allow this (https://github.com/theodo-group/LLPhant/issues/78#issuecomment-1939347314).

iztok avatar Sep 01 '24 08:09 iztok

If anyone finds this helpful:

    use Psr\Http\Message\StreamInterface;

    // Wrap the PSR-7 stream in a Generator so it can be consumed with foreach.
    $streamToIterator = function (StreamInterface $stream): Generator {
        while (!$stream->eof()) {
            yield $stream->read(32); // Adjust the chunk size as needed
        }
    };

    $iteratorStream = $streamToIterator($stream);

    foreach ($iteratorStream as $chunk) {
        // Chunks are fixed-size byte strings, not tokens anymore!
    }
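Note that read(32) returns at most 32 bytes per call, so chunk boundaries can fall in the middle of a word or even a multi-byte UTF-8 character. A larger chunk size (or buffering until whitespace) smooths this out at the cost of slightly higher latency per chunk.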

iztok avatar Sep 02 '24 21:09 iztok

hello @ezimuel, how are you?

It seems there are a lot of questions around streaming. Can we still do streaming with StreamInterface and LLPhant? What is the "clean/simple" working example?

@iztok is the code you provided working for you to get a stream?

MaximeThoonsen avatar Sep 04 '24 11:09 MaximeThoonsen

> @iztok is the code you provided working for you to get a stream?

Yes, this returns an iterable stream that I can use the same way as the stream from the OpenAI library. One caveat is that this stream's chunks are not tokens but 32-byte strings. I'm then broadcasting these chunks over a WebSocket to my chat clients.
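The WebSocket broadcast itself depends on whatever server library you run, so as an illustration of the same forwarding pattern without extra dependencies, here is a rough sketch that pushes the chunks to a browser over Server-Sent Events instead (a hypothetical endpoint, not the exact code I run):

    use Psr\Http\Message\StreamInterface;

    // Rough sketch: forward chunks to the browser with Server-Sent Events.
    // A WebSocket broadcast would follow the same loop, just with a different
    // transport on the sending side.
    function forwardAsSse(StreamInterface $stream): void
    {
        header('Content-Type: text/event-stream');
        header('Cache-Control: no-cache');

        while (!$stream->eof()) {
            $chunk = $stream->read(32);
            if ($chunk === '') {
                continue;
            }
            // One SSE "data:" frame per chunk; the client reassembles them.
            echo 'data: ' . json_encode($chunk) . "\n\n";
            @ob_flush();
            flush();
        }
    }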

iztok avatar Sep 04 '24 11:09 iztok

> Yes, this returns an iterable stream that I can use the same way as the stream from the OpenAI library. One caveat is that this stream's chunks are not tokens but 32-byte strings. I'm then broadcasting these chunks over a WebSocket to my chat clients.

@iztok I see how that is a caveat. Does it make any difference in your use case, or does it seriously impact the end-user experience?

I am trying to understand the pitfalls I might run into while trying to implement something similar.

prykris avatar Sep 11 '24 15:09 prykris