
Added the isjsonvalid() and the wordstream() functions

Open atantos opened this issue 1 year ago • 5 comments

Hi there!

You might want to consider adding a default wordstream() function. It parses the stream and seems to work as is. If you think it's a good fit, I could also contribute documentation for the function.
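For context, here is a minimal sketch of what a wordstream()-style streaming callback could do with a single server-sent-events chunk. The field names follow OpenAI's chat-completion streaming format; the function name and body are illustrative, not the actual PR code, and JSON3 is assumed as the JSON parser.

```julia
using JSON3

# Extract the streamed text fragments from one SSE chunk (illustrative sketch).
function wordstream_sketch(chunk::AbstractString)
    words = String[]
    for line in split(chunk, "\n")
        startswith(line, "data: ") || continue     # skip non-data lines
        payload = strip(line[7:end])               # drop the "data: " prefix
        payload == "[DONE]" && break               # end-of-stream sentinel
        obj = JSON3.read(payload)
        delta = obj.choices[1].delta
        haskey(delta, :content) && push!(words, delta.content)
    end
    return words
end

chunk = """
data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"content":" world"}}]}
data: [DONE]
"""
wordstream_sketch(chunk)  # ["Hello", " world"]
```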

Regards.

Alex

atantos avatar Nov 18 '23 07:11 atantos

Hey Alex, thanks for the PR! I'm getting caught up after a hectic couple of weeks. I'll take a look by the end of today.

roryl23 avatar Nov 18 '23 17:11 roryl23

Out of curiosity, what is the use case here? What would be the advantages of implementing it?

It seems that it basically parses the JSON twice anyway: once inside a try-catch (in isjsonvalid) and again when actually parsing. What is the advantage over just wrapping the parsing itself in a try-catch?
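To make the comparison concrete, here is a hedged sketch of the two approaches being discussed. The function names are illustrative (not the PR's actual code), and JSON3 is assumed as the parser:

```julia
using JSON3

# Approach 1: validate first (parse #1), then parse again for real.
isjsonvalid_sketch(s) = try JSON3.read(s); true catch; false end

function parse_twice(s)
    isjsonvalid_sketch(s) || return nothing
    return JSON3.read(s)   # parse #2 on the same string
end

# Approach 2: put the try-catch around the real parse itself.
function parse_once(s)
    try
        return JSON3.read(s)
    catch
        return nothing     # incomplete fragment; caller can wait for more data
    end
end

parse_once("""{"a": 1}""")  # a JSON3 object with field a == 1
parse_once("""{"a": 1""")   # nothing (truncated mid-object)
```

Both return the same results; the second simply avoids scanning the string twice on the happy path.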

Side note: perhaps you could disable your formatter so the diff only contains the proposed functionality? There are a few places in the diff where only whitespace has changed.

svilupp avatar Nov 19 '23 11:11 svilupp

Thank you for the insight, @svilupp! I'm thinking of a "default" streamcallback function for word-by-word streaming that would be very useful for a user for a number of reasons. Crafting such a function is not straightforward, as the data stream isn't just a series of JSON-parsable objects prefixed with "Data:". The stream may split, with a packet starting mid-way through an object from the previous packet. You're correct about the double JSON parsing; there might indeed be a more efficient method to verify if the input is JSON before continuing with the wordstream() function. Also, I appreciate your feedback on the formatter—I'll definitely keep that in mind!
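The split-packet situation described above can be sketched with a small stateful buffer that accumulates bytes until a complete "data:" line is available. This is a hedged illustration, not the PR's implementation; the struct and function names are made up, and JSON3 is assumed:

```julia
using JSON3

# Accumulates raw packet text until whole newline-terminated lines can be parsed.
mutable struct StreamBuffer
    pending::String
end
StreamBuffer() = StreamBuffer("")

function feed!(buf::StreamBuffer, packet::AbstractString)
    buf.pending *= packet
    out = String[]
    while (i = findfirst('\n', buf.pending)) !== nothing
        line = buf.pending[1:i-1]
        buf.pending = buf.pending[i+1:end]   # keep the unfinished remainder
        startswith(line, "data: ") || continue
        payload = strip(line[7:end])
        payload == "[DONE]" && continue
        delta = JSON3.read(payload).choices[1].delta
        haskey(delta, :content) && push!(out, delta.content)
    end
    return out
end

buf = StreamBuffer()
feed!(buf, "data: {\"choices\":[{\"delta\":{\"conte")  # object split mid-key: returns String[]
feed!(buf, "nt\":\"Hi\"}}]}\n")                         # rest arrives: returns ["Hi"]
```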

atantos avatar Nov 19 '23 12:11 atantos

Apologies for the slow response!

Would you mind outlining some of the use cases for building JSONs on the fly as they are getting streamed? I lack imagination and I tend to associate streaming with nicer UX design for chat interfaces. Are you thinking of some other use case?

I'm not very familiar with the functionality, so I'd appreciate it if you could capture a few key test cases and codify them in the test set. It will help us prevent future regressions and will help with onboarding new devs, because they'll see a practical example. I imagine that parsing a plain-text response vs a JSON-mode response vs a function_call will each require a slightly different approach, so I'm keen to see that covered in the test suite.
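The kind of test coverage requested above could look roughly like this. The extract_delta helper is hypothetical (it stands in for whatever the PR's parsing function ends up being), but the three chunk shapes, plain-text content, a function_call delta, and the [DONE] sentinel, come straight from OpenAI's streaming format:

```julia
using Test, JSON3

# Hypothetical helper: classify and extract the payload of one SSE line.
function extract_delta(line::AbstractString)
    startswith(line, "data: ") || return nothing
    payload = strip(line[7:end])
    payload == "[DONE]" && return :done
    delta = JSON3.read(payload).choices[1].delta
    haskey(delta, :content) && return delta.content
    haskey(delta, :function_call) && return delta.function_call
    return nothing
end

@testset "stream chunk parsing (sketch)" begin
    @test extract_delta("data: {\"choices\":[{\"delta\":{\"content\":\"hi\"}}]}") == "hi"
    @test extract_delta("data: [DONE]") == :done
    fc = extract_delta("data: {\"choices\":[{\"delta\":{\"function_call\":{\"name\":\"f\",\"arguments\":\"\"}}}]}")
    @test fc.name == "f"
    @test extract_delta("event: ping") === nothing
end
```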

svilupp avatar Nov 27 '23 20:11 svilupp

Perhaps we should learn from what others are doing?

Langchain:

astream_events

It’s worth calling out that we’re using our new astream_events method to easily stream back all events (new tokens, as well as function calls and function results) and surface them to the user. We do some filtering of this stream to get relevant messages or message chunks, and then render them nicely in the UI. If you aren’t familiar with astream_events, it is definitely worth checking out in more detail here.

Code: https://github.com/langchain-ai/langchain/blob/a0ec04549575de5547faa5381fa69fef61405ac8/libs/core/langchain_core/runnables/base.py#L697

svilupp avatar Feb 01 '24 13:02 svilupp