LocalAI
Add support for stream + functions
Currently, whenever you try to use the stream API and functions at the same time, LocalAI returns a non-stream reply, which confuses the go-openai client. I would love to have support for using both (for now I'm working around it by avoiding functions). Also, I'm not 100% sure what the safest approach is; maybe it is safer to disable functions on streams instead of returning non-stream responses whenever functions are present. In any case, the real solution goes through implementing support for the combination of both.
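For reference, here is a minimal sketch of the kind of call that exposes the problem, using the sashabaranov/go-openai client pointed at a LocalAI endpoint (the base URL, model name, and function definition are placeholders, not taken from the actual plugin):

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"

	openai "github.com/sashabaranov/go-openai"
)

func main() {
	// Point the client at a local LocalAI instance instead of api.openai.com.
	cfg := openai.DefaultConfig("not-needed")
	cfg.BaseURL = "http://localhost:8080/v1" // assumed LocalAI endpoint
	client := openai.NewClientWithConfig(cfg)

	req := openai.ChatCompletionRequest{
		Model:  "gpt-3.5-turbo", // placeholder model name
		Stream: true,
		Messages: []openai.ChatCompletionMessage{
			{Role: openai.ChatMessageRoleUser, Content: "What's the weather in Madrid?"},
		},
		// Declaring functions while Stream is true is the combination that
		// currently makes LocalAI fall back to a non-streamed reply.
		Functions: []openai.FunctionDefinition{{
			Name:        "get_weather",
			Description: "Get the current weather for a city",
			Parameters: map[string]any{
				"type": "object",
				"properties": map[string]any{
					"city": map[string]any{"type": "string"},
				},
				"required": []string{"city"},
			},
		}},
	}

	stream, err := client.CreateChatCompletionStream(context.Background(), req)
	if err != nil {
		panic(err)
	}
	defer stream.Close()

	for {
		resp, err := stream.Recv()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			panic(err)
		}
		if len(resp.Choices) == 0 {
			continue
		}
		delta := resp.Choices[0].Delta
		if delta.FunctionCall != nil {
			fmt.Printf("function call delta: %s %s\n", delta.FunctionCall.Name, delta.FunctionCall.Arguments)
		}
		fmt.Print(delta.Content)
	}
}
```

Against the OpenAI API this loop receives incremental deltas; against LocalAI the same request currently comes back as a single non-stream body, which is what trips up the client.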
Solution
Implement support for functions on streams
Alternatives
- Stay on the safe side for clients and disable functions, but keep returning a stream response.
- Do the whole non-streamed processing with the functions and still return a stream response, but all at once, instead of using "real" streaming.
Additional context
I need this as part of the Mattermost AI plugin. Currently we use functions and streams in combination, and the OpenAI API works like a charm with that.
I don't mind giving it a try, but I would love to have some guidance about why it is not implemented yet and what the problem with implementing it was, so I don't waste my time investigating something that is already known.
:warning::warning::warning::warning::warning:
Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!
but... I can also be funny or helpful :smile_cat: and, generally speaking, I can provide good tips or point to places to look in the documentation or in the code, based on what you wrote in the issue.
Don't engage in conversation with me, I don't support (yet) replying!
:warning::warning::warning::warning::warning:
The issue you are facing seems to be related to the implementation of the OpenAI API client for Go. It might be a bug in the current implementation or a missing feature. I would recommend filing an issue on the OpenAI GitHub repository and providing the details of the problem you are experiencing, along with any relevant code snippets and logs. The OpenAI team can then investigate the issue and provide a solution or suggest a workaround.

As for the additional context you provided, it seems like you are using the OpenAI API to integrate it into a Mattermost chatbot plugin. This is a great use case for the API, as it allows for real-time predictions and interactions with users. However, I'm not sure what specific functions you are referring to that are not working correctly. Could you please provide more details on the functions you are using and how they are supposed to work?
Sources:
- https://github.com/go-skynet/LocalAI/tree/master/api/api_test.go
- https://github.com/go-skynet/LocalAI/tree/master/pkg/grpc/proto/backend_grpc.pb.go
- https://github.com/go-skynet/LocalAI/tree/master/api/openai/chat.go
Happy to have @mudler share more detail here.
@jespino there are two issues with streaming functions results:
- determining when there is no action to take. Currently we let the LLM reply with a "no reply" function to decide not to call any function. For this reason we need to consume the whole answer and decode the JSON before we can return the results to the client appropriately.
- the grammar output doesn't currently match the OpenAI API 100%, so the JSON output from the LLM is re-marshalled - but that's quite easy to adapt and rework.
The relevant code section is https://github.com/go-skynet/LocalAI/blob/432513c3ba7c7e2491fed64dec4c4f0f545984ba/api/openai/chat.go#L294
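To make that constraint concrete, here is a rough sketch of why the whole completion has to be consumed before anything can be sent back. This is not the actual chat.go code; the JSON field names, the "no action" function name, and the helper are illustrative:

```go
package sketch

import (
	"encoding/json"

	openai "github.com/sashabaranov/go-openai"
)

// grammarResult is an assumed shape for the grammar-constrained LLM output.
type grammarResult struct {
	Function  string         `json:"function"`
	Arguments map[string]any `json:"arguments"`
}

// interpretCompletion is a hypothetical helper: only after parsing the full
// JSON do we know whether to emit a function call or a plain assistant reply,
// which is what blocks token-by-token streaming today.
func interpretCompletion(raw, noActionName string) (*openai.FunctionCall, string, error) {
	var res grammarResult
	if err := json.Unmarshal([]byte(raw), &res); err != nil {
		return nil, "", err
	}
	if res.Function == noActionName || res.Function == "" {
		// The model decided not to call any function: treat the arguments as
		// the chat reply instead of a tool invocation.
		msg, _ := res.Arguments["message"].(string)
		return nil, msg, nil
	}
	// Re-marshal the arguments into the stringified form the OpenAI API expects.
	args, err := json.Marshal(res.Arguments)
	if err != nil {
		return nil, "", err
	}
	return &openai.FunctionCall{Name: res.Function, Arguments: string(args)}, "", nil
}
```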
One way to tackle this would be to change how the LLM's grammar output is parsed and let it reply directly in a 1:1 OpenAI format. However, I noticed that not having a name for the reply function (e.g. returning null instead) makes the LLM less consistent in its results.
@jespino however, if functions with "stream": true break the client, that's a bug - we should just return everything in one event so that at least we are compatible.
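A stop-gap compatible behaviour along those lines could look roughly like the sketch below: keep the SSE framing of a streamed response but emit the already-computed function call as one chunk followed by the [DONE] sentinel. This is a generic net/http illustration that reuses the go-openai types just to show the wire format, not LocalAI's actual handler code:

```go
package sketch

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"

	openai "github.com/sashabaranov/go-openai"
)

// writeSingleChunkFunctionCall is a hypothetical handler fragment: it sends a
// completed function call using the streaming wire format (one data: event
// plus the [DONE] sentinel), so clients that asked for a stream still get one.
func writeSingleChunkFunctionCall(w http.ResponseWriter, id, model string, call openai.FunctionCall) error {
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")

	chunk := openai.ChatCompletionStreamResponse{
		ID:      id,
		Object:  "chat.completion.chunk",
		Created: time.Now().Unix(),
		Model:   model,
		Choices: []openai.ChatCompletionStreamChoice{{
			Index: 0,
			Delta: openai.ChatCompletionStreamChoiceDelta{
				Role:         openai.ChatMessageRoleAssistant,
				FunctionCall: &call,
			},
			FinishReason: openai.FinishReasonFunctionCall,
		}},
	}
	payload, err := json.Marshal(chunk)
	if err != nil {
		return err
	}
	if _, err := fmt.Fprintf(w, "data: %s\n\ndata: [DONE]\n\n", payload); err != nil {
		return err
	}
	if f, ok := w.(http.Flusher); ok {
		f.Flush()
	}
	return nil
}
```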
@mudler I'm going to double check that, to make sure it is not a problem in my implementation and that it is the standard go-openai client library behavior. I'm also going to verify another official client to triple check. Then the first iteration would be to return just one single "stream" message, and after that I can try to tackle the streaming+functions feature.
I also ran into this issue while implementing ai-assistant-vui. Using LocalAI 2.25.0, I can confirm that enabling streaming+functions kind of works with the go-openai client as well as with langchaingo, except that it is not really streaming the response and the function call JSON is returned twice, with each version missing information the other has. While that can be handled, the increased latency of the chat completion response does not allow a fluent conversation with the AI assistant. 😞 Is there any progress on this end by any chance?
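Until this is fixed, the partial/duplicated function-call JSON can at least be tolerated on the client by accumulating the streamed deltas before acting on them. A hedged sketch using the go-openai types (this is the standard OpenAI-style accumulation; with LocalAI's current double-JSON behaviour you may instead need to keep whichever copy parses cleanly):

```go
package sketch

import (
	openai "github.com/sashabaranov/go-openai"
)

// mergeFunctionCallDeltas is a hypothetical helper that folds the partial
// function-call deltas from a chat completion stream into one call: the first
// non-empty name wins and argument fragments are concatenated, mirroring how
// the OpenAI API streams function calls.
func mergeFunctionCallDeltas(deltas []openai.ChatCompletionStreamChoiceDelta) openai.FunctionCall {
	var merged openai.FunctionCall
	for _, d := range deltas {
		if d.FunctionCall == nil {
			continue
		}
		if merged.Name == "" {
			merged.Name = d.FunctionCall.Name
		}
		merged.Arguments += d.FunctionCall.Arguments
	}
	return merged
}
```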
Update: I see the problem is here now.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.