azure-sdk-for-java
[BUG] Streaming does not work with Spring AI and Azure OpenAI
Describe the bug When streaming a completion, the results are still aggregated and arrive in one burst at the end instead of being "streamed" incrementally.
Exception or Stack Trace No stacktrace
To Reproduce Make a streaming call with Spring AI version 1.0.0 and observe that the call does not behave as a stream: the chunks are "blocked" somewhere deep in the Azure SDK code and only surface once the response is complete. A minimal repro sketch follows.
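A minimal repro sketch, using the test prompt quoted later in this thread. The chat client is assumed to expose the same stream(Prompt) -> Flux<ChatResponse> shape as the service method further down; the exact type and package vary by Spring AI milestone.

```java
// Minimal repro sketch (assumed injected Spring AI chat client; exact type
// varies by milestone). If streaming worked, the timestamps below would
// spread out over the generation; with this bug every chunk prints in one
// burst at the end.
public void repro() {
    chatClient.stream(new Prompt("Give me a recipe from Portugal"))
        .doOnNext(r -> System.out.println(System.currentTimeMillis() + " "
            + r.getResult().getOutput().getContent()))
        .blockLast();
}
```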
Code Snippet

```java
@ServiceMethod(returns = ReturnType.COLLECTION)
public IterableStream<ChatCompletions> getChatCompletionsStream(String deploymentOrModelName,
        ChatCompletionsOptions chatCompletionsOptions) {
    chatCompletionsOptions.setStream(true);
    RequestOptions requestOptions = new RequestOptions();
    Flux<ByteBuffer> responseStream = getChatCompletionsWithResponse(deploymentOrModelName,
        BinaryData.fromObject(chatCompletionsOptions), requestOptions).getValue().toFluxByteBuffer();
    OpenAIServerSentEvents<ChatCompletions> chatCompletionsStream
        = new OpenAIServerSentEvents<>(responseStream, ChatCompletions.class);
    return new IterableStream<>(chatCompletionsStream.getEvents());
}
```
Class: OpenAIClient.java
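For context, a hedged sketch of how a caller would drive the method above (endpoint, key, and deployment name are placeholders): each ChatCompletions chunk should surface as it is produced, but with this bug iteration only begins once the whole response has been aggregated.

```java
import com.azure.ai.openai.OpenAIClient;
import com.azure.ai.openai.OpenAIClientBuilder;
import com.azure.ai.openai.models.ChatCompletionsOptions;
import com.azure.ai.openai.models.ChatRequestUserMessage;
import com.azure.core.credential.AzureKeyCredential;
import java.util.List;

class StreamTimingCheck {
    public static void main(String[] args) {
        // Endpoint, key, and deployment name are placeholders.
        OpenAIClient client = new OpenAIClientBuilder()
            .endpoint("https://<resource>.openai.azure.com")
            .credential(new AzureKeyCredential("<api-key>"))
            .buildClient();

        client.getChatCompletionsStream("<deployment>",
                new ChatCompletionsOptions(
                    List.of(new ChatRequestUserMessage("Give me a recipe from Portugal"))))
            // Expected: timestamps spread over the generation.
            // Observed with this bug: they all print in one burst at the end.
            .forEach(chunk -> System.out.println(System.currentTimeMillis()));
    }
}
```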
Expected behavior The response needs to be streamed instead of arriving all at once
Setup (please complete the following information):
- OS: MacOS
- IDE: IntelliJ IDEA 2024.1.2 (Ultimate Edition)
- Library/Libraries: azure-ai-openai-1.0.0-beta.8.jar
- Java version: 21
- App Server/Environment: Standard Spring Boot setup
- Frameworks: Spring Boot
Information Checklist Kindly make sure that you have added all of the following information above and checked off the required fields; otherwise we will treat the issue as an incomplete report
- [x] Bug Description Added
- [x] Repro Steps Added
- [x] Setup information Added
Same issue here, even using the latest available version (1.0.0-beta.9) inside a simple Java class.
Thanks for confirming this as well! I initially thought it was an issue with the Spring AI project, but after careful debugging I realized it was nested deeper in the SDK code; hopefully this can be picked up soon-ish.
PS: I did try to check out the code myself and adapt the method from inside my fork of the Spring AI project, but I got too lost in it and couldn't devise a possible solution avenue.
I have the same problem, but I think it is a matter of deployment configuration. Are you using a custom content filter without asynchronous streaming mode? In which region is your deployment?
@nazarenodefrancescomaize No, I'm just using the OpenAI API as it comes out of the box; nothing else is configured anywhere. Essentially, I use a test prompt for this:
Give me a recipe from Portugal
I then need to see the answer being streamed when the mode is set to stream, and not streamed when it isn't. I have no filters anywhere; the region is within the EU.
Ok, thanks. Then I think it is a different problem, because we solved ours and it was related to a custom content filter applied on that Azure deployment without asynchronous filtering enabled. In the default mode the filter in fact waits for generation to complete, even in streaming mode.
I also encountered the same problem
bump
I can also confirm this issue in combination with Spring AI. It is also not related to my GPT model / Azure deployment. Using the npm package (@azure/openai), streaming works without issues.
@bruno-oliveira for a possible solution see the discussion in https://github.com/spring-projects/spring-ai/pull/1054. At least for me that is working.
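One workaround sketch in that spirit (not necessarily what the PR does) is to bypass the blocking IterableStream and consume the SDK's async client, whose getChatCompletionsStream returns a reactive Flux<ChatCompletions>; endpoint, key, and deployment name below are placeholders.

```java
import com.azure.ai.openai.OpenAIAsyncClient;
import com.azure.ai.openai.OpenAIClientBuilder;
import com.azure.ai.openai.models.ChatCompletionsOptions;
import com.azure.ai.openai.models.ChatRequestUserMessage;
import com.azure.core.credential.AzureKeyCredential;
import java.util.List;

class AsyncStreamWorkaround {
    public static void main(String[] args) throws InterruptedException {
        OpenAIAsyncClient client = new OpenAIClientBuilder()
            .endpoint("https://<resource>.openai.azure.com") // placeholder
            .credential(new AzureKeyCredential("<api-key>")) // placeholder
            .buildAsyncClient();

        // The async variant returns Flux<ChatCompletions>, so each chunk
        // arrives as a reactive signal instead of through a blocking iterator.
        client.getChatCompletionsStream("<deployment>",
                new ChatCompletionsOptions(
                    List.of(new ChatRequestUserMessage("Give me a recipe from Portugal"))))
            .subscribe(chunk -> System.out.println(System.currentTimeMillis()));

        Thread.sleep(30_000); // keep the demo JVM alive while chunks arrive
    }
}
```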
I can also confirm this bug. Previously (a couple of months ago) this worked. Here is my service method:

```java
public Flux<ServerSentEvent<ConversationModelDto>> generateStream(String message) {
    UserMessage userMessage = new UserMessage(message);
    Prompt prompt = new Prompt(List.of(userMessage));
    return chatClient.stream(prompt)
        .map(chatResponse -> {
            String resp = chatResponse.getResult().getOutput().getContent();
            ConversationModelDto conversationModelDto = ConversationModelDto.builder()
                .type("bot")
                .message(resp)
                .sessionToken("999")
                .build();
            return ServerSentEvent.builder(conversationModelDto).build();
        });
}
```
I am mapping to a DTO before using a ServerSentEvent to stream the data to an EventSource in an Angular app. Like I said, this previously worked, but now it seems the JSON returned is truncated somehow and Jackson can't cope.
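For completeness, a hypothetical shape of the ConversationModelDto used above; the actual class is not shown in this thread, and the Lombok annotations are inferred from the builder calls.

```java
import lombok.Builder;
import lombok.Value;

// Hypothetical reconstruction of the DTO used above, inferred from the
// builder calls in the service method; the real class is not shown here.
@Value
@Builder
public class ConversationModelDto {
    String type;         // e.g. "bot"
    String message;      // one streamed chunk of the completion
    String sessionToken; // client session identifier
}
```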
Here is an example of the error message:
```
2024-07-31T11:06:19.776+02:00 ERROR 9356 --- [AiExampleProject] [oundedElastic-1] c.a.c.i.MethodHandleReflectiveInvoker : Unexpected end-of-input in VALUE_STRING at [Source: (byte[])"{"choices":[{"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"sa"; line: 1, column: 174]
2024-07-31T11:06:19.777+02:00 ERROR 9356 --- [AiExampleProject] [oundedElastic-1] c.a.c.i.s.DefaultJsonSerializer : com.azure.json.implementation.jackson.core.io.JsonEOFException: Unexpected end-of-input in VALUE_STRING at [Source: (byte[])"{"choices":[{"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"sa"; line: 1, column: 174]
```
You can see that the JSON has been truncated. What's interesting is that untruncated JSON continues to be returned and the truncated chunks are skipped, but this results in the output being a little nonsensical.
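That failure pattern is what you'd expect if each network ByteBuffer were parsed as if it held complete server-sent events, rather than being buffered until an event delimiter arrives. As an illustration only (this is not the SDK's actual parser), delimiter-based accumulation looks roughly like this:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import reactor.core.publisher.Flux;

// Illustration only. SSE events end with a blank line ("\n\n"), and a
// ByteBuffer off the wire can stop mid-JSON, so bytes must accumulate until
// a complete event is present before JSON parsing. (Assumes a single
// subscriber and that multi-byte UTF-8 characters don't split across
// buffers; a production parser must handle both.)
class SseAccumulationSketch {
    static Flux<String> toSseEvents(Flux<ByteBuffer> body) {
        StringBuilder pending = new StringBuilder();
        return body.concatMap(buf -> {
            pending.append(StandardCharsets.UTF_8.decode(buf));
            List<String> complete = new ArrayList<>();
            int sep;
            while ((sep = pending.indexOf("\n\n")) >= 0) {
                complete.add(pending.substring(0, sep)); // one full "data: {...}" event
                pending.delete(0, sep + 2);
            }
            return Flux.fromIterable(complete);
        });
    }
}
```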
Here is some test output from Postman; for reference, the prompt asked for a bedtime story. The oldest response is at the bottom. When this works I see "once upon a time", but here you can see the words "you" and "once upon a" are missing.
{"message":" time","type":"bot","sessiontoken":"999"}
09:59:25
{"message":" for","type":"bot","sessiontoken":"999"}
09:59:25
{"message":" bedtime","type":"bot","sessiontoken":"999"}
09:59:25
{"message":" cozy","type":"bot","sessiontoken":"999"}
09:59:25
{"message":" a","type":"bot","sessiontoken":"999"}
09:59:25
{"message":"Certainly","type":"bot","sessiontoken":"999"}
09:59:25
{"type":"bot","sessiontoken":"999"}
Hi @bruno-oliveira, the Azure OpenAI Inference SDK has been deprecated and won't receive updates in its current form. This package will be repurposed to provide Azure types to be used with the official OpenAI Java SDK in the future, but there is not much detail I can provide at this point in time, as things are still in flux. We have provided a basic migration guide for this in this folder.
Hi @bruno-oliveira. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.
Hi @bruno-oliveira, we're sending this friendly reminder because we haven't heard back from you in 7 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!