Support Multi<ChatCompletionResponse> AI services
When declaring an AI service with signature:
public interface AiService {

    @SystemMessage("You are a professional poet")
    @UserMessage("""
            Write a poem about {topic}. The poem should be {lines} lines long.
            """)
    Multi<ChatCompletionResponse> writeAPoem(String topic, int lines);
}
I get
dev.langchain4j.exception.IllegalConfigurationException: Only Multi<String> is supported as a Multi return type. Offending method is 'fooAiService#writeAPoem'
at dev.langchain4j.exception.IllegalConfigurationException.illegalConfiguration(IllegalConfigurationException.java:12)
at io.quarkiverse.langchain4j.deployment.AiServicesProcessor.handleDeclarativeServices(AiServicesProcessor.java:442)
To be able to access the metadata of the response, such as usage or finishReason, it would be necessary to access the underlying response objects.
We can probably add support for that.
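For reference, the shape the extension supports today (per the error above) looks like the sketch below; it streams only the token text, so usage and finishReason never reach the caller. The service and method names here are illustrative.

```java
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
import io.smallrye.mutiny.Multi;

// Supported today: the Multi carries only the token text of each chunk,
// so response metadata (usage, finishReason) is not accessible.
@RegisterAiService
public interface PoetAiService {

    @SystemMessage("You are a professional poet")
    @UserMessage("Write a poem about {topic}. The poem should be {lines} lines long.")
    Multi<String> writeAPoem(String topic, int lines);
}
```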
> To be able to access the metadata of the response, such as usage or finishReason, it would be necessary to access the underlying response objects.
How do you plan to use these in a streaming fashion (as Multi implies)?
@geoand you would check the finish_reason of the last event before the [DONE] event.
Here is an example of a raw API call (with max_tokens set to 1):
data: {"choices":[{"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}},"delta":{"content":"It"},"finish_reason":null,"index":0,"logprobs":null}],"created":1724759298,"id":"chatcmpl-A0oyIz0Qfd9Hq22NwN3VyT7FtgCzb","model":"gpt-4o-2024-05-13","object":"chat.completion.chunk","system_fingerprint":"fp_abc28019ad"}
data: {"choices":[{"content_filter_results":{},"delta":{},"finish_reason":"length","index":0,"logprobs":null}],"created":1724759298,"id":"chatcmpl-A0oyIz0Qfd9Hq22NwN3VyT7FtgCzb","model":"gpt-4o-2024-05-13","object":"chat.completion.chunk","system_fingerprint":"fp_abc28019ad"}
data: [DONE]
As you can see, the last event shows "finish_reason":"length", while previous events have "finish_reason":null.
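To make that concrete, here is a minimal Mutiny sketch of picking out the metadata-bearing event. The Chunk record is a stand-in for the real streamed response type, whose accessors may differ.

```java
import io.smallrye.mutiny.Multi;

// Stand-in for the streamed response object; the real type exposes
// finish_reason through its choices rather than a flat accessor.
record Chunk(String content, String finishReason) {}

public class FinishReasonDemo {
    public static void main(String[] args) {
        // Mirrors the raw stream above: a content chunk, then a final event
        // carrying finish_reason ("length" because max_tokens was 1).
        Multi<Chunk> stream = Multi.createFrom().items(
                new Chunk("It", null),
                new Chunk(null, "length"));

        // Only the last event before [DONE] has a non-null finish_reason.
        stream.select().where(chunk -> chunk.finishReason() != null)
              .subscribe().with(chunk ->
                      System.out.println("finish_reason = " + chunk.finishReason()));
    }
}
```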
Gotcha, thanks!
@jmartisk do you have any spare cycles to look into this?
My guess is that it shouldn't take more than a couple hours for someone who knows the codebase :)
@geoand I took a look at the issue and have a PR in my forked repo. I think we should support this case, otherwise there's no way of associating the metadata with the request. Also, the auditing events don't really work in streaming mode (at least the completion event), but that's easily fixable. Do you mind taking a look at the PR at https://github.com/tomas1885/quarkus-langchain4j/pull/1 and letting me know if I should open it in this repo?
@tomas1885 thanks! Please open the PR against this repo
@geoand It works, but I'm not convinced it's the right path. I'll open a discussion with some questions in the discussions section.
+1
@geoand How about allowing the service to return a Multi with an event for each of the actual TokenStream events? I started a discussion in https://github.com/quarkiverse/quarkus-langchain4j/discussions/1652 and have a branch ready, but I think there might be room for improvements.
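As an illustration only (the names and shape below are guesses, not what the draft PR actually does), such an API could emit one event per TokenStream callback, with the final event carrying the metadata:

```java
import dev.langchain4j.model.output.TokenUsage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
import io.smallrye.mutiny.Multi;

// Hypothetical event hierarchy mirroring the TokenStream callbacks:
// partial tokens while streaming, then one completion event with metadata.
sealed interface ChatEvent permits PartialToken, Completion {}
record PartialToken(String token) implements ChatEvent {}
record Completion(String finishReason, TokenUsage usage) implements ChatEvent {}

// Hypothetical service shape: the Multi emits PartialToken items followed
// by a single Completion item before completing.
@RegisterAiService
interface StreamingPoetService {

    @UserMessage("Write a poem about {topic}")
    Multi<ChatEvent> writeAPoem(String topic);
}
```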
@tomas1885 it would be a lot easier to just open a draft PR so we can discuss in context, as I am not sure I understand what you are proposing.
Thanks!
@geoand I opened a draft PR