Support Multi<ChatCompletionResponse> AI services
When declaring an AI service with signature:
public interface AiService {

    @SystemMessage("You are a professional poet")
    @UserMessage("""
            Write a poem about {topic}. The poem should be {lines} lines long.
            """)
    Multi<ChatCompletionResponse> writeAPoem(String topic, int lines);
}
I get
dev.langchain4j.exception.IllegalConfigurationException: Only Multi<String> is supported as a Multi return type. Offending method is 'fooAiService#writeAPoem'
at dev.langchain4j.exception.IllegalConfigurationException.illegalConfiguration(IllegalConfigurationException.java:12)
at io.quarkiverse.langchain4j.deployment.AiServicesProcessor.handleDeclarativeServices(AiServicesProcessor.java:442)
To be able to access the metadata of the response, such as usage or finishReason, it would be necessary to access the underlying response objects.
We can probably add support for that.
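For reference, the shape the extension supports today (per the error above) looks like the sketch below; it streams only the token text, so usage and finishReason never reach the caller. The service and method names here are illustrative.

```java
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
import io.smallrye.mutiny.Multi;

// Supported today: the Multi carries only the token text of each chunk,
// so response metadata (usage, finishReason) is not accessible.
@RegisterAiService
public interface PoetAiService {

    @SystemMessage("You are a professional poet")
    @UserMessage("Write a poem about {topic}. The poem should be {lines} lines long.")
    Multi<String> writeAPoem(String topic, int lines);
}
```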
> To be able to access the metadata of the response, such as usage or finishReason, it would be necessary to access the underlying response objects.
How do you plan to use these in a streaming fashion (as Multi implies)?
@geoand you would check the finish_reason of the last event before the [DONE] event.
Here is an example of a raw API call (with max_tokens set to 1):
data: {"choices":[{"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}},"delta":{"content":"It"},"finish_reason":null,"index":0,"logprobs":null}],"created":1724759298,"id":"chatcmpl-A0oyIz0Qfd9Hq22NwN3VyT7FtgCzb","model":"gpt-4o-2024-05-13","object":"chat.completion.chunk","system_fingerprint":"fp_abc28019ad"}
data: {"choices":[{"content_filter_results":{},"delta":{},"finish_reason":"length","index":0,"logprobs":null}],"created":1724759298,"id":"chatcmpl-A0oyIz0Qfd9Hq22NwN3VyT7FtgCzb","model":"gpt-4o-2024-05-13","object":"chat.completion.chunk","system_fingerprint":"fp_abc28019ad"}
data: [DONE]
As you can see, the last event shows "finish_reason":"length", while previous events have "finish_reason":null.
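To make that concrete, here is a minimal Mutiny sketch of picking out the metadata-bearing event. The Chunk record is a stand-in for the real streamed response type, whose accessors may differ.

```java
import io.smallrye.mutiny.Multi;

// Stand-in for the streamed response object; the real type exposes
// finish_reason through its choices rather than a flat accessor.
record Chunk(String content, String finishReason) {}

public class FinishReasonDemo {
    public static void main(String[] args) {
        // Mirrors the raw stream above: a content chunk, then a final event
        // carrying finish_reason ("length" because max_tokens was 1).
        Multi<Chunk> stream = Multi.createFrom().items(
                new Chunk("It", null),
                new Chunk(null, "length"));

        // Only the last event before [DONE] has a non-null finish_reason.
        stream.select().where(chunk -> chunk.finishReason() != null)
              .subscribe().with(chunk ->
                      System.out.println("finish_reason = " + chunk.finishReason()));
    }
}
```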
Gotcha, thanks!
@jmartisk do you have any spare cycles to look into this?
My guess is that it shouldn't take more than a couple hours for someone who knows the codebase :)
@geoand I took a look at the issue and have a PR in my forked repo. I think we should support this case, otherwise there's no way of associating the metadata with the request. Also, the auditing events don't really work in streaming mode (at least the completion event), but that's easily fixable. Do you mind taking a look at the PR at https://github.com/tomas1885/quarkus-langchain4j/pull/1 and letting me know if I should open it in this repo?
@tomas1885 thanks! Please open the PR against this repo
@geoand It works, but I'm not convinced it's the right path. I'll open a discussion with some questions in the discussions section.
+1
@geoand How about allowing the service to return a Multi with an event for each of the actual TokenStream events? I started a discussion in https://github.com/quarkiverse/quarkus-langchain4j/discussions/1652 and have a branch ready, but I think there might be room for improvements.
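As an illustration only (the names and shape below are guesses, not what the draft PR actually does), such an API could emit one event per TokenStream callback, with the final event carrying the metadata:

```java
import dev.langchain4j.model.output.TokenUsage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
import io.smallrye.mutiny.Multi;

// Hypothetical event hierarchy mirroring the TokenStream callbacks:
// partial tokens while streaming, then one completion event with metadata.
sealed interface ChatEvent permits PartialToken, Completion {}
record PartialToken(String token) implements ChatEvent {}
record Completion(String finishReason, TokenUsage usage) implements ChatEvent {}

// Hypothetical service shape: the Multi emits PartialToken items followed
// by a single Completion item before completing.
@RegisterAiService
interface StreamingPoetService {

    @UserMessage("Write a poem about {topic}")
    Multi<ChatEvent> writeAPoem(String topic);
}
```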
@tomas1885 it would be a lot easier to just open a draft PR so we can discuss in context, as I am not sure I understand what you are proposing.
Thanks!
@geoand I opened a draft PR