[FEATURE] avoid second round trip for function call
Refers to issue #652
For some use cases, after the local function is called there is no need to pass the result back to the LLM (e.g. data extraction from documents).
@Grogdunn I'm afraid you've committed many changes unrelated to this PR. Please fix the PR or create a new one that includes only the related changes.
To prevent such situations, please don't reformat the project code. Especially code not related to the PR.
Also use rebase pulls (e.g. git pull -r upstream main) to rebase on top of main. Never merge with upstream main.
@tzolov I removed the commit with all the import * fixes and re-applied them only to the touched subprojects. Rebased on top of upstream main (as requested).
conflicts are gone! :tada:
Thanks @Grogdunn. Still, the majority of the changes (> 100 files) in the PR are code reformatting unrelated to the PR goal, for example the import reordering in AnthropicChatOptions.java, AnthropicApi.java, ... I presume the actual changes affect far fewer files. Can you please undo those changes (I guess your IDE is applying them automatically) or copy only the related changes into a new PR?
Sure! Today I'll try to reduce PR changeset.
Ok, squashed, rebased onto main, cleaned up.
Hi @Grogdunn, I've been reviewing the PR and making some improvements when I realised that this approach (and likely any approach) will be incorrect.
First, I noticed that the PR doesn't support streaming function calling, and while reasoning about ways to provide such support I realised that we are on the wrong path.
Basically, we cannot abruptly stop the function-calling message exchange and still leave the conversation in a consistent state.
Simply, we don't know what other actions the LLM might need to perform before it can safely answer the user prompt.
For example, if your prompt is "Cancel my flight and let me know what the weather is in Amsterdam" and the Function<FlightToCancel, Void> doesn't return a value, the PR will abruptly stop the conversation and will not allow the LLM to call the second WeatherInfo function and complete the conversation.
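To make the failure mode concrete, here is a minimal sketch (FlightToCancel and the Function types are from the example above; the weather types and field names are assumptions):

```java
import java.util.function.Function;

record FlightToCancel(String bookingId) {}
record WeatherRequest(String city) {}
record WeatherInfo(double temperature, String sky) {}

class FlightExample {

    // Void-returning function: under this PR its null result ends the exchange...
    Function<FlightToCancel, Void> cancelFlight = request -> {
        // cancel the booking as a side effect
        return null;
    };

    // ...so the LLM never gets the chance to request this second call.
    Function<WeatherRequest, WeatherInfo> weatherInfo =
            request -> new WeatherInfo(30, "sunny");
}
```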
It gets even more complicated for the streaming case.
So IMO the only way this can ever work is if the LLM function calling API provides a protocol to send an EmptyResponse for a tool call. Any other heuristic will be incorrect.
Please let me know what you think.
:thinking: Well, the PR was made before the other PR with function streaming came to main, so I can port the changes to function streaming.
I think that is possible because before calling the local functions we wait for the streaming stop signal (I don't remember the correct term); then the stream stops and we call the local functions.
After that, with your example, we have two results: 1 - null and 2 - {"temperature": 30, "sky": "sunny", "blabla": "lorem ipsum"}.
So we evaluate all the local responses and complete the second round trip because we have answer number 2.
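A rough sketch of that evaluation step (ToolCall and invokeLocal are hypothetical placeholders):

```java
import java.util.List;
import java.util.Objects;

// After the streaming stop signal: run the local functions, keep only the
// non-null results, and do the second round trip only when something remains.
List<Object> evaluateLocalResults(List<ToolCall> toolCalls) {
    return toolCalls.stream()
            .map(this::invokeLocal)   // 1 -> null, 2 -> the weather JSON
            .filter(Objects::nonNull) // drop the void result
            .toList();
}
```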
I'll try to spike that today.
@Grogdunn, the problem is not specific to streaming; it applies to the sync calls as well. Simply, the current LLM APIs demand that you always return a function call result message back to the LLM, even for void functions. Perhaps a way to emulate void function call behaviour is to return an empty string or something like "DONE" in the case of void functions.
If you do not return a value, you break the conversation and prevent the LLM from making follow-up function calls.
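A hedged sketch of that emulation, wrapping a side-effecting action so there is always a tool result to send back (the adapter name is made up):

```java
import java.util.function.Consumer;
import java.util.function.Function;

class VoidFunctionAdapter {

    // Hypothetical adapter: turns a void-style, side-effecting action into a
    // function that returns a sentinel, so a tool-result message can always
    // be sent back and the conversation stays valid.
    static <I> Function<I, String> voidToDone(Consumer<I> action) {
        return input -> {
            action.accept(input); // perform the side effect (e.g. cancel the flight)
            return "DONE";        // sentinel result returned to the LLM
        };
    }
}
```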
Well, in terms of conversation you are right. But you can use an LLM for something other than a "chatbot". Some of my customers need to extract structured data from unstructured data: name, surname and other information from emails and attachments, information from datasheets, and so on. In this case we do not have a "conversation" but a single-shot request with a lot of context that ends with the function call.
A simple prompt can be, for instance:
U: Extract name, surname, emotional state, address, city, and summarize the following text, and call the function SAVEITNOW with extracted data:
--
Lorem ipsum dolor sit amet...[a lot of text]
The function is described as usual. No other interaction with this data will happen in the future.
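For illustration, SAVEITNOW could be registered along these lines (a sketch assuming the FunctionCallbackWrapper API; the record fields and service class are made up):

```java
import java.util.function.Function;
import org.springframework.ai.model.function.FunctionCallbackWrapper;

record ExtractedData(String name, String surname, String emotionalState,
                     String address, String city, String summary) {}

class SaveItNowService implements Function<ExtractedData, String> {
    @Override
    public String apply(ExtractedData data) {
        // persist the structured data; the LLM never needs this result
        return "DONE";
    }
}

// Registered with the model so the LLM can call it by name:
var saveItNow = FunctionCallbackWrapper.builder(new SaveItNowService())
        .withName("SAVEITNOW")
        .withDescription("Persists the data extracted from the text")
        .build();
```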
I'd add that, in my case, the second round trip is useless and doubles the cost of every interaction.
I understand your use case and the goal to reduce the cost. But the solution is sort of a hack that breaks the LLM APIs/protocols, therefore we will be reluctant to add such hacks to the general-purpose Spring AI. Perhaps the Structured Output Converter can help return a one-shot structured response for your use case? Another option is to think of some extension/customisation points that would allow one to hack the Spring AI function calling API in their own applications.
At the same time, I've reworked the PR to return "DONE" in the case of a void function definition, which is still a valid use case, and modified the test like this: https://github.com/tzolov/spring-ai/blob/8f832e337c6d4fc7a244de68cd61c5bd8e0c5876/models/spring-ai-azure-openai/src/test/java/org/springframework/ai/azure/openai/function/AzureOpenAiChatClientFunctionCallIT.java#L123
This seems to work (most of the time) for sync and stream calls.
The "done" message is exactly how we have addressed the "second round trip" at the moment.
I think, if I've understood correctly, the Structured Output Converter is not enough, because when you extract data by calling functions the LLM respects the data structure better, while the other way around (parsing the output text and grabbing the JSON it provides) is more prone to hallucinations (I used that approach some time ago, before the functions/tools era).
So the better way is to think of an extension point that keeps the hack external to spring-ai.
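For reference, the converter-based approach would look roughly like this (a sketch assuming Spring AI's BeanOutputConverter; the record and method are made up):

```java
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.converter.BeanOutputConverter;

record ExtractedData(String name, String surname, String summary) {}

class OneShotExtractor {

    // The converter appends JSON-schema format instructions to the prompt and
    // parses the model's text output: no function call, hence one round trip,
    // but only as reliable as the model's JSON discipline.
    ExtractedData extract(ChatModel chatModel, String documentText) {
        var converter = new BeanOutputConverter<>(ExtractedData.class);
        String prompt = """
                Extract name, surname and a short summary from the text below.
                %s
                --
                %s
                """.formatted(converter.getFormat(), documentText);
        return converter.convert(chatModel.call(prompt));
    }
}
```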
While doing triage for the M1 release, we will move this issue to the M2 release. There is a PR that @tzolov will submit that takes this discussion further. Thanks for your patience.
I've done some experiments with structured output, and with GPT-4o it simply works, though it remains prone to hallucination.
Maybe this feature is not necessary anymore?
I'm afraid I have to bump this again, apologies.
@Grogdunn I think that https://github.com/spring-projects/spring-ai/commit/501774925c809fb47e02c73688092e46cdb78099 might help address this issue. Please check OpenAiChatModelProxyToolCallsIT.java for examples of how to run the function calling entirely on the client side. Let me know if you have further questions.
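A hedged sketch of what the client-side handling might look like with the proxyToolCalls option from that commit (chatModel, extractionPrompt and saveItNow are placeholders; see OpenAiChatModelProxyToolCallsIT.java for the authoritative examples):

```java
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.openai.OpenAiChatOptions;

// Ask the model to hand the tool calls back instead of auto-executing them.
var options = OpenAiChatOptions.builder()
        .withFunction("SAVEITNOW")
        .withProxyToolCalls(true)
        .build();

ChatResponse response = chatModel.call(new Prompt(extractionPrompt, options));

// The assistant message carries the raw tool calls; execute them locally
// and simply stop: no second round trip to the LLM.
for (AssistantMessage.ToolCall toolCall : response.getResult().getOutput().getToolCalls()) {
    saveItNow(toolCall.arguments()); // hypothetical local handler
}
```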
@tzolov Thanks for the hint! I see the intent of this commit and, if I understand correctly, it's what I need! I'll try to use it as soon as possible.
Going to close this issue. Please open a new one should the current feature set not work out. Thanks @Grogdunn