eliza Feature request: Allow actions to override the agent answer

Is your feature request related to a problem? Please describe.

Right now actions are mostly seen as "side effect" happening next to an answer sent by the agent. There is no automated way to get the agent to call an action, interpret the result, and answer with that interpretation (which is what I would expect coming from OpenAI actions that are APIs that their runtime call and then get the result interpreted by the agent)

Right now if I ask my agent: what actions can you perform right now in the game? 1 - Agent response: { "user": "Miaoulechat", "text": "I can play, nap, or explore! What do you want to see?", "action": "LIST_GAME_ACTIONS" } 2 -Runtime processAction("LIST_GAME_ACTIONS"): { text: "I can play dice, roulette, blackjack" } 3 - Shown to user:

 ["◎ Agent: I can play, nap, or explore! What do you want to see?"] 
 ["◎ Agent: I can play dice, roulette, blackjack"]

Describe the solution you'd like

There are two things that could be done. The simple one would just have the action result override the answer from the agent (so in the previous example, the first text I can play, nap, or explore! What do you want to see? wouldn't even be sent to the client).

The complicated one, would have do a second back-and forth with the agent, after getting the action result, in order to interpret that result.

A) simple:

1 - Agent response: { "user": "Miaoulechat", "text": "I can play, nap, or explore! What do you want to see?", "action": "LIST_GAME_ACTIONS" } 2 -Runtime processAction("LIST_GAME_ACTIONS"): { text: "I can play dice, roulette, blackjack", type: "OVERRIDE_ANSWER" } 3 - Shown to user:

 ["◎ Agent: I can play dice, roulette, blackjack"]

here, simply adding a "type" (or any other field name) with value OVERRIDE_ANSWER to the action result, would tell the system to not send the agent answer, only the action result, to the client

B) complicated:

1 - Agent response: { "user": "Miaoulechat", "text": "I can play, nap, or explore! What do you want to see?", "action": "LIST_GAME_ACTIONS" } 2 -Runtime processAction("LIST_GAME_ACTIONS"): { data: ['dice', 'roulette', 'blackjack'], type: "ACTION_RESULT_INTERPRETATION" } 3 - Agent request: { prompt: "You were just asked 'what actions can you perform right now in the game?' which triggered the action 'LIST_GAME_ACTIONS' and the result was ['dice', 'roulette', 'blackjack']. Can you compose an answer knowing the question and the result?" } 4- Agent response: { "user": "Miaoulechat", "text": "I can play dice, roulette and blackjack"} 5 - Shown to user:

 ["◎ Agent: I can play dice, roulette and blackjack"]

here, there is a second message sent to the agent, telling it "this is what happened, this is the result, can you interpret it?". This would be done again through some "type" (ACTION_RESULT_INTERPRETATION) on the action result, which would be some kind of "operation" to be done by the runtime.

Describe alternatives you've considered

Right now as an alternative, I go with the "simple" solution, and at the client level, if there is any message with a type: "OVERRIDE_ANSWER" I remove all other messages from the returned messages.

But this forces me to do a fork of all the clients to add that filter, when it could be done earlier in the runtime / pipeline.

Additional context

I'm comparing "actions" to OpenAI actions where the agent detects that an action is possible from the prompt and a list of actions that are submitted (it also detects parameters) Then the agent call the API with those parameters The API answers, and the agent interpret the result, create a constructed answer, and returns that answer.

It would be nice to be able to have a similar behavior for actions here

Dec 03 '24 10:12 dievardump

you could solve this at a more fundamental level by splitting agents into worker vs character types

worker agents would be all business - they'd just execute actions and return results directly, perfect for stuff like LIST_GAME_ACTIONS. no personality, no fluff, just data.

character agents would be the chatty ones that interpret results and add narrative flair.

you could even take a temperature-based approach to balance between these - low temps giving you more worker-like behavior with just the facts, while higher temps let the agent be more expressive and narrative in their responses.

this split would naturally handle both your scenarios:

simple case: worker agents automatically override their response with action results
complex case: character agents process action results and craft narrative responses

but if you want to move forward with your override/interpretation types approach first, that's totally workable too. you could handle it at the runtime level to avoid client mods.

the worker/character abstraction is there when you're ready to tackle the bigger architectural picture - it would make all of this stuff cleaner and more consistent in the long run

Dec 03 '24 10:12 oguzserdar

I do understand how I can go around this, but my goal here is to be able to have this feature integrated to my agents / characters, so they can automatically interpret actions results.

That's why I would like to make it a feature, not find a way to go around the fact it's not available.

Dec 03 '24 13:12 dievardump

I support this petition

Dec 08 '24 21:12 cristianrohr

This issue has been automatically marked as stale due to 30 days of inactivity. If no further activity occurs within 7 days, it will be closed automatically. Please take action if this issue is still relevant.

Jan 08 '25 18:01 github-actions[bot]