
Doesn't seem to notice when it hallucinates a command?

Open · swapneils opened this issue 1 year ago · 1 comment

Please check that this issue hasn't been reported before.

  • [X] I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

I ran the weather-information sample without setting up the fetch_weather command, which the agent tried to run anyway. Ideally, the system would notice that fetch_weather failed, construct an alternate plan that does not rely on the OpenWeatherMap API, and then continue with the rest of the goals (getting dressing tips and writing them to dressing_tips.txt).

Current Behaviour

Instead, the system pretended the command had succeeded, said it had "not been given any new commands since the last time [it] provided an output", and chose the do_nothing action.

Steps to reproduce

Run the WeatherGPT example from the README, but comment out the code that sets up GetWeather.

Possible solution

I don't see anywhere that the system itself is asked whether it has completed a step of the plan. Maybe add that to the prompt, or add a "cheap model" (Auto-GPT uses ada, as I recall) that evaluates this from the command output and the current plan step? A rough sketch of the latter idea is below.
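
A minimal sketch of what such a check could look like, assuming the pre-1.0 openai-python ChatCompletion interface. The step_completed helper, the prompt wording, and the use of gpt-3.5-turbo as the "cheap model" are my own assumptions for illustration, not anything that exists in LoopGPT today:

```python
import json
import openai

def step_completed(plan_step: str, command: str, command_output: str) -> bool:
    """Hypothetical helper: ask a cheap model whether a plan step actually
    succeeded, given the command that was run and its raw output."""
    prompt = (
        f"Plan step: {plan_step}\n"
        f"Command executed: {command}\n"
        f"Command output: {command_output}\n"
        "Did the command output indicate that the plan step was completed "
        'successfully? Answer with JSON: {"completed": true/false, "reason": "..."}'
    )
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # stand-in for whichever cheap model is chosen
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    try:
        verdict = json.loads(resp["choices"][0]["message"]["content"])
        return bool(verdict.get("completed"))
    except (json.JSONDecodeError, KeyError, TypeError):
        # If the evaluator's reply can't be parsed, be conservative and treat
        # the step as not completed so the main loop re-plans.
        return False
```

The main loop could call this after every command and, on a False verdict, feed the evaluator's reason back into the agent's context instead of letting it assume success.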

Which Operating Systems are you using?

  • [ ] Linux
  • [ ] macOS
  • [X] Windows

Python Version

  • [ ] >= v3.11
  • [X] v3.10
  • [ ] v3.9
  • [ ] <= v3.8

LoopGPT Version

latest

Acknowledgements

  • [X] My issue title is concise, descriptive, and in title casing.
  • [X] I have searched the existing issues to make sure this bug has not been reported yet.
  • [X] I am using the latest version of LoopGPT.
  • [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.

swapneils · May 05 '23

Yes, I am encountering the same issue even with simple built-in tools. In cases where I removed a tool, the agent still runs the command, so the output is a predictable error message, but the agent's response to it is unpredictable. Sometimes it acknowledges the error and attempts a different action, but I have recorded many cases where it hallucinates that the command succeeded, or misinterprets the response as a success. The misinterpretation happens very often when the command was run by a delegate agent/subagent.

With command responses from delegate agents spawned through the create_agent command, there are two recurring misinterpretation cases that I have observed and recorded:

  1. The main agent creates a subagent and assigns it a task to follow up on later. When it later requests a report with the message_agent command, it sends the request to an invalid agent ID and misinterprets the resulting 'agent not found' error as the task having failed, or sometimes, bizarrely, as a success (a rough sketch of a possible guard is shown after this list).
  2. The main agent creates a subagent and assigns it a task, then immediately gets a clarifying question back from the subagent, which it interprets as the end result of the subagent's task.
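
To make case 1 concrete, here is a rough sketch of a guard that could catch an invalid agent ID before the raw error string ever reaches the model. The command dict shape, the sub_agents registry, and the run_command fallback are assumptions for illustration, not LoopGPT's actual internals:

```python
import json

def dispatch_command(agent, command: dict) -> str:
    """Hypothetical wrapper around the normal command dispatch.
    `command` is assumed to look like:
    {"name": "message_agent", "args": {"id": "...", "message": "..."}}"""
    if command.get("name") == "message_agent":
        agent_id = command.get("args", {}).get("id")
        # `sub_agents` is an assumed registry of agents spawned via create_agent.
        known = getattr(agent, "sub_agents", {})
        if agent_id not in known:
            # Return a structured, unambiguous failure instead of a free-form
            # error string that the model might misread as a success.
            return json.dumps({
                "status": "error",
                "error": f"No sub-agent with id {agent_id!r}; known ids: {list(known)}",
            })
    # Fall through to the normal execution path (hypothetical method name).
    return agent.run_command(command)
```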

I have noticed that the agent will sometimes misinterpret the responses to its own commands too. I think the root of these problems lies somewhere in the following code blocks: https://github.com/farizrahman4u/loopgpt/blob/main/loopgpt/agent.py#L154-L181 https://github.com/farizrahman4u/loopgpt/blob/main/loopgpt/agent.py#L287-L340

iskandarreza · May 06 '23