Doesn't seem to notice when it hallucinates a command?
Please check that this issue hasn't been reported before.
- [X] I searched previous Bug Reports and didn't find any similar reports.
Expected Behavior
I ran the sample for getting weather information, but without setting up the fetch_weather command, which it tried to run anyway. Ideally, the system would notice that fetch_weather failed and construct an alternate plan not using the OpenWeatherMap API, and then continue with the rest of the goals (getting dressing tips and writing them to dressing_tips.txt).
Current behaviour
Instead, the system pretended that it was successful, and said it had "not been given any new commands since the last time [it] provided an output", choosing the do_nothing action.
Steps to reproduce
Run the WeatherGPT example from the README, but comment out the code setting up GetWeather.
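For concreteness, this is roughly what I ran (paraphrased from memory of the README's WeatherGPT example, so the exact goal text may differ), with the custom GetWeather tool definition and registration left out:

```python
import loopgpt

# README's WeatherGPT example, but WITHOUT defining or registering the
# custom GetWeather tool, so the fetch_weather command cannot succeed.
agent = loopgpt.Agent()
agent.name = "WeatherGPT"
agent.description = "an AI assistant that tells you the weather"
agent.goals = [
    "Get the weather for New York",
    "Give the user tips on how to dress for the weather",
    "Write the tips to a file called dressing_tips.txt",
]
agent.cli()
```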
Possible solution
I don't see anywhere that the system itself is asked whether it has completed a step of the plan. Maybe add that to the prompt, or add a "cheap model" (Auto-GPT uses ada, as I recall) to evaluate this based on the command output and the plan step?
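One way this could look, as a minimal sketch rather than LoopGPT's actual API (the `verify_step` name, the prompt wording, and the choice of gpt-3.5-turbo as the cheap model are my own assumptions), is a post-execution check that the main loop calls after every command:

```python
# Hypothetical helper -- not part of LoopGPT. Illustrates asking a cheap
# model whether the command output shows the plan step actually succeeded.
import openai

def verify_step(plan_step: str, command: str, command_output: str) -> bool:
    """Return True only if the evaluator model answers 'yes'."""
    prompt = (
        "You are a strict evaluator.\n"
        f"Plan step: {plan_step}\n"
        f"Command executed: {command}\n"
        f"Command output: {command_output}\n"
        "Did the command output show that the plan step was completed "
        "successfully? Answer with exactly one word: yes or no."
    )
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # stand-in for a cheap evaluator model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
        max_tokens=1,
    )
    answer = resp["choices"][0]["message"]["content"].strip().lower()
    return answer.startswith("yes")
```

If the check fails, the agent could be given an explicit "fetch_weather failed, revise your plan" message instead of silently moving on.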
Which Operating Systems are you using?
- [ ] Linux
- [ ] macOS
- [X] Windows
Python Version
- [ ] >= v3.11
- [X] v3.10
- [ ] v3.9
- [ ] <= v3.8
LoopGPT Version
latest
Acknowledgements
- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this bug has not been reported yet.
- [X] I am using the latest version of LoopGPT.
- [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.
Yes, I am also encountering the same issue, even with simple built-in tools. In cases where I removed the tool, the agent still runs the command, so the output is a predictable error message, but the agent's response to it can be unpredictable. Sometimes it acknowledges the error and attempts a different action, but I have recorded many cases where it hallucinates that the command was a success, or misinterprets an error response as a success. The misinterpretation happens very often when the command was run by a delegate agent/subagent.
With the misinterpreted command responses from delegate agents spawned through the `create_agent` command, there are two recurring cases that I have observed and recorded:
- The main agent creates a subagent and assigns a task that it plans to follow up on later. When it later requests a report using the `message_agent` command, it sends the request to an invalid agent ID and misinterprets the 'agent not found' error as a failure of the task, or sometimes, bizarrely, as a success.
- The main agent creates a subagent and assigns a task, then immediately gets a response from the subagent asking for clarification of the request, which the main agent interprets as the end result of the subagent's task.
I have noticed that the agent will sometimes misinterpret its own commands too. I think the cause of these problems lies somewhere in the following code blocks:
https://github.com/farizrahman4u/loopgpt/blob/main/loopgpt/agent.py#L154-L181
https://github.com/farizrahman4u/loopgpt/blob/main/loopgpt/agent.py#L287-L340
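A rough sketch of the kind of check those sections could apply after a command runs (this is hypothetical, not LoopGPT's actual code; the helper name and the error markers are my own assumptions) would be to turn obvious error strings into an unambiguous failure message before they are fed back to the model:

```python
# Hypothetical patch sketch, not actual LoopGPT code.
ERROR_MARKERS = ("unknown command", "agent not found", "traceback")

def summarize_command_result(command_name: str, result: str) -> str:
    """Rewrite a raw command result so errors cannot be glossed over
    as successes by the model."""
    if any(marker in result.lower() for marker in ERROR_MARKERS):
        return (
            f"The command '{command_name}' FAILED with the following error: "
            f"{result}. Do not assume it succeeded; revise your plan."
        )
    return f"The command '{command_name}' returned: {result}"
```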