FLAML
FLAML copied to clipboard
Agent improvement
Comments from @gagb: Some observations:
- the agent many times starts to suggest shell commands which makes the code fail. Especially as the conversation gets longer
- Sometimes the user responds with empty strings and the code agent never returns terminal and the code gets stuck in a loop. Also happens when lang=unknown eg cuz the agent didn't wrap the python code in codeblockss
- The code fails if the context size > 8k Original comment: https://github.com/microsoft/FLAML/commit/3b3dd60730931c793e88e4b2aa870fa0192db3f5#diff-9ac9829642f8aa5ad3ed717f7f60eabedf33210195465c1f6473cd2cfd4cd2af
PR microsoft/FLAML#1025
### Tasks
- [ ] https://github.com/microsoft/autogen/issues/9
@gagb The second problem should have been addressed in the latest PR. Let me know if you still have this observation.
More feedback based on integration with tinyRA and using gpt-3.5-turbo:
- Drift: The conversation may drift and start to execute code that unrelated to the goal and possibly very unsafe. We need more safety checks on the code it suggests.
- Memory refreshing: Others have found that occasionally refreshing agent memory with goal can help.
- Guaranteed structured output: Currently there are no guarantees that the coding agent will output a python code block (or even use code blocks). This can cause the conversation to fail.
- Shell agent: Currently agent can't execute shell commands to succeed (e.g., pip commands to install python packages).
@gagb The second problem should have been addressed in the latest PR. Let me know if you still have this observation.
I think I still happens with gpt-3.5. I haven't been able to test with gpt-4 because I don't have access to it. I am working on a feature to share failure cases from tinyRA easily.