Graph group chat
@sonichi, please note that this is the PR continued from #753 as discussed.
@afourney, @thinkall, @JieyuZ2, @LinxinS97, @chnldw, @ragyabraham, @rytsang were the previously invited reviewers in #753
Why are these changes needed?
Following our video-conference discussion, we have decided to extend GroupChat directly to accept two parameters:
- `allowed_or_disallowed_speaker_order`: a dictionary of from/to transitions, mapping each source agent to its target agents.
- `speaker_order_type`: whether the transitions in the dictionary are "allowed" or "disallowed".
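For illustration, a minimal sketch of how these two parameters might be used (the dict shape, agent names, and `llm_config` here are assumptions for the example, not taken from the PR itself):

```python
from autogen import AssistantAgent, GroupChat, GroupChatManager

llm_config = {"model": "gpt-4"}  # assumed; configure as usual

planner = AssistantAgent("planner", llm_config=llm_config)
coder = AssistantAgent("coder", llm_config=llm_config)
reviewer = AssistantAgent("reviewer", llm_config=llm_config)

# Allowed transitions: planner -> coder -> reviewer -> planner.
speaker_order = {
    planner: [coder],
    coder: [reviewer],
    reviewer: [planner],
}

group_chat = GroupChat(
    agents=[planner, coder, reviewer],
    messages=[],
    max_round=12,
    allowed_or_disallowed_speaker_order=speaker_order,
    speaker_order_type="allowed",  # interpret the dict as allowed transitions
)
manager = GroupChatManager(groupchat=group_chat, llm_config=llm_config)
```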
Related issue number
Closes #743 Ideas for #736 #818 #584
Checks
- [X] I've included any doc changes needed for https://microsoft.github.io/autogen/. See https://microsoft.github.io/autogen/docs/Contribute#documentation to build and test documentation locally.
- [X] I've added tests (if relevant) corresponding to the changes introduced in this PR.
- [X] I've made sure all auto checks have passed.
Codecov Report
Attention: 30 lines in your changes are missing coverage. Please review.
Comparison is base (`feed806`) 34.26% compared to head (`1e49883`) 54.34%.
| Files | Patch % | Lines |
|---|---|---|
| autogen/agentchat/groupchat.py | 76.19% | 8 Missing and 7 partials :warning: |
| autogen/graph_utils.py | 72.72% | 14 Missing and 1 partial :warning: |
Additional details and impacted files
@@           Coverage Diff            @@
##             main     #857      +/-  ##
===========================================
+ Coverage   34.26%   54.34%   +20.08%
===========================================
  Files          42       43       +1
  Lines        5099     5207     +108
  Branches     1165     1280     +115
===========================================
+ Hits         1747     2830    +1083
+ Misses       3209     2131    -1078
- Partials      143      246     +103
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 54.31% <74.57%> (+20.04%) | :arrow_up: |
Flags with carried forward coverage won't be shown.
Please add `--ignore=test/agentchat/contrib` to the `pytest test` line of build.yml.
@thinkall line 56 of the contrib test doesn't make sense.
@kevin666aa @rickyloynd-microsoft the coverage section of compressible test and teachable test are missing in the contrib-test.yml @IANTHEREAL @gagb same for GPTAssistant test.
Got it, will add the test. The coverage was not there because the whole contrib test is tested in line 56, as pointed out.
Thanks @kevin666aa & @sonichi, I just realized sonichi debugged why that 1 test failed in https://github.com/microsoft/autogen/actions/runs/7082505113/job/19273363788?pr=857. It's because the entire contrib folder is tested in line 56.
Hi @joshkyh , could you skip your test if openai is not installed? Like it was done in other tests.
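For reference, the skip pattern used in other tests looks roughly like this (a sketch; the guard variable and test name are illustrative):

```python
import pytest

try:
    import openai  # noqa: F401

    skip = False
except ImportError:
    skip = True


@pytest.mark.skipif(skip, reason="openai not installed")
def test_graph_group_chat():
    ...
```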
@sonichi, I intended to run all the tests in the contrib folder to get a better coverage report. But it looks like there is no need to do so, right? Do you want me to raise another PR to modify it, or maybe @joshkyh can modify it in this PR?
Each test may require different dependencies, so they need to be run separately. And each job needs to upload the coverage report in the same way.
I'm not very familiar with this background, but is the task similar to this test https://github.com/microsoft/autogen/blob/main/.github/workflows/contrib-tests.yml#L164, where the coverage is reported separately? @sonichi
@thinkall I have made the edit from the whole folder to test/agentchat/contrib/test_retrievechat.py. Please let me know if you'd like to test more .py files in that folder. Thanks!
We'll also need to add tests for qdrant.
Thanks @thinkall, would one of the maintainers be able to approve the "Waiting" OpenAI unit tests to run please? It runs well on my local setup, but I want to check that it is also passing with the remote OpenAI credentials.
Thanks @qingyun-wu 🙏. Working on the failed unit tests...
Ahhh... I realized approval is required each time I iterate... Any chance we can set OpenAI4ContribTests/GraphGroupChat to auto-approve for this PR? I understand this is a risk that we might not want to take.
Or maybe I could simulate the .github workflows locally with https://github.com/nektos/act, but it's getting hard running Docker-in-Docker.
Any recommendations from the maintainers...?
In the notebook: Can the user just use `pip install autogen[graphs]`?
I think you mean `pip install pyautogen[graphs]`.
Thanks, everyone, for your reviews; it's great to see interest in GraphGroupChat.
@kevin666aa, @rickyloynd-microsoft the notebook has been updated with your suggestions on pip install.
I had a conversation with @sonichi about this PR today, and only now realized that the validation specifically excludes self loops. In many scenarios (e.g., with the WebSurferAgent) self-loops are really important. For example, if I ask "How far is the moon from earth", the WebSurferAgent will likely perform the following self-loop steps:
- Search the web for the Wikipedia page on the Moon
- Visit the Wikipedia page linked in the results
- Perform a find-in-page operation to scroll to the section that contains the answer.
Each of those is performed by the same agent, back-to-back, and is an example where self loops are important.
There are related scenarios with ReAct, etc. TL;DR: I think self-loops should be allowed.
Hey @afourney, I think there is a misunderstanding. This PR supports self-loops. I think you are referring to https://github.com/microsoft/autogen/blob/f813a34fbcde668fe92e4bad813d8bf74f5a8c17/autogen/graph_utils.py#L56
This line allows self-loops if `allow_repeat_speaker` is set to `True` or a list. `True` is the default in GroupChat in https://github.com/microsoft/autogen/blob/f813a34fbcde668fe92e4bad813d8bf74f5a8c17/autogen/agentchat/groupchat.py#L55
GroupChat's allow_repeat_speaker gets passed into the validity check in
https://github.com/microsoft/autogen/blob/f813a34fbcde668fe92e4bad813d8bf74f5a8c17/autogen/agentchat/groupchat.py#L93
I think we are on the same page. Please feel free to point out specific code segments that show that self-loops are not allowed which I might have missed though.
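To make the self-loop case concrete, here is a sketch under this PR's parameters (`web_surfer` and `assistant` stand in for agents defined elsewhere):

```python
# The web surfer may keep the floor across consecutive turns
# (search -> visit page -> find in page), then hand off.
speaker_order = {
    web_surfer: [web_surfer, assistant],  # self-loop plus handoff
    assistant: [web_surfer],
}

group_chat = GroupChat(
    agents=[web_surfer, assistant],
    messages=[],
    allow_repeat_speaker=True,  # the default; permits the self-loop above
    allowed_or_disallowed_speaker_order=speaker_order,
    speaker_order_type="allowed",
)
```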
Could we get an update on this PR's status after the refactor? Is it still in discussion or moving forward? Also, I saw a mention of using an FSM by @freedeaths. Could this be a good fit for this PR? And with the graph introduction, are we considering changing how we broadcast messages? Thanks!
@IANTHEREAL conditional transitions were not considered in this PR. It is a great idea, but I don't have time to manage yet another expansion of scope.
It has taken about 5 iterations and some face-to-face discussion to reach this point. I aim to weave in sonichi's previous review over the next two weeks.
@freedeaths, are you keen to move this forward by working on the GraphGroupChat branch, addressing the previous review, so it can be upgraded to include conditional transitions through finite state machines? Alternatively, FSM can move forward and I close this PR without merging.
@joshkyh there are some questions that I'd like to get an answer from you. If you could answer them in the conversation, then I can possibly make the changes needed or get others to help.
Hey @sonichi, thanks for the offer to help. I have replied to them in the conversation. Please let me know if you have further questions.
Thank you for your replies. At the implementation level, I think there is not much overlap between Issue #1400 and this PR. However, in terms of functionality, I agree with @afourney's point about the overlap.
I will take a closer look at the implementation of this PR first to determine which option is better. Or which way do you prefer? @joshkyh
@freedeaths I like your proposal in #1400. How about we combine the idea in #1400 and some ideas in this PR? Specifically: use the FSM spec in #1400 to specify the speaker-order constraint. Instead of making it a speaker selection method, make it an optional spec independent of the speaker selection method. Default is a fully connected transition matrix.
cc @qingyun-wu @ekzhu
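As a sketch of what the fully connected default could look like (hypothetical helper, not from either proposal):

```python
def fully_connected_transitions(agents):
    """Every agent may follow every agent, including itself."""
    return {agent: list(agents) for agent in agents}
```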
Thanks. I agree with you. I am reading the changes of this PR so that I can better understand it and make a combination.
Let me continue the thread.
First of all, let me update the experiment results. I made a mistake in #1400: the code I used for the experiment with GraphGroupChat was not at the latest state of this branch. At the latest state, there were no errors in the manager's selection, and the accuracy was 5/5 = 100%. I apologize if this has caused any confusion. @afourney @victordibia
Secondly, I still have two suggestions in the implementation:
- In this PR, the else part: if the length of `graph_eligible_agents` is 1, set `selected_agent` directly as `graph_eligible_agents[0]`. In this case, it doesn't make sense for the LLM to make a decision.
- The select_speaker_messages part: this part affects the quality of the output, especially when the chat history is too long. My idea is that if `allowed_or_disallowed_speaker_order` is not empty, it implies that the user explicitly provided an FSM (finite state machine). In this case, we can set `select_speaker_messages = select_speaker_messages[-1:]` before returning, because in an FSM the next state is determined only by the previous state and the transition condition, independent of earlier states.
These 2 suggestions will both be helpful in saving costs and improving accuracy if the FSM is explicitly provided.
@sonichi @joshkyh
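To sketch how the two suggestions might fold into speaker selection (illustrative only; `_get_graph_eligible_agents` and `_llm_select_speaker` are hypothetical helpers, not names from this PR):

```python
def select_speaker(self, last_speaker, selector):
    # Agents reachable from last_speaker under the transition graph.
    graph_eligible_agents = self._get_graph_eligible_agents(last_speaker)

    # Suggestion 1: with a single eligible successor, skip the LLM call.
    if len(graph_eligible_agents) == 1:
        return graph_eligible_agents[0]

    select_speaker_messages = list(self.messages)
    # Suggestion 2: with an explicit FSM, only the last message is needed
    # to decide the next transition.
    if self.allowed_or_disallowed_speaker_order:
        select_speaker_messages = select_speaker_messages[-1:]

    return self._llm_select_speaker(graph_eligible_agents, select_speaker_messages)
```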
Thanks for looking into this PR closely. I'm glad the newest commit resolves the accuracy issue you were seeing earlier, without explicitly listing the on condition. Regarding the collaboration, I won't be working on this PR soon; I'll leave it as-is for the next couple of weeks in case anyone is able to work on it.
Agree on [1].
Truncating history in [2] to the last message might be too aggressive, even if that is the definition of an FSM. Based on human interactions, I believe we can agree that a feeling of who should speak next can depend on a wider context. I understand the fullest context can be expensive, but a context of only the last message might be too narrow. The number of past talk turns could be a hyperparameter to tune if this is presented as a classification problem.
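In that spirit, the truncation in the sketch above could be parameterized rather than fixed at one message (illustrative only):

```python
n_turns = 3  # hyperparameter: how many past turns the speaker selector sees
select_speaker_messages = select_speaker_messages[-n_turns:]
```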
If we define the transition conditions in the `description` parameter of the Agent, which is similar to the on condition I mentioned before, the last message is sufficient, I think. Furthermore, the whole chat history might have side effects if it is too long. Maybe I will add an experiment soon, in which there may be about 100k tokens. If any results go against intuition, I will share them with you.
@freedeaths @qingyun-wu given that @joshkyh won't work on this PR for a while, could you figure out a plan to move forward? I'm OK with either building on this PR or creating a new PR.