
Graph group chat

Open joshkyh opened this issue 1 year ago • 19 comments

@sonichi, please note that this is the PR continued from #753 as discussed.

@afourney, @thinkall, @JieyuZ2, @LinxinS97, @chnldw, @ragyabraham, @rytsang were the previously invited reviewers in #753

Why are these changes needed?

Following our video-conference discussion, we have decided to extend GroupChat directly to accept two parameters:

  1. allowed_or_disallowed_speaker_order: a dictionary mapping each speaker to the speakers it may (or may not) transition to.
  2. speaker_order_type: whether the dictionary describes allowed or disallowed transitions.
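As an editorial illustration of the intended semantics (the parameter names come from this PR's description, but the dictionary shape and the helper below are simplified stand-ins, not the PR's actual API):

```python
# Hypothetical sketch of the transition constraint described above.
# Keyed by speaker name; values are the speakers it may (or may not) hand off to.
speaker_transitions = {
    "Planner": ["Coder", "Reviewer"],
    "Coder": ["Reviewer"],
    "Reviewer": ["Planner", "Coder"],
}

def is_transition_allowed(transitions, speaker_order_type, from_agent, to_agent):
    """Return True if from_agent -> to_agent is permitted under the given rules."""
    listed = to_agent in transitions.get(from_agent, [])
    if speaker_order_type == "allowed":
        return listed       # the dict enumerates permitted transitions
    return not listed       # the dict enumerates forbidden transitions

print(is_transition_allowed(speaker_transitions, "allowed", "Planner", "Coder"))  # True
print(is_transition_allowed(speaker_transitions, "allowed", "Coder", "Planner"))  # False
```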

Related issue number

Closes #743. Ideas for #736, #818, #584.

Checks

  • [X] I've included any doc changes needed for https://microsoft.github.io/autogen/. See https://microsoft.github.io/autogen/docs/Contribute#documentation to build and test documentation locally.
  • [X] I've added tests (if relevant) corresponding to the changes introduced in this PR.
  • [X] I've made sure all auto checks have passed.

joshkyh avatar Dec 04 '23 04:12 joshkyh

Codecov Report

Attention: 30 lines in your changes are missing coverage. Please review.

Comparison is base (feed806) 34.26% compared to head (1e49883) 54.34%.

Files Patch % Lines
autogen/agentchat/groupchat.py 76.19% 8 Missing and 7 partials :warning:
autogen/graph_utils.py 72.72% 14 Missing and 1 partial :warning:
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #857       +/-   ##
===========================================
+ Coverage   34.26%   54.34%   +20.08%     
===========================================
  Files          42       43        +1     
  Lines        5099     5207      +108     
  Branches     1165     1280      +115     
===========================================
+ Hits         1747     2830     +1083     
+ Misses       3209     2131     -1078     
- Partials      143      246      +103     
Flag Coverage Δ
unittests 54.31% <74.57%> (+20.04%) :arrow_up:

Flags with carried forward coverage won't be shown.

:umbrella: View full report in Codecov by Sentry.

codecov-commenter avatar Dec 04 '23 04:12 codecov-commenter

Please add --ignore=test/agentchat/contrib to the pytest test line of build.yml.
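For anyone following along, the requested change would look roughly like this in build.yml (the step name and surrounding workflow structure are assumed, not copied from the repo):

```yaml
# .github/workflows/build.yml (fragment; surrounding keys assumed)
- name: Test with pytest
  run: |
    pytest test --ignore=test/agentchat/contrib
```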

sonichi avatar Dec 04 '23 04:12 sonichi

@thinkall line 56 of the contrib test doesn't make sense.

sonichi avatar Dec 04 '23 04:12 sonichi

@kevin666aa @rickyloynd-microsoft the coverage sections of the compressible test and teachable test are missing in contrib-test.yml. @IANTHEREAL @gagb, same for the GPTAssistant test.

sonichi avatar Dec 04 '23 04:12 sonichi

> @kevin666aa @rickyloynd-microsoft the coverage sections of the compressible test and teachable test are missing in contrib-test.yml. @IANTHEREAL @gagb, same for the GPTAssistant test.

Got it, will add the test. The coverage was not there because the whole contrib test is tested in line 56, as pointed out.

yiranwu0 avatar Dec 04 '23 05:12 yiranwu0

> Got it, will add the test. The coverage was not there because the whole contrib test is tested in line 56, as pointed out.

Thanks @kevin666a & sonichi, I just realized sonichi debugged why that one test failed in https://github.com/microsoft/autogen/actions/runs/7082505113/job/19273363788?pr=857. It's because the entire contrib folder is tested in line 56.

joshkyh avatar Dec 04 '23 05:12 joshkyh

> Thanks @kevin666a & sonichi, I just realized sonichi debugged why that one test failed in https://github.com/microsoft/autogen/actions/runs/7082505113/job/19273363788?pr=857. It's because the entire contrib folder is tested in line 56.

Hi @joshkyh, could you skip your test if openai is not installed? Like it was done in other tests.

@sonichi, I intended to run all the tests in the contrib folder to get a better coverage report. But it looks like there's no need to do so, right? Do you want me to raise another PR to modify it, or maybe @joshkyh can modify it in this PR?
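One common shape for that kind of guard, sketched here with the standard library's unittest rather than the project's actual pytest-based helpers (the test class and method names are hypothetical):

```python
import importlib.util
import unittest

# Detect the optional dependency without importing it.
OPENAI_INSTALLED = importlib.util.find_spec("openai") is not None

class TestGraphGroupChat(unittest.TestCase):
    @unittest.skipUnless(OPENAI_INSTALLED, "openai not installed, skipping")
    def test_needs_openai(self):
        # Placeholder for a test that actually exercises the OpenAI-backed path.
        self.assertTrue(OPENAI_INSTALLED)
```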

thinkall avatar Dec 04 '23 09:12 thinkall

> Hi @joshkyh, could you skip your test if openai is not installed? Like it was done in other tests.
>
> @sonichi, I intended to run all the tests in the contrib folder to get a better coverage report. But it looks like there's no need to do so, right? Do you want me to raise another PR to modify it, or maybe @joshkyh can modify it in this PR?

Each test may require different dependencies, so they need to be run separately. And each job needs to upload the coverage report in the same way.

sonichi avatar Dec 04 '23 14:12 sonichi

> Each test may require different dependencies, so they need to be run separately. And each job needs to upload the coverage report in the same way.

I'm not very familiar with this background, but is the task similar to this test https://github.com/microsoft/autogen/blob/main/.github/workflows/contrib-tests.yml#L164, where the coverage is reported separately? @sonichi

IANTHEREAL avatar Dec 05 '23 01:12 IANTHEREAL

@thinkall I have changed it from the whole folder to test/agentchat/contrib/test_retrievechat.py; please let me know if you'd like to test more .py files in that folder. Thanks!

joshkyh avatar Dec 05 '23 03:12 joshkyh

> @thinkall I have changed it from the whole folder to test/agentchat/contrib/test_retrievechat.py; please let me know if you'd like to test more .py files in that folder. Thanks!

We'll also need to add tests for qdrant.

thinkall avatar Dec 07 '23 13:12 thinkall

Thanks @thinkall, would one of the maintainers be able to approve the "Waiting" OpenAI unit tests to run please? It runs well on my local setup, but I want to check that it is also passing with the remote OpenAI credentials.

joshkyh avatar Dec 07 '23 22:12 joshkyh

Thanks @qingyun-wu 🙏. Working on the failed unit tests...

joshkyh avatar Dec 07 '23 22:12 joshkyh

Ahhh... I realized approval is required each time I iterate... Any chance we can set OpenAI4ContribTests/GraphGroupChat to auto-approve for this PR? I understand this is a risk that we might not want to take.

Or maybe simulate the .github workflows locally with https://github.com/nektos/act, but it's getting hard running Docker-in-Docker.

Any recommendations from the maintainers...?

joshkyh avatar Dec 07 '23 23:12 joshkyh

In the notebook: Can user just use pip install autogen[graphs]

yiranwu0 avatar Dec 12 '23 16:12 yiranwu0

> In the notebook: Can user just use pip install autogen[graphs]

I think you mean pip install pyautogen[graphs]

rickyloynd-microsoft avatar Dec 12 '23 16:12 rickyloynd-microsoft

Thanks, everyone, for your reviews; it's great to see interest in GraphGroupChat.

@kevin666aa, @rickyloynd-microsoft the notebook has been updated with your suggestions on pip install.

joshkyh avatar Dec 13 '23 10:12 joshkyh

I had a conversation with @sonichi about this PR today, and only now realized that the validation specifically excludes self loops. In many scenarios (e.g., with the WebSurferAgent) self-loops are really important. For example, if I ask "How far is the moon from earth", the WebSurferAgent will likely perform the following self-loop steps:

  1. Search the web for the Wikipedia page on the Moon
  2. Visit the Wikipedia page linked in the results
  3. Perform a find-in-page operation to scroll to the section that contains the answer.

Each of those is performed by the same agent, back-to-back, and is an example where self loops are important.

There are related scenarios with ReAct, etc. TL;DR: I think self-loops should be allowed.

afourney avatar Jan 10 '24 17:01 afourney

> I had a conversation with @sonichi about this PR today, and only now realized that the validation specifically excludes self loops. In many scenarios (e.g., with the WebSurferAgent) self-loops are really important. For example, if I ask "How far is the moon from earth", the WebSurferAgent will likely perform the following self-loop steps:
>
>   1. Search the web for the Wikipedia page on the Moon
>   2. Visit the Wikipedia page linked in the results
>   3. Perform a find-in-page operation to scroll to the section that contains the answer.
>
> Each of those is performed by the same agent, back-to-back, and is an example where self loops are important.
>
> There are related scenarios with ReAct, etc. TL;DR: I think self-loops should be allowed.

Hey @afourney, I think there is a misunderstanding. This PR supports self-loops. I think you are referring to https://github.com/microsoft/autogen/blob/f813a34fbcde668fe92e4bad813d8bf74f5a8c17/autogen/graph_utils.py#L56

This line allows self loops IF allow_repeat_speaker is set to True or a list. True is the default in GroupChat in https://github.com/microsoft/autogen/blob/f813a34fbcde668fe92e4bad813d8bf74f5a8c17/autogen/agentchat/groupchat.py#L55

GroupChat's allow_repeat_speaker gets passed into the validity check in https://github.com/microsoft/autogen/blob/f813a34fbcde668fe92e4bad813d8bf74f5a8c17/autogen/agentchat/groupchat.py#L93

I think we are on the same page. Please feel free to point out specific code segments that show that self-loops are not allowed which I might have missed though.
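To make the semantics concrete, here is a small, self-contained sketch of the behaviour described above (plain strings stand in for Agent objects; the real logic lives in graph_utils.py and groupchat.py, so treat this as an illustration only):

```python
def add_self_loops(allowed_transitions, agents, allow_repeat_speaker):
    """Sketch: augment an allowed-transition dict with self-loops.

    allow_repeat_speaker=True permits every agent to speak twice in a row;
    a list permits only the listed agents to repeat.
    """
    repeaters = agents if allow_repeat_speaker is True else (allow_repeat_speaker or [])
    augmented = {a: list(allowed_transitions.get(a, [])) for a in agents}
    for agent in repeaters:
        if agent not in augmented[agent]:
            augmented[agent].append(agent)
    return augmented

agents = ["WebSurfer", "Assistant"]
graph = {"WebSurfer": ["Assistant"], "Assistant": ["WebSurfer"]}
print(add_self_loops(graph, agents, True))
# {'WebSurfer': ['Assistant', 'WebSurfer'], 'Assistant': ['WebSurfer', 'Assistant']}
```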

joshkyh avatar Jan 11 '24 00:01 joshkyh

Could we get an update on this PR's status after the refactor? Is it still in discussion or moving forward? Also, I saw a mention of using an FSM by @freedeaths. Could this be a good fit for this PR? And with the graph introduction, are we considering changing how we broadcast messages? Thanks!

IANTHEREAL avatar Jan 26 '24 05:01 IANTHEREAL

@IANTHEREAL conditional transitions were not considered in this PR. It is a great idea, but I don't have time to manage yet another expansion of scope.

It has taken about five iterations and some face-to-face discussion to reach this point. I aim to weave in sonichi's previous review over the next two weeks.

@freedeaths are you keen to move this forward by addressing the previous review on the GraphGroupChat branch, so it can be upgraded to include conditional transitions through finite state machines? Alternatively, the FSM work can move forward and I close this PR without merging.

joshkyh avatar Jan 26 '24 08:01 joshkyh

@joshkyh there are some questions that I'd like to get an answer from you. If you could answer them in the conversation, then I can possibly make the changes needed or get others to help.

sonichi avatar Jan 27 '24 19:01 sonichi

> @joshkyh there are some questions that I'd like to get an answer from you. If you could answer them in the conversation, then I can possibly make the changes needed or get others to help.

Hey @sonichi, thanks for the offer to help. I have replied to them in the conversation. Please let me know if you have further questions.

joshkyh avatar Jan 28 '24 07:01 joshkyh

> @IANTHEREAL conditional transitions were not considered in this PR. It is a great idea, but I don't have time to manage yet another expansion of scope.
>
> It has taken about five iterations and some face-to-face discussion to reach this point. I aim to weave in sonichi's previous review over the next two weeks.
>
> @freedeaths are you keen to move this forward by addressing the previous review on the GraphGroupChat branch, so it can be upgraded to include conditional transitions through finite state machines? Alternatively, the FSM work can move forward and I close this PR without merging.

Thank you for your replies. At the implementation level, I think there is not much overlap between issue #1400 and this PR. However, in terms of functionality, I agree with @afourney's point about the overlap.

I will take a closer look at the implementation of this PR first to determine which option is better. Or which way do you prefer, @joshkyh?

freedeaths avatar Jan 29 '24 03:01 freedeaths

> Thank you for your replies. At the implementation level, I think there is not much overlap between issue #1400 and this PR. However, in terms of functionality, I agree with @afourney's point about the overlap. I will take a closer look at the implementation of this PR first to determine which option is better. Or which way do you prefer, @joshkyh?

@freedeaths I like your proposal in #1400. How about we combine the idea in #1400 with some ideas in this PR? Specifically: use the FSM spec in #1400 to specify the speaker-order constraint. Instead of making it a speaker selection method, make it an optional spec independent of the speaker selection method. The default is a fully connected transition matrix.

cc @qingyun-wu @ekzhu
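A rough sketch of how that could look (the function names and spec shape here are illustrative, not the final API from #1400 or this PR):

```python
def default_transition_graph(agents):
    """Fully connected transition matrix: anyone may follow anyone, including themselves."""
    return {a: list(agents) for a in agents}

def eligible_next_speakers(graph, last_speaker, agents):
    """Constrain the speaker-selection pool; the selection method itself is unchanged."""
    return [a for a in agents if a in graph.get(last_speaker, [])]

agents = ["A", "B", "C"]
graph = default_transition_graph(agents)           # no constraint specified
print(eligible_next_speakers(graph, "A", agents))  # ['A', 'B', 'C']
```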

sonichi avatar Jan 29 '24 03:01 sonichi

> @freedeaths I like your proposal in #1400. How about we combine the idea in #1400 with some ideas in this PR? Specifically: use the FSM spec in #1400 to specify the speaker-order constraint. Instead of making it a speaker selection method, make it an optional spec independent of the speaker selection method. The default is a fully connected transition matrix.

Thanks. I agree with you. I am reading the changes in this PR so that I can better understand it and make a combination.

freedeaths avatar Jan 29 '24 04:01 freedeaths

> Thanks. I agree with you. I am reading the changes in this PR so that I can better understand it and make a combination.

Let me continue the thread.

First of all, let me update the experiment results. I made a mistake in #1400: the code I used for the GraphGroupChat experiment was not the latest state of this branch. With the latest state, there were no errors in the manager's selection, and the accuracy was 5/5 = 100%. I apologize if this has caused any confusion. @afourney @victordibia

Secondly, I have two suggestions on the implementation:

  1. In this PR, in the else branch: if the length of graph_eligible_agents is 1, set selected_agent directly to graph_eligible_agents[0]. In that case it doesn't make sense for the LLM to make a decision.
  2. The select_speaker_messages part: this affects output quality, especially when the chat history is long. My idea is that if allowed_or_disallowed_speaker_order is not empty, it implies the user explicitly provided an FSM (finite state machine). In that case, we can set select_speaker_messages = select_speaker_messages[-1:] before returning, because in an FSM the next state is determined only by the previous state and the transition condition, independent of earlier states.

These two suggestions will both help save cost and improve accuracy when the FSM is explicitly provided.

@sonichi @joshkyh
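In pseudocode terms the two suggestions amount to something like this (names such as graph_eligible_agents mirror the PR, but the function and its ask_llm callback are hypothetical stand-ins for the actual selection logic):

```python
def pick_speaker(graph_eligible_agents, select_speaker_messages, fsm_provided, ask_llm):
    """Sketch of the two proposed optimizations, not the PR's actual code.

    1. Skip the LLM call entirely when only one agent is eligible.
    2. With an explicit FSM, only the last message should matter, so the
       prompt can be truncated to it.
    """
    if len(graph_eligible_agents) == 1:
        return graph_eligible_agents[0]                          # suggestion 1
    if fsm_provided:
        select_speaker_messages = select_speaker_messages[-1:]   # suggestion 2
    return ask_llm(graph_eligible_agents, select_speaker_messages)

# Toy stand-in for the LLM: just pick the first candidate.
choice = pick_speaker(["Coder"], ["msg1", "msg2"], True, lambda a, m: a[0])
print(choice)  # Coder
```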

freedeaths avatar Jan 29 '24 09:01 freedeaths

Thanks for looking into this PR closely. I'm glad the newest commit resolves the accuracy issue you were seeing earlier, without explicitly listing the on condition. Regarding the collaboration, I won't be working on this PR soon; I'll leave it as-is for the next couple of weeks in case anyone is able to work on it.

Agree on [1].

Truncating history in [2] to the last message might be too aggressive, even if that matches the definition of an FSM. From human interactions, I believe we can agree that a sense of who should speak next can depend on a wider context. I understand the fullest context can be expensive, but a context of only the last message might be too narrow. If this is framed as a classification problem, the number of past talk turns could be a hyperparameter to tune.

joshkyh avatar Jan 29 '24 10:01 joshkyh

> Truncating history in [2] to the last message might be too aggressive, even if that matches the definition of an FSM. From human interactions, I believe we can agree that a sense of who should speak next can depend on a wider context. I understand the fullest context can be expensive, but a context of only the last message might be too narrow. If this is framed as a classification problem, the number of past talk turns could be a hyperparameter to tune.

If we define the transition conditions in the description parameter of the Agent, which is similar to the on condition I mentioned before, the last message is sufficient, I think. Furthermore, the whole chat history might have side effects if it is too long. I may add an experiment soon with about 100k tokens; if any results go against intuition, I will share them with you.

freedeaths avatar Jan 29 '24 11:01 freedeaths

@freedeaths @qingyun-wu given that @joshkyh won't work on this PR for a while, could you figure out a plan to move forward? I'm OK with either building on this PR or creating a new PR.

sonichi avatar Jan 29 '24 21:01 sonichi