support ask_review by llm
Add ask_review by LLM, and adjust the generated code to fit the Jupyter notebook.
Features
- add 3 review_type options: ["human", "llm", "confirm_all"] (see the sketch after this list);
- update DEFAULT_SYSTEM_MSG in BaseWriteAnalysisCode to make it clear that the executor is a Jupyter notebook.
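For context, a minimal sketch of how the three review types could be dispatched is shown below. It is a hypothetical simplification, not the actual AskReview implementation in metagpt/actions/di/ask_review.py; the `ask_llm` callable and the prompt text are placeholders.

```python
# Hypothetical sketch of review_type dispatch; the real AskReview action may differ
# in structure, prompts, and return handling.
from typing import Callable, Tuple

REVIEW_TYPES = ["human", "llm", "confirm_all"]


def dispatch_review(review_type: str, context: str, ask_llm: Callable[[str], str]) -> Tuple[str, bool]:
    """Return (review_response, confirmed) for the chosen review type."""
    if review_type == "confirm_all":
        # Skip the review entirely and confirm every task result.
        return "confirm", True
    if review_type == "human":
        # Ask the user on the console; anything starting with "confirm" approves the task.
        rsp = input("Review the latest result (confirm / redo / change ...): ").strip()
        return rsp, rsp.lower().startswith("confirm")
    if review_type == "llm":
        # Ask an LLM to judge whether the result matches the task requirements.
        rsp = ask_llm(
            f"Review the following task and result:\n{context}\n"
            "Reply 'confirm' if the result meets the requirements; "
            "otherwise start with 'redo' or 'change' and explain why."
        )
        return rsp, rsp.lower().startswith("confirm")
    raise ValueError(f"unknown review_type: {review_type}")
```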
Result
If the generated code executes successfully but does not match the task requirements, the LLM ask review can detect this and suggest modifications. For example:
```python
import pytest

from metagpt.actions.di.ask_review import AskReview
from metagpt.schema import Message


@pytest.mark.asyncio
async def test_ask_review_llm():
    context = [
        Message("Train a model to predict wine class using the training set."),
        Message(
            """
            import matplotlib.pyplot as plt
            from sklearn.datasets import load_wine

            wine_data = load_wine()
            plt.hist(wine_data.target, bins=len(wine_data.target_names))
            plt.xlabel('Class')
            plt.ylabel('Number of Samples')
            plt.title('Distribution of Wine Classes')
            plt.xticks(range(len(wine_data.target_names)), wine_data.target_names)
            plt.show()
            """
        ),
    ]
    rsp, confirmed = await AskReview().run(context, review_type="llm")
    assert rsp.startswith(("redo", "change"))  # -> True
    assert not confirmed  # -> True
    print(rsp)
    # ```
    # redo the task, the provided code only includes data loading and visualization,
    # but does not include any steps related to training a model.
    # ```
```
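Besides a "redo" suggestion with feedback, the review may also answer with "change", for example suggesting that the current task be split into a training task and a visualization task. As a rough, hypothetical sketch of how a caller might branch on the result (the `rerun_task` and `update_plan` callbacks are placeholder names, not the actual logic in metagpt/strategy/planner.py):

```python
# Hypothetical consumer of an AskReview result; placeholder callbacks, not MetaGPT APIs.
def handle_review(rsp: str, confirmed: bool, rerun_task, update_plan) -> None:
    if confirmed:
        return  # result accepted, move on to the next task
    if rsp.startswith("redo"):
        rerun_task(feedback=rsp)  # retry the current task with the critique as feedback
    elif rsp.startswith("change"):
        update_plan(instruction=rsp)  # rewrite the current task / plan per the suggestion
```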
Codecov Report
Attention: Patch coverage is 92.30769% with 4 lines in your changes missing coverage. Please review.
Project coverage is 82.73%. Comparing base (0271cd7) to head (5ff3cd1). Report is 17 commits behind head on code_interpreter.
| Files | Patch % | Lines |
|---|---|---|
| metagpt/actions/di/ask_review.py | 91.30% | 2 Missing :warning: |
| metagpt/roles/di/data_interpreter.py | 75.00% | 1 Missing :warning: |
| metagpt/strategy/planner.py | 93.33% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## code_interpreter #959 +/- ##
====================================================
+ Coverage 82.70% 82.73% +0.03%
====================================================
Files 223 223
Lines 13129 13164 +35
====================================================
+ Hits 10858 10891 +33
- Misses 2271 2273 +2
Prefer a more practical example. Review is most useful when errors occur. Perhaps give an example showing how it handles errors, such as suggesting "redo" with feedback, or "change" with an updated current-task instruction?
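As a rough illustration of the error case described above (hypothetical: the traceback text and the expected suggestion are illustrative only, not an actual test in the repository):

```python
import pytest

from metagpt.actions.di.ask_review import AskReview
from metagpt.schema import Message


@pytest.mark.asyncio
async def test_ask_review_llm_on_error():
    # The second message carries the runtime error raised by the notebook cell.
    context = [
        Message("Train a model to predict wine class using the training set."),
        Message("NameError: name 'train_test_split' is not defined"),
    ]
    rsp, confirmed = await AskReview().run(context, review_type="llm")
    # The review is expected to reject the result and suggest a fix, e.g.
    # "redo the task, import train_test_split from sklearn.model_selection first ...".
    assert not confirmed
    assert rsp.startswith(("redo", "change"))
```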
