support ask_review by llm
Add ask_review by LLM, and adjust the generated code to fit the Jupyter notebook.
Features
- add 3 review_type options: ["human", "llm", "confirm_all"] (see the sketch after this list);
- update DEFAULT_SYSTEM_MSG in BaseWriteAnalysisCode to make it clear that the executor is a Jupyter notebook.
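For context, a minimal sketch of how the three review types could be dispatched is shown below. It is a hypothetical simplification, not the actual AskReview implementation in metagpt/actions/di/ask_review.py; the `ask_llm` callable and the prompt text are placeholders.

```python
# Hypothetical sketch of review_type dispatch; the real AskReview action may differ
# in structure, prompts, and return handling.
from typing import Callable, Tuple

REVIEW_TYPES = ["human", "llm", "confirm_all"]


def dispatch_review(review_type: str, context: str, ask_llm: Callable[[str], str]) -> Tuple[str, bool]:
    """Return (review_response, confirmed) for the chosen review type."""
    if review_type == "confirm_all":
        # Skip the review entirely and confirm every task result.
        return "confirm", True
    if review_type == "human":
        # Ask the user on the console; anything starting with "confirm" approves the task.
        rsp = input("Review the latest result (confirm / redo / change ...): ").strip()
        return rsp, rsp.lower().startswith("confirm")
    if review_type == "llm":
        # Ask an LLM to judge whether the result matches the task requirements.
        rsp = ask_llm(
            f"Review the following task and result:\n{context}\n"
            "Reply 'confirm' if the result meets the requirements; "
            "otherwise start with 'redo' or 'change' and explain why."
        )
        return rsp, rsp.lower().startswith("confirm")
    raise ValueError(f"unknown review_type: {review_type}")
```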
Result
If the generated code executes successfully but does not match the task requirements, the LLM ask review can detect this and suggest modifications. For example:
```python
import pytest

from metagpt.actions.di.ask_review import AskReview
from metagpt.schema import Message


@pytest.mark.asyncio
async def test_ask_review_llm():
    context = [
        Message("Train a model to predict wine class using the training set."),
        Message(
            """
            import matplotlib.pyplot as plt
            from sklearn.datasets import load_wine

            wine_data = load_wine()
            plt.hist(wine_data.target, bins=len(wine_data.target_names))
            plt.xlabel('Class')
            plt.ylabel('Number of Samples')
            plt.title('Distribution of Wine Classes')
            plt.xticks(range(len(wine_data.target_names)), wine_data.target_names)
            plt.show()
            """
        ),
    ]
    rsp, confirmed = await AskReview().run(context, review_type="llm")
    assert rsp.startswith(("redo", "change"))  # -> True
    assert not confirmed  # -> True
    print(rsp)
    # ```
    # redo the task, the provided code only includes data loading and visualization,
    # but does not include any steps related to training a model.
    # ```
```
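Besides a "redo" suggestion with feedback, the review may also answer with "change", for example suggesting that the current task be split into a training task and a visualization task. As a rough, hypothetical sketch of how a caller might branch on the result (the `rerun_task` and `update_plan` callbacks are placeholder names, not the actual logic in metagpt/strategy/planner.py):

```python
# Hypothetical consumer of an AskReview result; placeholder callbacks, not MetaGPT APIs.
def handle_review(rsp: str, confirmed: bool, rerun_task, update_plan) -> None:
    if confirmed:
        return  # result accepted, move on to the next task
    if rsp.startswith("redo"):
        rerun_task(feedback=rsp)  # retry the current task with the critique as feedback
    elif rsp.startswith("change"):
        update_plan(instruction=rsp)  # rewrite the current task / plan per the suggestion
```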
Codecov Report
Attention: Patch coverage is 92.30769% with 4 lines in your changes missing coverage. Please review.
Project coverage is 82.73%. Comparing base (0271cd7) to head (5ff3cd1). Report is 17 commits behind head on code_interpreter.
| Files | Patch % | Lines |
|---|---|---|
| metagpt/actions/di/ask_review.py | 91.30% | 2 Missing :warning: |
| metagpt/roles/di/data_interpreter.py | 75.00% | 1 Missing :warning: |
| metagpt/strategy/planner.py | 93.33% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## code_interpreter #959 +/- ##
====================================================
+ Coverage 82.70% 82.73% +0.03%
====================================================
Files 223 223
Lines 13129 13164 +35
====================================================
+ Hits 10858 10891 +33
- Misses 2271 2273 +2
Prefer a more practical example. Review is most useful when errors occur. Perhaps give an example showing how it handles errors, such as suggesting "redo" with feedback, or "change" with an updated current-task instruction?
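As a rough illustration of the error case described above (hypothetical: the traceback text and the expected suggestion are illustrative only, not an actual test in the repository):

```python
import pytest

from metagpt.actions.di.ask_review import AskReview
from metagpt.schema import Message


@pytest.mark.asyncio
async def test_ask_review_llm_on_error():
    # The second message carries the runtime error raised by the notebook cell.
    context = [
        Message("Train a model to predict wine class using the training set."),
        Message("NameError: name 'train_test_split' is not defined"),
    ]
    rsp, confirmed = await AskReview().run(context, review_type="llm")
    # The review is expected to reject the result and suggest a fix, e.g.
    # "redo the task, import train_test_split from sklearn.model_selection first ...".
    assert not confirmed
    assert rsp.startswith(("redo", "change"))
```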
