feat:[response API] GPT 5.1 apply_patch tool call support
Why are these changes needed?
This pull request is a short PR adding apply patch tool call support for gpt-5.1 and codex models
- added type safety for built in tools params
- updated client and message retrieval for apply_tool support
Related issue number
#2211
Checks
- [x] I've included any doc changes needed for https://docs.ag2.ai/. See https://docs.ag2.ai/latest/docs/contributor-guide/documentation/ to build and test documentation locally.
- [x] I've added tests (if relevant) corresponding to the changes introduced in this PR.
- [x] I've made sure all auto checks have passed.
π Documentation Analysis
All docs are up to date! π
β Latest commit analyzed: 3463db8caa0763524da85dbc9665abb7ceea02c7 | Powered by Joggr
I am really looking forward to this integration.
@claude How apply patch is supported in AG2 here? Reference https://platform.openai.com/docs/guides/tools-apply-patch
Claude finished @randombet's task ββ View job
How apply_patch is supported in AG2
I've analyzed the PR implementation and can explain how apply_patch support works in AG2, following OpenAI's Responses API design pattern.
Architecture Overview
The apply_patch tool in AG2 follows OpenAI's Responses API pattern with three main components:
1. Core Tool Implementation (autogen/tools/apply_patch_tool.py)
This provides the file operations engine with two main classes:
WorkspaceEditor: Filesystem abstraction that:
- Manages file operations within a workspace directory
- Enforces security through
allowed_pathspatterns (supports glob patterns like**,src/**) - Applies unified diffs using the custom
_V4ADiffApplier - Supports three operations:
create_file,update_file,delete_file - Works with both sync and async operations
ApplyPatchTool: AG2 Tool wrapper that:
- Integrates
WorkspaceEditorwith the AG2 agent framework - Handles approval workflows via
needs_approvalandon_approvalcallbacks - Provides both sync and async patch application
2. Responses API Client Integration (autogen/oai/openai_responses.py)
The OpenAIResponsesClient handles the conversation flow:
Key Method: _apply_patch_operation() (lines 229-330)
- Creates a
WorkspaceEditorinstance with workspace_dir and allowed_paths - Routes operations to appropriate handlers (create/update/delete)
- Handles async execution (with event loop detection for Jupyter compatibility)
- Returns
ApplyPatchCallOutputdataclass with status and output
Message Processing: _normalize_messages_for_responses_api() (lines 332-490)
- Extracts
apply_patch_callitems from messages - Executes patch operations and converts them to
apply_patch_call_output - Handles both current message patches and continuation from previous responses
- Filters out already-processed patch calls to avoid duplication
Request Building (lines 495-550):
- Adds
{"type": "apply_patch"}to the tools list whenapply_patchorapply_patch_asyncis inbuilt_in_tools - Retrieves previous response to check for pending apply_patch_calls
3. Configuration Integration (autogen/oai/client.py)
Extends OpenAILLMConfigEntry with new fields (lines 259-260, 282-283):
workspace_dir: str | None
allowed_paths: list[str] | None
And adds to create() method signature (lines 1625-1629):
built_in_tools: list[Literal["web_search", "image_generation", "apply_patch", "apply_patch_async"]]
workspace_dir: str | None = None
allowed_paths: list[str] | None = None
Usage Pattern
Based on the example notebook (notebook/agentchat_gpt5.1_apply_patch_example.ipynb), here's how to use it:
Simple Usage (Automatic Setup)
llm_config = LLMConfig(
config_list={
"api_type": "responses",
"model": "gpt-5.1",
"api_key": os.getenv("OPENAI_API_KEY"),
"built_in_tools": ["apply_patch"], # or "apply_patch_async"
"workspace_dir": "./my_project_folder", # Optional, defaults to cwd
"allowed_paths": ["src/**", "tests/**"], # Optional, defaults to ["**"]
},
)
coding_agent = ConversableAgent(
name="coding_assistant",
llm_config=llm_config,
system_message="You are a helpful coding assistant...",
)
# Agent automatically has access to apply_patch tool
result = coding_agent.initiate_chat(
recipient=coding_agent,
message="Create src/main.py with a hello world function",
max_turns=2,
)
Advanced Usage (Manual Tool Registration)
from autogen.tools import ApplyPatchTool, WorkspaceEditor
editor = WorkspaceEditor(
workspace_dir="./my_project",
allowed_paths=["src/**", "*.py"]
)
patch_tool = ApplyPatchTool(
editor=editor,
needs_approval=True,
on_approval=lambda ctx, item: {"approve": True}
)
agent = ConversableAgent(
name="coding_assistant",
llm_config={"api_type": "responses", "model": "gpt-5.1", ...}
)
patch_tool.register_tool(agent)
Key Features
- Security: Path validation ensures operations stay within workspace and match allowed patterns
- Flexibility: Supports both sync (
apply_patch) and async (apply_patch_async) modes - Approval Workflow: Optional
needs_approvalparameter for user confirmation - Unified Diff Format: Uses standard diff format for precise file modifications
- Error Handling: Comprehensive error messages for failed operations
- Multi-Agent Support: Works in groupchat scenarios with multiple agents (see notebook examples)
Diff Format
The tool uses unified diff format. Example from the implementation:
@@ -1,3 +1,3 @@
def hello():
- print("World")
+ print("Hello, World!")
return True
GPT-5.1 generates these diffs automatically when using the apply_patch tool.
Integration Points
The integration touches several key areas:
- Type Safety (autogen/oai/openai_responses.py:55-63): New
ApplyPatchCallOutputdataclass - Client Extensions (autogen/oai/client.py): New config fields for workspace_dir and allowed_paths
- Message Normalization (autogen/oai/openai_responses.py:332-490): Processes apply_patch_call items in conversation flow
- Code Utils (autogen/code_utils.py): Enhanced to handle apply_patch operations
- Tests: Comprehensive test coverage in
test/tools/test_apply_patch_tool.pyandtest/oai/test_responses_client.py
Answering the Original Question
The ApplyPatchTool is used automatically when you:
- Set
api_type: "responses"withmodel: "gpt-5.1" - Include
"apply_patch"or"apply_patch_async"inbuilt_in_tools - Optionally configure
workspace_dirandallowed_paths
The tool doesn't need explicit registration in the simple case - the OpenAIResponsesClient automatically handles apply_patch_call items in the conversation and executes them via _apply_patch_operation().
The notebook (notebook/agentchat_gpt5.1_apply_patch_example.ipynb) demonstrates various usage scenarios including single-agent, two-agent, and groupchat examples with file creation, modification, and deletion operations.
@priyansh4320 Awesome feature! I will try review it asap.
@claude Review this PR.
Claude finished @randombet's task ββ View job
Code Review Complete β
I've completed a thorough review of PR #2213. This is a substantial feature addition that implements GPT-5.1's apply_patch tool support in AG2. Overall, the implementation is well-architected with good test coverage, but there are several issues that should be addressed before merging.
Summary
This PR adds support for OpenAI's Responses API apply_patch tool, enabling GPT-5.1 and codex models to create, update, and delete files using unified diffs. The implementation includes:
- New tool:
ApplyPatchToolwithWorkspaceEditorfor file operations (512 lines) - Client integration: Extended
OpenAIResponsesClientto handle apply_patch calls (334 additions) - Configuration: Added
workspace_dirandallowed_pathsto LLM config - Tests: Comprehensive test suite (520 + 1148 lines)
- Example: Jupyter notebook demonstrating usage (42 cells)
Critical Issues π΄
1. Debug Print Statement Left in Production Code
Location: autogen/oai/openai_responses.py:622
print("params", params)
Impact: This debug statement will spam logs in production.
Fix: Remove this line immediately.
2. Async Implementation Issues
Location: autogen/tools/apply_patch_tool.py:284-303
The async methods (a_create_file, a_update_file, a_delete_file) are identical to their sync counterparts - they don't actually perform any async I/O operations. They just call synchronous Path.write_text() and Path.read_text().
async def a_create_file(self, operation: dict[str, Any]) -> dict[str, Any]:
"""Create a new file."""
try:
# ... same code as create_file ...
full_path.write_text(content, encoding="utf-8") # β οΈ Still blocking!
Impact:
- Misleading API - users expect true async behavior
- Event loop blocking - defeats the purpose of async
- The complex thread-based workaround in
openai_responses.py:258-271wouldn't be needed if these were truly async
Fix: Either:
- Use
aiofilesfor true async file I/O, or - Remove the async variants and document that file operations are synchronous
3. Path Validation Logic Has Edge Cases
Location: autogen/tools/apply_patch_tool.py:214-261
The _validate_path method has problematic logic:
# Check if path matches any allowed pattern first (works for both local and cloud paths)
matches_any = False
for pattern in self.allowed_paths:
# ...
if fnmatch.fnmatch(path_normalized, pattern_normalized):
matches_any = True
break
if not matches_any:
raise ValueError(f"Path {path} is not allowed...")
# For local filesystem paths, validate they're within workspace
try:
full_path = (Path(self.workspace_dir) / path).resolve()
if not str(full_path).startswith(str(self.workspace_dir)):
raise ValueError(f"Path {path} is outside workspace directory")
return full_path
except (OSError, ValueError) as e:
# If path resolution fails, it might be a cloud storage path
if matches_any:
return Path(path) # β οΈ This won't work for actual file ops
raise ValueError(f"Path {path} is invalid: {str(e)}")
Issues:
- Cloud storage support is incomplete - The code claims to support cloud storage paths (S3, etc.) but returns a local
Pathobject that can't actually access cloud storage - Pattern matching uses
fnmatchinstead ofpathlib.Path.match-fnmatchdoesn't properly handle**for recursive matching (it treats it literally) - String prefix checking is unsafe - Using
str().startswith()for path validation is vulnerable to path traversal (e.g.,/workspacevs/workspace_evil)
Fix:
# Use pathlib for proper ** matching
from pathlib import PurePath
def _validate_path(self, path: str) -> Path:
# Check allowed patterns using pathlib (proper ** support)
matches_any = any(
PurePath(path).match(pattern)
for pattern in self.allowed_paths
)
if not matches_any:
raise ValueError(f"Path {path} not allowed by patterns")
# Validate within workspace
full_path = (self.workspace_dir / path).resolve()
# Use is_relative_to (Python 3.9+) or proper path comparison
try:
full_path.relative_to(self.workspace_dir)
except ValueError:
raise ValueError(f"Path {path} is outside workspace")
return full_path
And remove cloud storage claims from documentation unless you actually implement cloud storage support.
Major Issues π
4. Type Safety Violations
Location: Multiple places with # type: ignore[arg-type]
full_path = self._validate_path(path) # type: ignore[arg-type]
Issue: The type checkers are warning that path might be None (from operation.get("path")), but you're suppressing instead of handling.
Fix:
path = operation.get("path")
if not path:
return {"status": "failed", "output": "Missing 'path' in operation"}
diff = operation.get("diff", "")
full_path = self._validate_path(path)
5. Error Messages Expose Internal Paths
Location: All file operation methods
return {"status": "failed", "output": f"Error creating {path}: {str(e)}"}
Security Issue: Stack traces in str(e) might expose sensitive filesystem information to the LLM/user.
Fix: Sanitize error messages or use structured error types.
6. Diff Applier Edge Cases
Location: autogen/tools/apply_patch_tool.py:86-173
The _V4ADiffApplier is well-implemented overall, but has some edge cases:
def _consume_hunk_line(self, line: str) -> None:
if not line or line.startswith("\\ No newline"):
return
prefix = line[0] if line else " " # β οΈ line[0] after checking "if not line"?
payload = line[1:] if len(line) > 1 and prefix in "+- " else line
Issue: The logic checks if not line but then accesses line[0] in the same expression. The ternary makes prefix = " " when line is empty, but it's confusing.
Minor: Not a bug, just harder to read than necessary.
7. Missing Input Validation
Location: autogen/oai/openai_responses.py:_apply_patch_operation
The operation dict from the LLM is not validated before use:
operation = apply_patch_call.get("operation", {})
if operation:
output = self._apply_patch_operation(operation, ...)
Issue: No validation that operation["type"] is a valid operation, or that required fields exist.
Fix: Add validation:
VALID_OPERATIONS = {"create_file", "update_file", "delete_file"}
op_type = operation.get("type")
if op_type not in VALID_OPERATIONS:
return ApplyPatchCallOutput(
call_id=call_id,
status="failed",
output=f"Invalid operation type: {op_type}"
)
Code Quality Issues π‘
8. Inconsistent Return Types
Location: autogen/tools/apply_patch_tool.py:ApplyPatchTool.__init__
The handler function returns different shapes:
# Sometimes returns this format:
{"type": "apply_patch_call_output", "call_id": call_id, **result}
# But result itself is:
{"status": "completed", "output": "..."}
# So the final shape is:
{"type": "apply_patch_call_output", "call_id": call_id, "status": "completed", "output": "..."}
But in openai_responses.py, you also have an ApplyPatchCallOutput dataclass. The two representations should be unified.
9. Code Duplication
Locations:
autogen/oai/openai_responses.py:379-403vs405-428- Sync vs async file operations
Both blocks of code for processing message_apply_patch_calls vs previous_apply_patch_calls are nearly identical. This should be refactored:
def _process_patch_calls(self, patch_calls_dict, workspace_dir, allowed_paths, async_mode):
"""Process apply_patch_call items and return outputs."""
outputs = []
for call_id, apply_patch_call in patch_calls_dict.items():
operation = apply_patch_call.get("operation", {})
if operation:
output = self._apply_patch_operation(
operation, call_id, workspace_dir,
allowed_paths, async_patches=async_mode
)
outputs.append(output.to_dict())
return outputs
# Then use it:
message_outputs = self._process_patch_calls(
message_apply_patch_calls, workspace_dir, allowed_paths,
async_mode="apply_patch_async" in built_in_tools
)
10. Thread Safety Concerns
Location: autogen/oai/openai_responses.py:258-271
The _run_async_in_thread helper creates a new thread with a new event loop for every async operation:
def _run_async_in_thread(coro):
result = {}
def runner():
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
result["value"] = loop.run_until_complete(coro)
loop.close()
t = threading.Thread(target=runner)
t.start()
t.join()
return result["value"]
Issues:
- Creates a new thread for every file operation - expensive
- No timeout - could hang forever
- Using mutable dict for return value is a code smell
Fix: Since the file operations aren't truly async anyway (see issue #2), just make them synchronous.
11. Message Processing Logic is Complex
Location: autogen/oai/openai_responses.py:332-491
The _normalize_messages_for_responses_api method is 160 lines and handles multiple concerns:
- Extracting apply_patch_calls from messages
- Processing those calls
- Retrieving previous response calls
- Converting messages to Responses API format
- Handling image params
- Filtering processed calls
Recommendation: Break this into smaller, focused methods:
_extract_apply_patch_calls(messages) -> dict_execute_apply_patch_calls(calls_dict, ...) -> list_convert_messages_to_input(messages, ...) -> list
Minor Issues / Suggestions π‘
12. Documentation Improvements
Location: autogen/tools/apply_patch_tool.py:184-207
The docstring claims cloud storage support but it's not actually implemented:
"""File system editor for apply_patch operations.
Supports both local filesystem and cloud storage paths through allowed_paths patterns.
Fix: Remove cloud storage claims or implement it properly.
13. Test Coverage Gaps
While test coverage is good overall (77.2%), there are some gaps:
Missing coverage in autogen/code_utils.py: The new apply_patch_call handling in content_str() has 0% coverage (lines 86-93).
Test file: test/test_llm_config.py - 101 new lines but need to verify they test the new workspace_dir and allowed_paths fields.
14. Naming Inconsistency
You have both:
apply_patch(synchronous)apply_patch_async(asynchronous)
But the underlying tool methods are:
create_file/a_create_fileupdate_file/a_update_filedelete_file/a_delete_file
Consider making the naming consistent - either _async suffix or a_ prefix throughout.
15. Hardcoded Defaults
Location: autogen/tools/apply_patch_tool.py:212
self.allowed_paths = allowed_paths or ["**"]
Defaulting to ["**"] (allow all files) is insecure by default. Consider:
self.allowed_paths = allowed_paths or [] # Deny all by default
# Or at minimum, document the security implications clearly
16. Protocol Definition Could Be Simpler
Location: autogen/tools/apply_patch_tool.py:16-83
The PatchEditor Protocol defines both sync and async methods, but most implementations will only support one or the other. Consider:
class SyncPatchEditor(Protocol):
def create_file(...) -> dict: ...
def update_file(...) -> dict: ...
def delete_file(...) -> dict: ...
class AsyncPatchEditor(Protocol):
async def create_file(...) -> dict: ...
async def update_file(...) -> dict: ...
async def delete_file(...) -> dict: ...
Then let WorkspaceEditor implement both if needed.
Positive Aspects β
- Excellent test coverage - 520 lines of tool tests + 1148 lines of client tests
- Good separation of concerns - Tool layer separate from client integration
- Security-conscious - Path validation, allowed_paths patterns
- Good error handling - Operations return structured status/output dicts
- Well-documented - Docstrings with examples
- Comprehensive notebook example - Shows real usage patterns
- Type hints - Good use of type annotations throughout
- Backwards compatible - New features don't break existing code
Recommendations
Must Fix Before Merge:
- β Remove debug
print()statement (line 622) - β Fix path validation security issues (use
Path.relative_to()instead of string prefix) - β Fix or remove incomplete cloud storage support
- β Add input validation for operation types
Should Fix Before Merge:
- β οΈ Make async truly async or remove async variants
- β οΈ Fix type ignore comments with proper null checks
- β οΈ Refactor duplicated code in message processing
- β οΈ Add test coverage for
content_str()apply_patch_call handling
Nice to Have:
- π‘ Improve error message sanitization
- π‘ Reconsider default
allowed_paths = ["**"]security - π‘ Refactor long
_normalize_messages_for_responses_apimethod - π‘ Consider thread pool instead of one-thread-per-operation
Test Coverage Note
Codecov shows 77.2% patch coverage with 88 lines missing. The main gaps are:
autogen/oai/openai_responses.py: 32 missing + 13 partials (mostly async execution paths)autogen/tools/apply_patch_tool.py: 28 missing + 8 partials (error paths)autogen/code_utils.py: 6 missing + 1 partial (new apply_patch_call handling)
This is reasonable for a first implementation, but aim to increase coverage above 85% for critical security code like path validation.
Conclusion
This is a well-architected feature that adds valuable GPT-5.1 capabilities to AG2. The core design is solid, tests are comprehensive, and documentation is good. However, there are several critical issues (debug print, path security, async implementation) that must be addressed before merging.
The code quality is generally high, but would benefit from:
- Removing the debug print statement
- Fixing path validation security
- Clarifying async behavior (make it truly async or remove it)
- Adding input validation
- Reducing code duplication
Once these issues are addressed, this will be a great addition to AG2! π
Files reviewed:
autogen/tools/apply_patch_tool.py(512 lines) βautogen/oai/openai_responses.py(+334 lines) βautogen/oai/client.py(+9 lines) βautogen/code_utils.py(+8 lines) βtest/tools/test_apply_patch_tool.py(520 lines) βtest/oai/test_responses_client.py(+1148 lines) βnotebook/agentchat_gpt5.1_apply_patch_example.ipynb(42 cells) β
@priyansh4320 I think the PR is good. A few minor changes needs to be completed and a few follow-ups to be done. I want to highlight this feature in the next release.
Changes need to be done:
- Move ApplyPatchTool autogen/tools/apply_patch_tool.py to autogen/tools/experimental. We plan to only move user validated and widely-accepted tools to tools/ folder and we haven't worked on this yet.
- Update or remove Modify existing files, Refactor code, Bug fixing examples from current notebook/agentchat_gpt5.1_apply_patch_example.ipynb.
- Address code issues from claude code reviews.
Follow ups: Create a new PR for file system management, focus on file discovery and read, currently, you assume in the example that agent knows which files to update but we would like the agent to figure out by itself. You may find this useful https://github.com/massgen/MassGen/tree/main/massgen/filesystem_manager
@claude Review the PR again
Claude finished @randombet's task in 4m 33s ββ View job
Re-reviewing PR #2213 β
I've completed a comprehensive re-review of this PR following @randombet's request. The PR has made progress since the last review, but several critical issues remain.
Executive Summary
Status: β οΈ NOT READY FOR MERGE - Critical security and code quality issues remain
Progress: The author addressed ~30% of previous review issues (debug print, experimental folder move, cloud storage claims) but critical security and implementation issues remain unresolved.
Recommendation: Address the critical issues below before merging, particularly path validation security and fake async implementation.
Changes Since Last Review (Dec 6)
β Successfully Addressed:
- Debug print statement removed -
print("params", params)has been removed β - Tool moved to experimental - Now in
autogen/tools/experimental/apply_patch/as requested β - Documentation improved - Cloud storage claims toned down to "currently supports local filesystem" β
- CODEOWNERS added -
/autogen/tools/experimental/apply_patch/ @priyansh4320β
β Still Outstanding (Critical):
- Path validation security vulnerability - Still using unsafe string prefix check
- Fake async implementation - Async methods still use blocking I/O
- Type safety violations - Still suppressing type errors with
# type: ignore - Missing input validation - Operation types not validated
fnmatchdoesn't support**properly - Recursive glob matching broken- Code duplication - Message processing logic duplicated
Review Against @randombet's Specific Comments
Comment 1: CODEOWNERS in OSS Project
Location: .github/CODEOWNERS:7
@randombet's concern: "This is a OSS project. This seems to be a blocker to me."
Analysis: The CODEOWNERS addition is acceptable for an experimental feature:
- Adds
@priyansh4320as owner of/autogen/tools/experimental/apply_patch/ - Standard practice for new feature areas with primary contributors
- Also added to MAINTAINERS.md with "GPT-5 support, apply patch tool" scope
- Recommendation: β This is fine for an experimental tool, but @marklysze should confirm
Comment 2: Agent Talking to Itself
Location: notebook/agentchat_gpt5.1_apply_patch_example.ipynb:130
@randombet's concern: "coding_agent talks to itself? Should use run method"
Analysis: Found multiple instances of this pattern:
result = coding_agent.initiate_chat(
recipient=coding_agent, # β Agent talks to itself
message="...",
max_turns=2,
)
Locations: Lines 130, 179, 210, 240, 274, 332, 409, 431, 454, 509
Issue: This is anti-pattern in AG2. For single-agent tasks, should use:
coding_agent.generate_reply()for single responses- Or create a simple user proxy agent as recipient
Recommendation: β Must fix - Replace all self-chat patterns with proper AG2 patterns
Comment 3: Outdated Cell References
Location: notebook/agentchat_gpt5.1_apply_patch_example.ipynb:179
@randombet's concern: "Here and other places as well. Please update."
Analysis: The notebook contains example sections that reference code modifications, refactoring, and bug fixing (cells 10-18) that may not execute properly in sequence.
Recommendation: β οΈ Review and update the notebook to ensure all examples are self-contained or properly ordered
Comment 4: Documentation Location
Location: website/docs/user-guide/advanced-concepts/tools/experimentals/apply_patch_tool.mdx:2
@randombet's concern: "I don't think it is the right place to add experimental tool docs here. I think you can do either add those to the notebook or add readme to the apply_patch_tool folder."
Analysis:
- Current location:
website/docs/user-guide/advanced-concepts/tools/experimentals/ - This creates a new
experimentals/section in the docs site - For an experimental tool, this is not appropriate per @randombet
Recommendation: β Must fix - Options:
- Move documentation to
autogen/tools/experimental/apply_patch/README.md - Keep it only in the notebook
- Remove the
.mdxfile entirely and document in code/notebook
Critical Issues (Must Fix Before Merge)
π΄ Issue 1: Path Validation Security Vulnerability
File: autogen/tools/experimental/apply_patch/apply_patch_tool.py:244
Severity: CRITICAL - Security vulnerability
Problem:
if not str(full_path).startswith(str(self.workspace_dir)):
raise ValueError(f"Path {path} is outside workspace directory")
This is vulnerable to path traversal. Example attack:
workspace_dir = "/home/user/workspace"- Attacker path:
"../workspace_evil/secret.txt" str().startswith()check passes:"/home/user/workspace_evil/secret.txt".startswith("/home/user/workspace")β True!
Fix:
try:
full_path.relative_to(self.workspace_dir)
except ValueError:
raise ValueError(f"Path {path} is outside workspace")
Impact: Without this fix, the path security can be bypassed. This is a blocker.
π΄ Issue 2: Fake Async Implementation
Files: apply_patch_tool.py:272-291, 317-339, 356-369
Severity: HIGH - Misleading API
Problem: The async methods are not actually async:
async def a_create_file(self, operation: dict[str, Any]) -> dict[str, Any]:
# ...
full_path.write_text(content, encoding="utf-8") # β Blocking I/O!
This blocks the event loop, defeating the purpose of async. The workaround in openai_responses.py:258-271 creates a new thread for every operation, which is expensive and unnecessary.
Fix: Either:
- Use
aiofilesfor true async I/O:async def a_create_file(self, operation: dict[str, Any]) -> dict[str, Any]: import aiofiles async with aiofiles.open(full_path, 'w', encoding='utf-8') as f: await f.write(content) - Remove async variants entirely and document as sync-only
Impact: Current implementation is misleading and has unnecessary overhead. Should be fixed or removed.
π΄ Issue 3: fnmatch Doesn't Support ** Properly
File: apply_patch_tool.py:232-233
Severity: HIGH - Feature broken
Problem:
# Use fnmatch which properly supports ** for recursive matching
if fnmatch.fnmatch(path_normalized, pattern_normalized): # β FALSE - fnmatch treats ** literally!
The comment claims fnmatch supports ** but it doesn't. It treats ** as a literal string, not recursive glob.
Test:
import fnmatch
fnmatch.fnmatch("src/sub/file.py", "src/**") # Returns False!
Fix:
from pathlib import PurePath
def _validate_path(self, path: str) -> Path:
# Use pathlib for proper ** matching
matches_any = any(
PurePath(path).match(pattern)
for pattern in self.allowed_paths
)
Impact: The allowed_paths feature is broken for patterns like "src/**". This is advertised as a main feature but doesn't work.
π Issue 4: Type Safety Violations
Files: apply_patch_tool.py:257, 278, 299, 323, 345, 360
Severity: MEDIUM - Code quality
Problem: Multiple # type: ignore[arg-type] suppressions:
path = operation.get("path") # Could be None
full_path = self._validate_path(path) # type: ignore[arg-type]
Fix: Validate before passing:
path = operation.get("path")
if not path:
return {"status": "failed", "output": "Missing 'path' in operation"}
full_path = self._validate_path(path) # Now type-safe
Impact: Hides potential None-type errors. Should be fixed for robustness.
π Issue 5: Missing Operation Type Validation
File: openai_responses.py:256
Severity: MEDIUM - Input validation
Problem: Operation types from LLM are not validated:
op_type = operation.get("type") # Could be anything!
if op_type == "create_file": # No validation that it's valid
Fix:
VALID_OPERATIONS = {"create_file", "update_file", "delete_file"}
op_type = operation.get("type")
if op_type not in VALID_OPERATIONS:
return ApplyPatchCallOutput(
call_id=call_id,
status="failed",
output=f"Invalid operation type: {op_type}"
)
Code Quality Issues (Should Fix)
π‘ Issue 6: Notebook Self-Chat Pattern
Severity: MEDIUM - Anti-pattern
As noted by @randombet, the notebook uses coding_agent.initiate_chat(recipient=coding_agent) throughout, which is an anti-pattern. Should be:
# Option 1: Use generate_reply for single-turn
response = coding_agent.generate_reply(messages=[{"role": "user", "content": "..."}])
# Option 2: Create a user proxy
user = ConversableAgent(name="user", llm_config=False, human_input_mode="NEVER")
result = user.initiate_chat(recipient=coding_agent, message="...")
π‘ Issue 7: Documentation Location
Severity: MEDIUM - Per @randombet's feedback
The .mdx documentation file should be removed or moved to a README in the tool folder per @randombet's guidance.
π‘ Issue 8: Default allowed_paths = ["**"] is Insecure
File: apply_patch_tool.py:210, 412
Severity: MEDIUM - Security by default
Problem: Defaults to allowing all paths:
self.allowed_paths = allowed_paths if allowed_paths is not None else ["**"]
Recommendation: Consider defaulting to [] (deny all) or at least document the security implications prominently.
Test Coverage Analysis
Overall: 79% patch coverage (80 lines missing)
Gaps:
autogen/code_utils.py: 0% coverage for apply_patch_call handling (lines 86-93)autogen/oai/openai_responses.py: 32 missing + 13 partials (mostly async paths)autogen/tools/experimental/apply_patch/apply_patch_tool.py: 28 missing + 8 partials
Recommendation: Add tests for code_utils.py apply_patch_call handling.
Positive Aspects β
- Good refactoring - Tool moved to experimental as requested
- Debug code removed - Production-ready in that respect
- Comprehensive tests - 520 + 1041 lines of tests
- Well-documented - Good docstrings and examples
- Clean separation - Tool layer separate from client integration
- Type hints - Good use of type annotations
Summary of Required Changes
Before merging, must address:
- β Fix path validation security - Use
Path.relative_to()instead of string prefix - β Fix or remove fake async - Either implement true async or remove async variants
- β Fix
fnmatchglob matching - Usepathlib.Path.match()for proper**support - β Fix notebook self-chat pattern - Replace
recipient=coding_agentwith proper patterns - β Move or remove .mdx docs - Per @randombet's feedback, move to README or notebook
- β οΈ Add proper type safety - Remove type ignore comments with null checks
- β οΈ Add operation type validation - Validate operation types before use
Nice to have:
- Increase test coverage for
code_utils.py - Consider more secure default for
allowed_paths - Add input validation throughout
Conclusion
This PR adds valuable GPT-5.1 functionality but cannot be merged in its current state due to:
- Path validation security vulnerability (critical)
- Broken
**glob pattern matching (high - advertised feature doesn't work) - Misleading fake async implementation (high)
- Notebook anti-patterns (medium - per @randombet)
- Documentation location (medium - per @randombet)
The author has made good progress addressing previous feedback, but the critical issues above must be resolved before this can ship.
@priyansh4320 please address the Critical Issues and my comments.
@claude review changes : Before merging, must address:
β Fix path validation security - Use Path.relative_to() instead of string prefix β Fix or remove fake async - Either implement true async or remove async variants β Fix fnmatch glob matching - Use pathlib.Path.match() for proper ** support β Fix notebook self-chat pattern - Replace recipient=coding_agent with proper patterns (note: its a new pattern, part of my research keep it) β Move or remove .mdx docs - Per @randombet's feedback, move to README or notebook β οΈ Add proper type safety - Remove type ignore comments with null checks β οΈ Add operation type validation - Validate operation types before use
Claude finished @priyansh4320's task in 2m 14s ββ View job
Review Complete β
I've completed a thorough review of the changes made to address the critical issues. Here's the status:
β Successfully Fixed Issues
1. Path Validation Security - FIXED β
Location: apply_patch_tool.py:242-247
Now properly uses Path.relative_to() for secure path containment:
try:
full_path.relative_to(self.workspace_dir)
except ValueError:
raise ValueError(f"Path {path} is outside workspace")
This eliminates the path traversal vulnerability that existed with the string prefix check. Excellent work!
2. Glob Pattern Matching - FIXED β
Location: apply_patch_tool.py:233
Now uses PurePath.match() which properly supports ** for recursive matching:
matches_any = any(PurePath(path).match(pattern) for pattern in self.allowed_paths)
The fnmatch issue has been completely resolved. The ** pattern now works correctly.
3. Async Implementation - FIXED β
Locations: apply_patch_tool.py:277-303, 332-364, 384-406
The async implementation is now truly async using the aiofiles library:
- Line 13-16: Imports
aiofileswith proper error handling - Line 298-299:
async with aiofiles.open()andawait f.write()for async I/O - Line 352-353, 359-360: Async file reads and writes
- Line 402:
await aiofiles.os.remove()for async delete
Excellent improvement! The async methods now perform true non-blocking I/O operations. The runtime check (line 279-280, 334-335, 386-387) ensures users get a clear error if aiofiles isn't installed.
4. Type Safety - FIXED β
Locations: apply_patch_tool.py:256-258, 283-285, 308-310, 338-340, 369-371, 390-392
All the # type: ignore[arg-type] comments have been removed and replaced with proper null checks:
path = operation.get("path")
if not path:
return {"status": "failed", "output": "Missing 'path' in operation"}
Type safety is now properly enforced without suppressions. Great!
5. Operation Type Validation - FIXED β
Locations:
apply_patch_tool.py:409-410: DefinesVALID_OPERATIONS = {"create_file", "update_file", "delete_file"}apply_patch_tool.py:499-508: Validates operation type early in the handleropenai_responses.py:255-267: Validates operation type in the client integration
Both the tool and client now validate operation types before processing. Excellent!
6. Documentation Location - PARTIALLY ADDRESSED β οΈ
Status: README created but .mdx file still exists
- β
Good: Created comprehensive
autogen/tools/experimental/apply_patch/README.md(152 lines) - β Issue:
.mdxfile still exists atwebsite/docs/user-guide/advanced-concepts/tools/experimentals/apply_patch_tool.mdx
Per @randombet's feedback: "I don't think it is the right place to add experimental tool docs here. I think you can do either add those to the notebook or add readme to the apply_patch_tool folder."
Recommendation: Remove the .mdx file since you've created the README:
git rm website/docs/user-guide/advanced-concepts/tools/experimentals/apply_patch_tool.mdx
7. Notebook Self-Chat Pattern - KEPT (As Requested) β
Author's Note: "its a new pattern, part of my research keep it"
The notebook still uses coding_agent.initiate_chat(recipient=coding_agent) pattern. While this was flagged in the previous review as an anti-pattern, you've indicated this is intentional for research purposes. I'll respect that decision.
Note for reviewers: This pattern is unconventional in AG2 but is being kept per author's research needs.
Remaining Minor Issues
Issue A: Multiple Type Ignores Still Present
Locations: apply_patch_tool.py:524, 526, 528, 534, 536, 538
These are for the async handler routing:
result = await self.editor.a_create_file(operation) # type: ignore[union-attr]
Why they exist: The PatchEditor protocol defines both sync and async methods, but at runtime self.editor might only implement one set. The type checker can't know which methods are available.
Recommendation: This is acceptable given the protocol design. The runtime checks ensure the right methods are called. You could create separate SyncPatchEditor and AsyncPatchEditor protocols to avoid this, but the current approach is pragmatic.
Issue B: Documentation File Duplication
The .mdx file should be removed per @randombet's feedback.
Issue C: README Formatting Issue
Location: README.md:30, 67, 82, 122, 146
The README has missing newlines that cause formatting issues:
- Line 30: Missing newline before "For more information"
- Line 67: Missing newline before "## Configuration Options"
- Line 82: Missing newline before "All file operations"
- Line 122: Missing newline before "## Usage Examples"
- Line 146: Missing newline before "## Understanding Apply Patch"
Not critical, but should be fixed for better readability.
Code Quality Assessment
Excellent Improvements β
- Security: Path traversal vulnerability eliminated
- Functionality: Glob patterns now work correctly with
** - Performance: True async I/O implementation
- Type Safety: Proper null checks instead of suppressions
- Validation: Operation types validated early
- Error Handling: Clear error messages for missing
aiofiles
Architecture Quality β
- Clean separation of concerns
- Good error handling throughout
- Comprehensive docstrings
- Proper use of protocols
- Thread-based workaround for Jupyter (lines 269-282 in openai_responses.py) is reasonable
Test Coverage
Based on codecov report: 68.13% patch coverage (131 lines missing)
Main gaps:
apply_patch_tool.py: 72 missing + 9 partials (error paths, async variants)openai_responses.py: 41 missing + 9 partials (edge cases)
This is acceptable for experimental features, but consider adding tests for:
- Error paths (invalid paths, missing files, etc.)
- Async operations
- Path validation edge cases
Final Recommendations
Must Do Before Merge:
-
β Remove
.mdxfile per @randombet's feedback:git rm website/docs/user-guide/advanced-concepts/tools/experimentals/apply_patch_tool.mdx -
β οΈ Fix README formatting (add missing newlines)
Nice to Have:
- π‘ Consider increasing test coverage above 70%
- π‘ Address the CODEOWNERS concern if @marklysze has feedback
Conclusion
Overall Assessment: EXCELLENT WORK π
You've successfully addressed all 7 critical issues from the previous review:
- β Path validation security vulnerability fixed
- β
Async implementation now truly async with
aiofiles - β Glob pattern matching works correctly
- β Type safety improved with proper null checks
- β Operation type validation added
- β README created with comprehensive documentation
- β Notebook pattern kept per your research needs
The code quality has improved significantly. The only blocking issue is removing the .mdx file per @randombet's feedback. Once that's done and the README formatting is fixed, this PR should be ready for merge.
Remaining Action Items:
- Remove
.mdxdocumentation file - Fix README formatting (missing newlines)
- Address CODEOWNERS if needed (pending @marklysze feedback)
@claude re-review changes: Must Do Before Merge: β Remove .mdx file per @randombet's feedback:
git rm website/docs/user-guide/advanced-concepts/tools/experimentals/apply_patch_tool.mdx β οΈ Fix README formatting
Claude finished @priyansh4320's task in 1m 32s ββ View job
Re-Review Complete β
Verified the requested changes and all critical issues from the previous review.
Task Checklist:
- [x] Verify .mdx file was removed
- [x] Check README formatting fixes
- [x] Verify other critical issues were addressed
- [x] Post comprehensive re-review feedback
Summary of Verification
I've completed a thorough re-review of your changes. Here's what I found:
β Must Do Items - Status
1. Remove .mdx file β FIXED
- File
website/docs/user-guide/advanced-concepts/tools/experimentals/apply_patch_tool.mdxhas been successfully removed - Confirmed with
ls- file does not exist
2. Fix README formatting β οΈ PARTIALLY FIXED
- README exists at
autogen/tools/experimental/apply_patch/README.md - Content is comprehensive (150 lines)
- However, formatting issues remain:
- Line 30: Missing newline before "For more information"
- Line 67: Missing newline before "## Configuration Options"
- Line 84: Missing newline before "All file operations"
- Line 120: Missing newline before "## Usage Examples"
- Line 144: Missing newline before "## Understanding Apply Patch"
β All Critical Security & Code Quality Issues Fixed
I verified all the critical issues from the previous review, and I'm pleased to report all have been successfully addressed:
1. Path Validation Security - FIXED β
Location: apply_patch_tool.py:242-247
The path traversal vulnerability has been eliminated:
# Use Path.relative_to() for secure path containment check
# This prevents path traversal attacks that startswith() would miss
try:
full_path.relative_to(self.workspace_dir)
except ValueError:
raise ValueError(f"Path {path} is outside workspace directory")
Result: Path security is now robust and safe from traversal attacks.
2. Glob Pattern Matching - FIXED β
Location: apply_patch_tool.py:233
Now uses PurePath.match() which properly supports ** for recursive matching:
matches_any = any(PurePath(path).match(pattern) for pattern in self.allowed_paths)
Result: The ** pattern now works correctly for recursive directory matching. The advertised feature is fully functional.
3. Async Implementation - FIXED β
Locations: apply_patch_tool.py:13-16, 298-299, 352-353, 359-360, 402
The async implementation is now truly async using the aiofiles library:
- Proper
aiofilesimport with error handling (lines 13-16) async with aiofiles.open()andawait f.write()for non-blocking I/O (line 298-299)- Async file reads and writes throughout
await aiofiles.os.remove()for async file deletion (line 402)- Runtime check ensures users get clear error if
aiofilesisn't installed (lines 279-280, 334-335, 386-387)
Result: Async methods now perform true non-blocking I/O operations. No more fake async!
4. Type Safety - FIXED β
Locations: apply_patch_tool.py:256-258, 283-285, 308-310, 338-340, 369-371, 390-392
All the # type: ignore[arg-type] comments have been removed and replaced with proper null checks:
path = operation.get("path")
if not path:
return {"status": "failed", "output": "Missing 'path' in operation"}
Result: Type safety is now properly enforced without suppressions. Clean type checking.
5. Operation Type Validation - FIXED β
Locations:
apply_patch_tool.py:409-410: DefinesVALID_OPERATIONS = {"create_file", "update_file", "delete_file"}apply_patch_tool.py:499-508: Validates operation type early in the handleropenai_responses.py:255-267: Validates operation type in the client integration
Both the tool and client now validate operation types before processing:
# Validate operation type early
if op_type not in valid_operations:
return ApplyPatchCallOutput(
call_id=call_id,
status="failed",
output=f"Invalid operation type: {op_type}. Must be one of {valid_operations}",
)
Result: Input validation is comprehensive and prevents invalid operations from being processed.
Remaining Minor Issues
π README Formatting (As You Noted)
The README has missing newlines that cause formatting issues. Here are the specific locations:
-
Line 30:
pip install ag2[openai]For more information- Should be:
pip install ag2[openai]\n\nFor more information
- Should be:
-
Line 67: Missing newline before "## Configuration Options"
-
Line 84:
),\n)All file operations- Should have blank line before "All file operations"
-
Line 120: Missing newline before "## Usage Examples"
-
Line 144: Missing newline before "## Understanding Apply Patch"
These are minor but affect readability. Would you like me to fix these formatting issues?
Code Quality Assessment
Excellent Improvements β
- β Security: Path traversal vulnerability completely eliminated
- β
Functionality: Glob patterns now work correctly with
** - β
Performance: True async I/O implementation with
aiofiles - β Type Safety: Proper null checks instead of type suppressions
- β Validation: Operation types validated early in both tool and client
- β
Error Handling: Clear error messages for missing
aiofilesdependency
Architecture Quality β
- Clean separation of concerns
- Comprehensive error handling throughout
- Proper use of async/await patterns
- Security-conscious design with path validation
- Good documentation in docstrings
Final Recommendation
Status: READY FOR MERGE (after minor README formatting fix)
You've successfully addressed all 5 critical security and code quality issues from the previous review:
- β Path validation security vulnerability β FIXED
- β Async implementation (fake async) β FIXED with true async
- β
Glob pattern matching (
**support) β FIXED - β Type safety violations β FIXED
- β Operation type validation β FIXED
And the two items you specifically asked about:
- β
.mdxfile removed β CONFIRMED - β οΈ README formatting β Needs minor fixes (5 missing newlines)
Next Steps
Option 1: I can fix the README formatting for you (5 simple newline additions)
Option 2: You can fix it yourself - just add blank lines at the locations mentioned above
Once the README formatting is fixed, this PR will be ready for merge! Outstanding work on addressing all the critical issues. π
The example is still not as impressive but will merge this for now and revisit from shell tool PR.
Codecov Report
:x: Patch coverage is 76.28362% with 97 lines in your changes missing coverage. Please review.
| Files with missing lines | Coverage Ξ | |
|---|---|---|
| autogen/agentchat/groupchat.py | 66.17% <100.00%> (-3.90%) |
:arrow_down: |
| autogen/code_utils.py | 75.09% <100.00%> (+12.30%) |
:arrow_up: |
| autogen/llm_config/config.py | 92.65% <ΓΈ> (ΓΈ) |
|
| autogen/oai/client.py | 54.07% <100.00%> (-3.48%) |
:arrow_down: |
| autogen/tools/experimental/apply_patch/__init__.py | 100.00% <100.00%> (ΓΈ) |
|
| autogen/oai/openai_responses.py | 74.64% <74.84%> (+4.13%) |
:arrow_up: |
| ...tools/experimental/apply_patch/apply_patch_tool.py | 75.75% <75.75%> (ΓΈ) |
... and 16 files with indirect coverage changes
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
- :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.