ag2 icon indicating copy to clipboard operation
ag2 copied to clipboard

Add interactive WebSocket UI and FastAPI backend for Groupchat

Open Suryaaa-Rathore opened this issue 4 months ago โ€ข 12 comments

This PR introduces a lightweight, interactive WebSocket UI along with a FastAPI backend.

What's included:

  • Frontend HTML/JS UI for real-time WebSocket communication
  • FastAPI backend with WebSocket manager
  • Group chat orchestration using a class-based AgentChat setup
  • Streaming agent responses with message formatting
  • Support for user input during active chats (User Proxy agent)
  • Environment-based API key configuration
  • Basic session management (create/use chat_id)

Why: We were facing limitations with tools like Postman for WebSocket testing โ€” including lack of message formatting, session handling, and poor UX for prompt tuning. This setup improves debugging, speeds up development, and simplifies agent alignment tasks.

Suryaaa-Rathore avatar Aug 22 '25 13:08 Suryaaa-Rathore

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Aug 22 '25 13:08 CLAassistant

@Suryaaa-Rathore very interesting PR! Thanks for the contribution! Could you run pre-commit: https://docs.ag2.ai/latest/docs/contributor-guide/pre-commit/

This is a large PR. Will start review soon but may take some time!

qingyun-wu avatar Aug 23 '25 04:08 qingyun-wu

@qingyun-wu Thanks! I've run the pre-commit and made the necessary changes.

Suryaaa-Rathore avatar Aug 23 '25 08:08 Suryaaa-Rathore

@Suryaaa-Rathore do you have discord? I am working on something similar to this and would love to chat. What I am currently trying to figure out is persistence after disconnection of the websocket. My discord is: elcapitan__ (thats 2 underscores)

elCaptnCode avatar Aug 25 '25 14:08 elCaptnCode

@Suryaaa-Rathore does this support the typewriter effect (sending partial chunks/tokens and incrementally rendering them)?

elCaptnCode avatar Aug 27 '25 17:08 elCaptnCode

@BlocUnited Yes, it can support the typewriter effect, but the current PR does not include it yet.

Suryaaa-Rathore avatar Aug 28 '25 09:08 Suryaaa-Rathore

@Suryaaa-Rathore

  • I suggest to move out agentchat_fastapi_websocket from notebooks -> agents/experimental
  • also lets rename the folder to agents_fastapi_websocket
  • let's move the UI to a separate PR and keep UI as a seperate folder entirely under root ag2/websocket_ui for example.

priyansh4320 avatar Sep 24 '25 14:09 priyansh4320

Code Review for PR 2062

See full review details below:

CRITICAL SECURITY: CORS wide open, no API key validation, code exec without Docker, path traversal risks CRITICAL CODE: Silent exceptions, threading.Lock in async, event loop blocking, unbounded queues MODERATE: No tests, missing docs, hardcoded values, no input validation

Recommendations: Fix security issues, add error handling/logging, add tests, make values configurable

Status: CHANGES REQUESTED

Full detailed review available on request - Claude Code

github-actions[bot] avatar Nov 26 '25 20:11 github-actions[bot]

DETAILED REVIEW PART 1 - Security Issues

  1. CORS Configuration main.py:25

    • Issue: allow_origins=[*] with allow_credentials=True is a security vulnerability
    • Fix: Make CORS configurable via env vars, provide secure defaults
  2. API Key Validation groupchat.py:69

    • Issue: os.environ[OPENAI_API_KEY] raises KeyError if not set
    • Fix: Use os.environ.get() with proper validation and error message
  3. Code Execution Security groupchat.py:102-105

    • Issue: use_docker=False allows arbitrary code execution on host
    • Fix: Enable Docker by default or add prominent security warnings
  4. Path Traversal main.py:76

    • Issue: Relative path could read unexpected files
    • Fix: Use Path(file).parent / templates / index.html
  5. Resource Exhaustion connection.py:52

    • Issue: Unbounded asyncio.Queue() allows memory exhaustion
    • Fix: Set maxsize parameter, e.g. asyncio.Queue(maxsize=100)

github-actions[bot] avatar Nov 26 '25 20:11 github-actions[bot]

DETAILED REVIEW PART 2 - Code Quality Issues

  1. Error Handling Problems

    • dependencies.py:32-34, 45-47: Bare except without logging
    • connection.py:64-65: Returns False without logging errors
    • ws.py:71-74: Catches and re-raises without context
    • Fix: Add proper logging before re-raising exceptions
  2. Concurrency Issues

    • cancellation_token.py:12: Uses threading.Lock in async code
    • Fix: Replace with asyncio.Lock for async contexts
    • connection.py:307-318: Race conditions in disconnect method
    • Fix: Ensure atomic operations or use locks
  3. Async/Sync Mixing Anti-pattern helpers.py:16

    • Uses deprecated asyncio.get_event_loop()
    • Blocks event loop with run_until_complete()
    • Used in groupchat.py:47 within async context
    • Fix: Remove helper, refactor _print_received_message to be async
  4. Performance Issues

    • main.py:76: Sync file read blocks event loop on every request
    • Fix: Load template once at startup
    • connection.py:172-186: Polling creates timeout exceptions every 5s
    • Fix: Use event-based approach

github-actions[bot] avatar Nov 26 '25 20:11 github-actions[bot]

DETAILED REVIEW PART 3 - Testing and Documentation

  1. Testing - COMPLETELY MISSING No unit tests, integration tests, or security tests

    Recommended test coverage:

    • CancellationToken edge cases and thread safety
    • WebSocketManager session lifecycle and cleanup
    • Multi-session concurrent handling
    • Error scenarios and recovery
    • Input validation boundaries
    • Resource exhaustion scenarios

    Target: Minimum 70 percent coverage

  2. Documentation Gaps

    • Missing docstrings for CustomGroupChatManager and CustomUserProxyAgent
    • No security warnings in README about code execution risks
    • No production deployment guide or best practices
    • requirements.txt pins specific versions without explanation
  3. Configuration Issues

    • Hardcoded host 0.0.0.0 and port 8000
    • Hardcoded model gpt-4o
    • Hardcoded max_round equals 500
    • Fix: Make all values configurable via environment variables
  4. Input Validation - MISSING

    • No validation of task content length or format
    • No validation of response content
    • No rate limiting or request throttling
    • Potential for DoS attacks

github-actions[bot] avatar Nov 26 '25 20:11 github-actions[bot]

DETAILED REVIEW PART 4 - Summary and Recommendations

ISSUE SUMMARY:

  • Critical Security Issues: 5
  • Critical Code Issues: 7
  • Moderate Issues: 8
  • Code Cleanup: 5

PRIORITY ACTIONS REQUIRED:

  1. Fix CORS configuration - make configurable with secure defaults
  2. Add API key validation with helpful error messages
  3. Document or enable Docker for code execution security
  4. Fix all error handling - add logging throughout
  5. Replace threading.Lock with asyncio.Lock
  6. Remove async_to_sync anti-pattern - refactor to proper async
  7. Add input validation and rate limiting
  8. Add comprehensive test suite with 70+ percent coverage
  9. Make all configuration values environment-based
  10. Add security warnings to documentation

STRENGTHS OF THIS PR:

  • Good architectural separation of concerns
  • Comprehensive user-facing documentation
  • Proper type annotations throughout
  • Correctly placed in experimental directory
  • Addresses real developer pain points

VERDICT: This is valuable and useful functionality that solves real problems. However, it requires significant improvements in security, error handling, testing, and code quality before it can be safely merged.

The core concept is solid and the documentation shows good understanding of the use case. With the recommended changes, this will be an excellent addition to the experimental features.

Recommended next steps:

  1. Address all critical security issues first
  2. Add proper error handling and logging
  3. Write comprehensive tests
  4. Update documentation with security considerations
  5. Request re-review once changes are complete

Happy to provide more specific guidance on any of these items if needed.

github-actions[bot] avatar Nov 26 '25 20:11 github-actions[bot]

@Suryaaa-Rathore can you please follow up on this PR?

priyansh4320 avatar Dec 15 '25 20:12 priyansh4320

๐Ÿ” Re-Review of PR #2062 - WebSocket UI and FastAPI Backend

I've conducted a comprehensive re-review of this PR, considering the previous feedback from November 2024. This review focuses on the current state of the code and identifies both improvements made and remaining issues.


๐Ÿ“Š Summary Assessment

Overall Status: CHANGES REQUESTED โš ๏ธ

This PR provides valuable functionality for WebSocket-based agent interaction, but still requires significant improvements in security, error handling, testing, and code quality before merging.

Progress Since Last Review: โœ… Type annotations added throughout โœ… Code moved to experimental directory โŒ Most critical issues from previous review remain unaddressed


๐Ÿšจ CRITICAL SECURITY ISSUES

1. CORS Configuration - HIGH RISK (main.py:25)

Middleware(CORSMiddleware, allow_origins=["*"], allow_credentials=True, ...)

Issue: Wide-open CORS with credentials enabled is a security vulnerability Impact: Allows any website to make authenticated requests to your API Fix Required:

  • Make CORS origins configurable via environment variables
  • Provide secure defaults (localhost for development)
  • Document security implications in README

2. API Key Handling - HIGH RISK (groupchat.py:69)

api_key=os.environ["OPENAI_API_KEY"]

Issue: Direct environment access raises KeyError if not set, crashes entire application Impact: Poor user experience and potential service disruption Fix Required:

api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise ValueError("OPENAI_API_KEY environment variable is required")

3. Code Execution Security - CRITICAL (groupchat.py:102-105)

code_execution_config={
    "work_dir": os.getcwd(),
    "use_docker": False,
}

Issue: Arbitrary code execution on host without Docker isolation Impact: Malicious code can access the entire system Fix Required:

  • Enable Docker by default (use_docker=True)
  • Add prominent security warnings in README
  • Make configurable via environment variable
  • Document the risks of disabling Docker

4. Path Traversal Risk (main.py:76)

html_content = Path("templates/index.html").read_text(encoding="utf-8")

Issue: Relative path resolution can be unpredictable Impact: May serve wrong file depending on execution context Fix Required:

template_path = Path(__file__).parent / "templates" / "index.html"
html_content = template_path.read_text(encoding="utf-8")

5. Resource Exhaustion (connection.py:52)

self._input_responses[session_id] = asyncio.Queue()

Issue: Unbounded queue allows memory exhaustion attacks Impact: DoS vulnerability Fix Required:

self._input_responses[session_id] = asyncio.Queue(maxsize=100)

โš ๏ธ CRITICAL CODE QUALITY ISSUES

6. Silent Exception Handling - MULTIPLE LOCATIONS

dependencies.py:32-34, 45-47:

except Exception:
    await cleanup_managers()
    raise

Issue: Silent exceptions without logging make debugging impossible Fix Required: Add logging before re-raising

connection.py:64-65:

except Exception:
    return False

Issue: Swallows errors completely Fix Required: Log the exception before returning False

ws.py:71-74:

except WebSocketDisconnect:
    raise
except Exception:
    raise

Issue: Catches and re-raises without adding context Fix Required: Add logging or remove unnecessary try-catch

7. Threading/AsyncIO Mismatch - CRITICAL (cancellation_token.py:12)

self._lock: threading.Lock = threading.Lock()

Issue: Using threading.Lock in async code can cause deadlocks Impact: Potential deadlocks and race conditions Fix Required:

import asyncio
self._lock: asyncio.Lock = asyncio.Lock()
# Update all 'with self._lock:' to 'async with self._lock:'

8. Async/Sync Anti-Pattern - CRITICAL (helpers.py:16)

def async_to_sync(awaitable: Awaitable[Any]) -> Any:
    loop = asyncio.get_event_loop()
    return loop.run_until_complete(awaitable)

Issues:

  • asyncio.get_event_loop() is deprecated
  • Blocks event loop when called from async context (groupchat.py:47)
  • Can cause deadlocks in async environments

Impact: Can freeze the entire application Fix Required: Remove this helper entirely and refactor _print_received_message to be async

9. Event Loop Blocking (main.py:76)

html_content = Path("templates/index.html").read_text(encoding="utf-8")

Issue: Synchronous file I/O blocks event loop on every request Fix Required: Load template once at startup:

# At module level or in lifespan startup
TEMPLATE_CONTENT = Path(__file__).parent.joinpath("templates/index.html").read_text()

@app.get("/")
async def serve_html() -> HTMLResponse:
    return HTMLResponse(content=TEMPLATE_CONTENT)

10. Race Condition in Disconnect (connection.py:307-318)

Issue: Multiple operations without atomic guarantees Example Scenario:

  1. Thread A calls disconnect()
  2. Thread B calls send_message()
  3. Race condition on _closed_connections set

Fix Required: Ensure atomic operations or use proper locking


๐Ÿ“ MODERATE ISSUES

11. Missing Input Validation

Locations: ws.py, connection.py Issues:

  • No validation of task content length
  • No sanitization of user input
  • No rate limiting
  • Potential for DoS attacks

Fix Required:

  • Add content length limits
  • Implement rate limiting per session
  • Validate message structure

12. Hardcoded Configuration Values

Locations:

  • main.py:99 - Host 0.0.0.0, Port 8000
  • groupchat.py:69 - Model gpt-4o
  • groupchat.py:196 - max_round=500

Fix Required: Make all configurable via environment variables

13. Incomplete Error Context (main.py:40-41, 50-51)

except Exception:
    raise

Issue: Bare except/raise without context Fix Required: Add specific exception handling with proper messages

14. Deprecated datetime Usage (ws.py:48, 57, 68)

datetime.utcnow().isoformat()

Issue: utcnow() is deprecated Fix Required:

from datetime import timezone
datetime.now(timezone.utc).isoformat()

๐Ÿงช TESTING - COMPLETELY MISSING

Critical Gap: Zero test coverage

Required Test Coverage:

  1. Unit Tests:

    • CancellationToken thread safety
    • WebSocketManager session lifecycle
    • AgentChat initialization and configuration
    • Error handling paths
  2. Integration Tests:

    • End-to-end WebSocket communication
    • Multi-session concurrent handling
    • Cancellation and cleanup flows
  3. Security Tests:

    • Input validation boundaries
    • Resource exhaustion scenarios
    • API key handling

Recommendation: Minimum 70% code coverage before merge


๐Ÿ“š DOCUMENTATION GAPS

  1. Security Warnings Missing:

    • No mention of code execution risks with use_docker=False
    • No deployment security best practices
    • No warning about CORS configuration
  2. Missing Docstrings:

    • CustomGroupChatManager class
    • CustomUserProxyAgent class
    • Several methods lack documentation
  3. Configuration Guide Needed:

    • Environment variable reference
    • Production deployment guide
    • Security hardening checklist
  4. Requirements.txt:

    • Pinned versions without explanation
    • No information about version constraints

๐Ÿ”ง CODE CLEANUP RECOMMENDATIONS

Minor Issues:

  1. Unused import comment (main.py:17): # from routers import apis - Remove
  2. Inconsistent comment (dependencies.py:10): # Singleton instance of MongoDB service - Wrong comment
  3. Unused variable (groupchat.py:12): today_date = datetime.now() - Remove if not used
  4. Missing type annotation (groupchat.py:140): Return type should be Agent | None
  5. Polling inefficiency (connection.py:172-186): Creates timeout exceptions every 5s - use event-based approach

โœจ POSITIVE ASPECTS

This PR has several strengths worth acknowledging:

โœ… Good architectural separation - Clean separation of concerns
โœ… Comprehensive documentation - Excellent user-facing guides
โœ… Type annotations - Proper typing throughout
โœ… Solves real problems - Addresses actual developer pain points
โœ… Appropriate location - Correctly placed in experimental directory


๐ŸŽฏ PRIORITY ACTION ITEMS

Must Fix Before Merge (Priority Order):

  1. Fix CORS configuration - Make configurable, document security implications
  2. Fix API key validation - Use os.getenv() with proper error messages
  3. Address code execution security - Enable Docker or add prominent warnings
  4. Replace threading.Lock with asyncio.Lock - Fix async/threading mismatch
  5. Remove async_to_sync anti-pattern - Refactor to proper async
  6. Add comprehensive error logging - All exception handlers need logging
  7. Add input validation and rate limiting - Prevent DoS attacks
  8. Write test suite - Minimum 70% coverage
  9. Make configuration environment-based - All hardcoded values
  10. Add security documentation - Warnings and best practices

Recommended Improvements:

  1. Load HTML template at startup (performance)
  2. Add proper logging throughout
  3. Add missing docstrings
  4. Fix deprecated datetime.utcnow() usage
  5. Review and fix race conditions

๐Ÿ“‹ VERDICT

Status: CHANGES REQUESTED

This PR introduces valuable functionality that addresses real developer needs. The WebSocket UI for agent interaction is well-designed and the documentation is excellent. However, the security issues and code quality problems are too significant to merge in the current state.

The core concept is solid and with the recommended changes, this will be an excellent addition to the experimental features.

Next Steps:

  1. โœ… Address all CRITICAL security issues (items 1-5)
  2. โœ… Fix all CRITICAL code quality issues (items 6-10)
  3. โœ… Add comprehensive test suite
  4. โœ… Update documentation with security considerations
  5. โœ… Request re-review once changes are complete

I'm happy to provide more specific guidance on any of these items. The work you've done so far is valuable - it just needs these critical improvements before it's production-ready (even for experimental features).


Reviewed by: Claude Sonnet 4.5 via Claude Code Review Date: 2025-12-15 Previous Review: 2024-11-26

github-actions[bot] avatar Dec 15 '25 20:12 github-actions[bot]