
Correlation ID for Unified Request Tracking

Open shoummu1 opened this issue 3 weeks ago • 0 comments

📋 Summary

This PR delivers a comprehensive structured JSON logging pipeline that captures correlation IDs end-to-end (ingress middleware → services → persistence) while maintaining backward compatibility with legacy console/file logs. It introduces:

  • Correlation ID tracking: Extract, preserve, and generate unique request identifiers across the entire request lifecycle
  • Structured logging: Persist enriched logs to database with user context, performance metrics, and security indicators
  • Security & audit trails: Specialized loggers for authentication events, suspicious activity, and CRUD operations
  • Performance aggregation: Automatic rollup of logs into time-windowed metrics with percentiles
  • Admin UI enhancement: Rebuilt System Logs tab with search, correlation tracing, security events, and performance analytics

🔗 Related Issues

#300


🔧 Changes Made

Core Implementation

Correlation ID Infrastructure

  • New utility module (mcpgateway/utils/correlation_id.py): ContextVar-based correlation ID storage for async-safe request tracking across the entire request lifecycle
  • New middleware (mcpgateway/middleware/correlation_id.py): HTTP middleware for X-Correlation-ID header extraction, validation, generation, and injection into responses
  • Enhanced logging (mcpgateway/services/logging_service.py): CorrelationIdJsonFormatter for automatic correlation ID injection into JSON logs with OpenTelemetry trace context
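
As an illustration of the ContextVar-based storage described above, here is a minimal sketch. The function names are hypothetical and may not match mcpgateway/utils/correlation_id.py; the 32-character hex format is assumed from the sample response header later in this description.

```python
import asyncio
import uuid
from contextvars import ContextVar
from typing import Optional

# Hypothetical names for illustration; the real module may differ.
_correlation_id: ContextVar[Optional[str]] = ContextVar("correlation_id", default=None)


def generate_correlation_id() -> str:
    """Return a 32-character hex ID (a UUID4 without dashes)."""
    return uuid.uuid4().hex


def set_correlation_id(value: str) -> None:
    _correlation_id.set(value)


def get_correlation_id() -> Optional[str]:
    return _correlation_id.get()


async def handle_request(name: str) -> tuple[str, Optional[str]]:
    # Each asyncio task runs in its own copy of the context, so this
    # assignment is invisible to requests running concurrently.
    set_correlation_id(generate_correlation_id())
    await asyncio.sleep(0)  # yield control to the other task
    return name, get_correlation_id()


async def demo() -> list[tuple[str, Optional[str]]]:
    # gather() wraps each coroutine in its own task.
    return await asyncio.gather(handle_request("a"), handle_request("b"))
```

Running `asyncio.run(demo())` yields two distinct IDs, one per simulated request, which is the property that makes a ContextVar suitable for async request tracking.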

Structured Logging & Observability

  • New structured logger (mcpgateway/services/structured_logger.py): Central logging facade that persists to database (StructuredLogEntry) with enriched metadata (user, component, operation type, duration)
  • New log aggregator (mcpgateway/services/log_aggregator.py): Aggregates structured logs into PerformanceMetric windows with percentiles (p50/p95/p99) and error rates
  • New security logger (mcpgateway/services/security_logger.py): Specialized logger for authentication attempts, suspicious activity, and threat scoring
  • New audit trail service (mcpgateway/services/audit_trail_service.py): CRUD operation tracking with change sets, data classification, and review flags
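
To make the aggregation step concrete, the sketch below computes p50/p95/p99 and an error rate for one window of request durations. It uses the nearest-rank percentile method, which is an assumption; the log aggregator in this PR may use a different method, and `WindowStats` and `aggregate_window` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class WindowStats:
    count: int
    error_rate: float
    p50_ms: float
    p95_ms: float
    p99_ms: float


def percentile(sorted_values: list[float], pct: int) -> float:
    """Nearest-rank percentile over a sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def aggregate_window(durations_ms: list[float], error_count: int) -> WindowStats:
    """Summarise the durations observed in a single time window."""
    if not durations_ms:
        raise ValueError("cannot aggregate an empty window")
    ordered = sorted(durations_ms)
    return WindowStats(
        count=len(ordered),
        error_rate=error_count / len(ordered),
        p50_ms=percentile(ordered, 50),
        p95_ms=percentile(ordered, 95),
        p99_ms=percentile(ordered, 99),
    )
```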

API & Admin UI

  • New log search router (mcpgateway/routers/log_search.py): RESTful endpoints for log search, correlation tracing, security events, audit trails, and performance metrics
  • Enhanced Admin UI (mcpgateway/static/admin.js, mcpgateway/templates/admin.html): System Logs tab rebuilt with quick actions, correlation trace modal, unified timeline view, and dynamic filters

Database Schema

  • New Alembic migration (mcpgateway/alembic/versions/k5e6f7g8h9i0_add_structured_logging_tables.py): Creates 4 new tables:
    • structured_log_entries: Comprehensive log storage with correlation IDs, user context, performance data, security indicators
    • performance_metrics: Time-windowed aggregations with percentile calculations
    • security_events: Threat analysis, failed attempt tracking, alert management
    • audit_trails: CRUD tracking with change detection and compliance metadata

⚙️ Configuration

New Settings in config.py:

  1. Correlation ID Settings (4 new fields):

    • correlation_id_enabled: Enable/disable correlation ID tracking (default: True)
    • correlation_id_header: Configurable header name (default: X-Correlation-ID)
    • correlation_id_preserve: Preserve client-provided IDs (default: True)
    • correlation_id_response_header: Echo correlation ID in responses (default: True)
  2. Structured Logging Settings (3 new fields):

    • structured_logging_enabled: Enable JSON logging with DB persistence (default: True)
    • structured_logging_database_enabled: Persist logs to database (default: True)
    • structured_logging_external_enabled: Send to external systems (default: False)
  3. Performance Tracking Settings (6 new fields):

    • performance_tracking_enabled: Enable performance metrics (default: True)
    • performance_threshold_*_ms: Alert thresholds for database queries, tool invocations, resource reads, HTTP requests
    • performance_degradation_multiplier: Alert threshold vs baseline (default: 1.5)
  4. Security Logging Settings (4 new fields):

    • security_logging_enabled: Enable security event logging (default: True)
    • security_failed_auth_threshold: Failed attempts before high severity (default: 5)
    • security_threat_score_alert: Threat score alert threshold (default: 0.7)
    • security_rate_limit_window_minutes: Rate limit check window (default: 5)
  5. Metrics Aggregation Settings (4 new fields):

    • metrics_aggregation_enabled: Enable automatic log aggregation (default: True)
    • metrics_aggregation_backfill_hours: Historical data to backfill on startup (default: 6)
    • metrics_aggregation_window_minutes: Aggregation window size (default: 5)
    • metrics_aggregation_auto_start: Auto-run aggregation loop (default: False)
  6. Log Search Settings (2 new fields):

    • log_search_max_results: Maximum results per query (default: 1000)
    • log_retention_days: Days to retain logs in database (default: 30)
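
The sketch below shows how the four Correlation ID settings from item 1 might interact when a request arrives. The field names and defaults come from the list above; the dataclass, the helper functions, and the validation pattern are assumptions made for illustration (the actual settings are Pydantic fields in config.py).

```python
import re
import uuid
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CorrelationSettings:
    # Names and defaults mirror the settings listed above.
    correlation_id_enabled: bool = True
    correlation_id_header: str = "X-Correlation-ID"
    correlation_id_preserve: bool = True
    correlation_id_response_header: bool = True


# Assumed validation rule; the PR does not specify the accepted format.
_VALID_ID = re.compile(r"[A-Za-z0-9._-]{1,128}")


def resolve_correlation_id(headers: Mapping[str, str], settings: CorrelationSettings) -> str:
    """Reuse a valid client-supplied ID when preservation is on, else generate one."""
    if settings.correlation_id_preserve:
        wanted = settings.correlation_id_header.lower()
        for name, value in headers.items():
            candidate = value.strip()
            # HTTP header names are case-insensitive.
            if name.lower() == wanted and _VALID_ID.fullmatch(candidate):
                return candidate
    return uuid.uuid4().hex


def response_headers(correlation_id: str, settings: CorrelationSettings) -> dict[str, str]:
    """Echo the ID back only when both relevant flags are enabled."""
    if settings.correlation_id_enabled and settings.correlation_id_response_header:
        return {settings.correlation_id_header: correlation_id}
    return {}
```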

Updated .env.example:

  • Added 4 new active Correlation ID settings (CORRELATION_ID_ENABLED, CORRELATION_ID_HEADER, CORRELATION_ID_PRESERVE, CORRELATION_ID_RESPONSE_HEADER)
  • Added 17 new commented examples for Structured Logging, Performance Tracking, Security Logging, Metrics Aggregation, and Log Search settings
  • All 21 settings are fully documented in config.py with Pydantic Field definitions and defaults

🔌 Integration Points

Middleware Stack (main.py):

  1. Registered CorrelationIDMiddleware after RequestLoggingMiddleware (execution order: RequestLogging → CorrelationID → Auth → Observability)
  2. Added background tasks for metrics aggregation backfill + continuous loop when metrics_aggregation_auto_start=True
  3. Included log_search router when structured_logging_enabled=True

Authentication & Security:

  1. auth.py: Enhanced JWT validation with correlation ID context
  2. middleware/auth_middleware.py: AuthContextMiddleware now logs successful/failed authentication attempts via SecurityLogger
  3. middleware/http_auth_middleware.py: Unified correlation ID usage across plugin auth hooks

Service Layer:

  1. services/tool_service.py: Integrated correlation ID fallback chain and structured logging for tool invocations
  2. services/resource_service.py: Added user context and audit logging for resource operations
  3. services/prompt_service.py: Enhanced with structured logging and performance tracking
  4. services/server_service.py: Integrated audit trails for server lifecycle events
  5. services/gateway_service.py: Added correlation ID propagation for federated requests
  6. services/a2a_service.py: Added correlation ID and user context to agent invocations

Observability:

  1. observability.py: Auto-inject correlation_id into OpenTelemetry spans as request.id attribute
  2. middleware/request_logging_middleware.py: Gateway boundary logging (request_started/completed) with correlation IDs, user resolution, and duration tracking
  3. admin.py: Plugin marketplace endpoints emit structured logs + audit trails for compliance

📁 New Files

  • mcpgateway/middleware/correlation_id.py – FastAPI middleware that extracts/preserves correlation IDs and injects them into responses
  • mcpgateway/utils/correlation_id.py – ContextVar utilities for generating, validating, and retrieving correlation IDs across async scopes
  • mcpgateway/services/structured_logger.py – Central structured logging facade that writes to JSON, DB, and optional external sinks
  • mcpgateway/services/log_aggregator.py – Aggregates StructuredLogEntry rows into PerformanceMetric windows and exposes helper APIs
  • mcpgateway/services/security_logger.py – Specialized logger for auth/suspicious events, computing threat scores and security audit entries
  • mcpgateway/services/audit_trail_service.py – Shared audit trail writer that records CRUD/data-access operations with change tracking
  • mcpgateway/routers/log_search.py – FastAPI router exposing /api/logs/search, /trace, /security-events, /audit-trails, /performance-metrics endpoints
  • mcpgateway/alembic/versions/k5e6f7g8h9i0_add_structured_logging_tables.py – Migration that creates structured_log_entries, performance_metrics, security_events, and audit_trails tables plus supporting indexes

Example Usage

curl -v http://localhost:4444/health

Full Response:

*   Trying 127.0.0.1:4444...
* Connected to localhost (127.0.0.1) port 4444 (#0)
> GET /health HTTP/1.1
> Host: localhost:4444
> User-Agent: curl/7.81.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< date: Thu, 27 Nov 2025 15:00:29 GMT
< server: uvicorn
< content-length: 20
< content-type: application/json
< x-content-type-options: nosniff
< x-frame-options: DENY
< x-xss-protection: 0
< x-download-options: noopen
< referrer-policy: strict-origin-when-cross-origin
< content-security-policy: default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval' https://cdnjs.cloudflare.com https://cdn.tailwindcss.com https://cdn.jsdelivr.net https://unpkg.com; style-src 'self' 'unsafe-inline' https://cdnjs.cloudflare.com https://cdn.jsdelivr.net; img-src 'self' data: https:; font-src 'self' data: https://cdnjs.cloudflare.com; connect-src 'self' ws: wss: https:; frame-ancestors 'none';
< x-correlation-id: 6930e1f1a8b84beb904e18594bbf15dd
<
* Connection #0 to host localhost left intact
{"status":"healthy"}

The same ID appears in both the response and the server logs:

  • Response header: x-correlation-id: 6930e1f1a8b84beb904e18594bbf15dd
  • Server logs: {"request_id": "6930e1f1a8b84beb904e18594bbf15dd", ...}
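
For readers wondering how the `request_id` field can end up in each log line, here is a minimal standard-library sketch of a JSON formatter that reads the current correlation ID from a ContextVar. The class and variable names are hypothetical stand-ins, not the CorrelationIdJsonFormatter shipped in this PR, and the set of emitted fields is assumed.

```python
import io
import json
import logging
from contextvars import ContextVar
from typing import Optional

# Hypothetical stand-in for the PR's ContextVar storage.
correlation_id_var: ContextVar[Optional[str]] = ContextVar("correlation_id", default=None)


class CorrelationJsonFormatter(logging.Formatter):
    """Render each record as JSON and attach the current correlation ID."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "request_id": correlation_id_var.get(),
        }
        return json.dumps(payload)


def build_logger(stream: io.TextIOBase) -> logging.Logger:
    """Create an isolated logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("gateway.sketch")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(CorrelationJsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Because the formatter reads the ContextVar at format time, calling code does not need to pass the ID explicitly to every logging call.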

Correlation trace in Admin UI:

  1. Navigate to Admin UI → System Logs tab
  2. Click a correlation ID to trace it
  3. Alternatively, type or paste the correlation ID into the search box
  4. View the unified timeline of all logs, security events, audit trails, and performance metrics for that request
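
Conceptually, the unified timeline in step 4 is a merge of records from several sources, filtered by correlation ID and ordered by time. The sketch below illustrates that idea only; `TimelineEvent` and `build_timeline` are hypothetical and do not reflect the actual /trace endpoint or database queries.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class TimelineEvent:
    timestamp: datetime
    source: str  # e.g. "log", "security", "audit", "metric"
    summary: str
    correlation_id: str


def build_timeline(
    correlation_id: str, *sources: Iterable[TimelineEvent]
) -> list[TimelineEvent]:
    """Collect events for one request from every source, oldest first."""
    matching = [
        event
        for source in sources
        for event in source
        if event.correlation_id == correlation_id
    ]
    return sorted(matching, key=lambda event: event.timestamp)
```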

shoummu1, Nov 14 '25 11:11