mcp-context-forge
Correlation ID for Unified Request Tracking
📋 Summary
This PR delivers a comprehensive structured JSON logging pipeline that captures correlation IDs end-to-end (ingress middleware → services → persistence) while maintaining backward compatibility with legacy console/file logs. It introduces:
- Correlation ID tracking: Extract, preserve, and generate unique request identifiers across the entire request lifecycle
- Structured logging: Persist enriched logs to database with user context, performance metrics, and security indicators
- Security & audit trails: Specialized loggers for authentication events, suspicious activity, and CRUD operations
- Performance aggregation: Automatic rollup of logs into time-windowed metrics with percentiles
- Admin UI enhancement: Rebuilt System Logs tab with search, correlation tracing, security events, and performance analytics
🔗 Related Issues
#300
🔧 Changes Made
Core Implementation
Correlation ID Infrastructure
- New utility module (mcpgateway/utils/correlation_id.py): ContextVar-based correlation ID storage for async-safe request tracking across the entire request lifecycle
- New middleware (mcpgateway/middleware/correlation_id.py): HTTP middleware for X-Correlation-ID header extraction, validation, generation, and injection into responses
- Enhanced logging (mcpgateway/services/logging_service.py): CorrelationIdJsonFormatter for automatic correlation ID injection into JSON logs with OpenTelemetry trace context
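To illustrate how ContextVar storage and a JSON formatter fit together, here is a minimal sketch using only the standard library. It is not the code in this PR: the function names, the `SketchCorrelationJsonFormatter` class, and the validation regex are assumptions made for illustration. Only the `request_id` log key and the 32-character hex ID shape are taken from the example output further down.

```python
import json
import logging
import re
import uuid
from contextvars import ContextVar
from typing import Optional

# Hypothetical names; the real helpers live in mcpgateway/utils/correlation_id.py.
_correlation_id: ContextVar[Optional[str]] = ContextVar("correlation_id", default=None)

# Assumed validation rule: 1-255 characters of letters, digits, '-' or '_'.
_VALID_ID = re.compile(r"^[A-Za-z0-9_-]{1,255}$")


def generate_correlation_id() -> str:
    """Return a 32-character hex string."""
    return uuid.uuid4().hex


def set_correlation_id(candidate: Optional[str]) -> str:
    """Keep a valid client-supplied ID, otherwise generate a fresh one."""
    if candidate and _VALID_ID.match(candidate):
        value = candidate
    else:
        value = generate_correlation_id()
    _correlation_id.set(value)
    return value


def get_correlation_id() -> Optional[str]:
    """Return the ID bound to the current async context, if any."""
    return _correlation_id.get()


class SketchCorrelationJsonFormatter(logging.Formatter):
    """Render each record as JSON and attach the current correlation ID."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "request_id": get_correlation_id(),
        }
        return json.dumps(payload)
```

Because each asyncio task runs in its own copy of the context, a value set while handling one request is not visible to concurrent requests, which is what makes this storage async-safe.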
Structured Logging & Observability
- New structured logger (mcpgateway/services/structured_logger.py): Central logging facade that persists to database (StructuredLogEntry) with enriched metadata (user, component, operation type, duration)
- New log aggregator (mcpgateway/services/log_aggregator.py): Aggregates structured logs into PerformanceMetric windows with percentiles (p50/p95/p99) and error rates
- New security logger (mcpgateway/services/security_logger.py): Specialized logger for authentication attempts, suspicious activity, and threat scoring
- New audit trail service (mcpgateway/services/audit_trail_service.py): CRUD operation tracking with change sets, data classification, and review flags
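The aggregation step can be pictured with a short sketch that reduces one window of request durations to p50/p95/p99 and an error rate. The `WindowMetrics` dataclass, the function names, and the nearest-rank percentile method are assumptions for illustration; the PR does not state which percentile method the aggregator uses.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class WindowMetrics:
    """Hypothetical result type for one aggregation window."""
    request_count: int
    error_rate: float
    p50_ms: float
    p95_ms: float
    p99_ms: float


def percentile(sorted_values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile over an already sorted, non-empty sequence."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def aggregate_window(durations_ms: Sequence[float], error_count: int) -> WindowMetrics:
    """Summarise the durations observed in a single time window."""
    if not durations_ms:
        # An empty window yields zeroed metrics rather than raising.
        return WindowMetrics(0, 0.0, 0.0, 0.0, 0.0)
    ordered = sorted(durations_ms)
    return WindowMetrics(
        request_count=len(ordered),
        error_rate=error_count / len(ordered),
        p50_ms=percentile(ordered, 50),
        p95_ms=percentile(ordered, 95),
        p99_ms=percentile(ordered, 99),
    )
```

Nearest-rank always returns a value that was actually observed, which keeps the sketch simple; an interpolating method would give slightly different numbers on small windows.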
API & Admin UI
- New log search router (mcpgateway/routers/log_search.py): RESTful endpoints for log search, correlation tracing, security events, audit trails, and performance metrics
- Enhanced Admin UI (mcpgateway/static/admin.js, mcpgateway/templates/admin.html): System Logs tab rebuilt with quick actions, correlation trace modal, unified timeline view, and dynamic filters
Database Schema
- New Alembic migration (mcpgateway/alembic/versions/k5e6f7g8h9i0_add_structured_logging_tables.py): Creates 4 new tables:
  - structured_log_entries: Comprehensive log storage with correlation IDs, user context, performance data, security indicators
  - performance_metrics: Time-windowed aggregations with percentile calculations
  - security_events: Threat analysis, failed attempt tracking, alert management
  - audit_trails: CRUD tracking with change detection and compliance metadata
⚙️ Configuration
New Settings in config.py:
- Correlation ID Settings (4 new fields):
  - correlation_id_enabled: Enable/disable correlation ID tracking (default: True)
  - correlation_id_header: Configurable header name (default: X-Correlation-ID)
  - correlation_id_preserve: Preserve client-provided IDs (default: True)
  - correlation_id_response_header: Echo correlation ID in responses (default: True)
- Structured Logging Settings (3 new fields):
  - structured_logging_enabled: Enable JSON logging with DB persistence (default: True)
  - structured_logging_database_enabled: Persist logs to database (default: True)
  - structured_logging_external_enabled: Send to external systems (default: False)
- Performance Tracking Settings (6 new fields):
  - performance_tracking_enabled: Enable performance metrics (default: True)
  - performance_threshold_*_ms: Alert thresholds for database queries, tool invocations, resource reads, HTTP requests
  - performance_degradation_multiplier: Alert threshold vs baseline (default: 1.5)
- Security Logging Settings (4 new fields):
  - security_logging_enabled: Enable security event logging (default: True)
  - security_failed_auth_threshold: Failed attempts before high severity (default: 5)
  - security_threat_score_alert: Threat score alert threshold (default: 0.7)
  - security_rate_limit_window_minutes: Rate limit check window (default: 5)
- Metrics Aggregation Settings (4 new fields):
  - metrics_aggregation_enabled: Enable automatic log aggregation (default: True)
  - metrics_aggregation_backfill_hours: Historical data to backfill on startup (default: 6)
  - metrics_aggregation_window_minutes: Aggregation window size (default: 5)
  - metrics_aggregation_auto_start: Auto-run aggregation loop (default: False)
- Log Search Settings (2 new fields):
  - log_search_max_results: Maximum results per query (default: 1000)
  - log_retention_days: Days to retain logs in database (default: 30)
Updated .env.example:
- Added 4 new active Correlation ID settings (CORRELATION_ID_ENABLED, CORRELATION_ID_HEADER, CORRELATION_ID_PRESERVE, CORRELATION_ID_RESPONSE_HEADER)
- Added 17 new commented examples for Structured Logging, Performance Tracking, Security Logging, Metrics Aggregation, and Log Search settings
- All 21 settings are fully documented in config.py with Pydantic Field definitions and defaults
🔌 Integration Points
Middleware Stack (main.py):
- Registered CorrelationIDMiddleware after RequestLoggingMiddleware (execution order: RequestLogging → CorrelationID → Auth → Observability)
- Added background tasks for metrics aggregation backfill + continuous loop when metrics_aggregation_auto_start=True
- Included log_search router when structured_logging_enabled=True
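As a rough picture of what the correlation ID middleware does on each request, the sketch below is written as plain ASGI with no framework dependency. It is a hypothetical stand-in, not the PR's CorrelationIDMiddleware: the class name, the `current_id` variable, and the `preserve`/`echo` parameters are assumptions that loosely mirror the correlation_id_preserve and correlation_id_response_header settings, and header validation is omitted.

```python
import uuid
from contextvars import ContextVar
from typing import Optional

HEADER = b"x-correlation-id"  # ASGI delivers header names lower-cased, as bytes
current_id: ContextVar[Optional[str]] = ContextVar("current_id", default=None)


class SketchCorrelationIdMiddleware:
    """Hypothetical plain-ASGI version of a correlation ID middleware."""

    def __init__(self, app, preserve: bool = True, echo: bool = True):
        self.app = app
        self.preserve = preserve  # keep a client-supplied ID when present
        self.echo = echo          # add the ID to the response headers

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Pass lifespan and websocket traffic through untouched.
            await self.app(scope, receive, send)
            return

        incoming = dict(scope.get("headers", [])).get(HEADER)
        if incoming and self.preserve:
            cid = incoming.decode("latin-1")
        else:
            cid = uuid.uuid4().hex
        token = current_id.set(cid)

        async def send_with_header(message):
            # Inject the header only on the message that starts the response.
            if self.echo and message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append((HEADER, cid.encode("latin-1")))
                message = {**message, "headers": headers}
            await send(message)

        try:
            await self.app(scope, receive, send_with_header)
        finally:
            # Restore the previous value so the ID does not leak past the request.
            current_id.reset(token)
```

Anything the wrapped application logs while handling the request can read `current_id.get()` without the ID being passed through function arguments.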
Authentication & Security:
- auth.py: Enhanced JWT validation with correlation ID context
- middleware/auth_middleware.py: AuthContextMiddleware now logs successful/failed authentication attempts via SecurityLogger
- middleware/http_auth_middleware.py: Unified correlation ID usage across plugin auth hooks
Service Layer:
- services/tool_service.py: Integrated correlation ID fallback chain and structured logging for tool invocations
- services/resource_service.py: Added user context and audit logging for resource operations
- services/prompt_service.py: Enhanced with structured logging and performance tracking
- services/server_service.py: Integrated audit trails for server lifecycle events
- services/gateway_service.py: Added correlation ID propagation for federated requests
- services/a2a_service.py: Added correlation ID and user context to agent invocations
Observability:
- observability.py: Auto-inject correlation_id into OpenTelemetry spans as request.id attribute
- middleware/request_logging_middleware.py: Gateway boundary logging (request_started/completed) with correlation IDs, user resolution, and duration tracking
- admin.py: Plugin marketplace endpoints emit structured logs + audit trails for compliance
📁 New Files
- mcpgateway/middleware/correlation_id.py – FastAPI middleware that extracts/preserves correlation IDs and injects them into responses
- mcpgateway/utils/correlation_id.py – ContextVar utilities for generating, validating, and retrieving correlation IDs across async scopes
- mcpgateway/services/structured_logger.py – Central structured logging facade that writes to JSON, DB, and optional external sinks
- mcpgateway/services/log_aggregator.py – Aggregates StructuredLogEntry rows into PerformanceMetric windows and exposes helper APIs
- mcpgateway/services/security_logger.py – Specialized logger for auth/suspicious events, computing threat scores and security audit entries
- mcpgateway/services/audit_trail_service.py – Shared audit trail writer that records CRUD/data-access operations with change tracking
- mcpgateway/routers/log_search.py – FastAPI router exposing /api/logs/search, /trace, /security-events, /audit-trails, /performance-metrics endpoints
- mcpgateway/alembic/versions/k5e6f7g8h9i0_add_structured_logging_tables.py – Migration that creates structured_log_entries, performance_metrics, security_events, and audit_trails tables plus supporting indexes
Example Usage
curl -v http://localhost:4444/health
Full Response:
* Trying 127.0.0.1:4444...
* Connected to localhost (127.0.0.1) port 4444 (#0)
> GET /health HTTP/1.1
> Host: localhost:4444
> User-Agent: curl/7.81.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< date: Thu, 27 Nov 2025 15:00:29 GMT
< server: uvicorn
< content-length: 20
< content-type: application/json
< x-content-type-options: nosniff
< x-frame-options: DENY
< x-xss-protection: 0
< x-download-options: noopen
< referrer-policy: strict-origin-when-cross-origin
< content-security-policy: default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval' https://cdnjs.cloudflare.com https://cdn.tailwindcss.com https://cdn.jsdelivr.net https://unpkg.com; style-src 'self' 'unsafe-inline' https://cdnjs.cloudflare.com https://cdn.jsdelivr.net; img-src 'self' data: https:; font-src 'self' data: https://cdnjs.cloudflare.com; connect-src 'self' ws: wss: https:; frame-ancestors 'none';
< x-correlation-id: 6930e1f1a8b84beb904e18594bbf15dd
<
* Connection #0 to host localhost left intact
{"status":"healthy"}
- Response header: x-correlation-id: 6930e1f1a8b84beb904e18594bbf15dd
- Server logs: {"request_id": "6930e1f1a8b84beb904e18594bbf15dd", ...}
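A client can also supply its own ID instead of letting the gateway generate one. The sketch below uses only the standard library and reuses the URL from the curl example and the default header name. The function names are hypothetical, and the expectation that the same ID comes back rests on the default settings (correlation_id_preserve and correlation_id_response_header both True).

```python
import urllib.request
from typing import Optional, Tuple

GATEWAY_URL = "http://localhost:4444/health"  # same endpoint as the curl example
HEADER_NAME = "X-Correlation-ID"              # default value of correlation_id_header


def build_request(correlation_id: str) -> urllib.request.Request:
    """Create a GET request that carries a client-chosen correlation ID."""
    return urllib.request.Request(GATEWAY_URL, headers={HEADER_NAME: correlation_id})


def fetch_with_correlation_id(correlation_id: str) -> Tuple[str, Optional[str]]:
    """Send the request and return (sent_id, echoed_id)."""
    with urllib.request.urlopen(build_request(correlation_id), timeout=5) as resp:
        # Response header lookup is case-insensitive.
        return correlation_id, resp.headers.get(HEADER_NAME)
```

With a gateway running locally, calling `fetch_with_correlation_id(uuid.uuid4().hex)` (after `import uuid`) should return the same value twice if the defaults are in effect; searching the System Logs tab for that value then shows the matching entries.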
Correlation trace in Admin UI:
- Navigate to Admin UI → System Logs tab
- Click a correlation ID to open its trace
- Enter the correlation ID or paste it from the search box
- View unified timeline with all logs, security events, audit trails, and performance metrics for that request