mcp-context-forge Correlation ID for Unified Request Tracking

Open shoummu1 opened this issue 3 weeks ago • 0 comments

📋 Summary

This PR delivers a comprehensive structured JSON logging pipeline that captures correlation IDs end-to-end (ingress middleware → services → persistence) while maintaining backward compatibility with legacy console/file logs. It introduces:

Correlation ID tracking: Extract, preserve, and generate unique request identifiers across the entire request lifecycle
Structured logging: Persist enriched logs to database with user context, performance metrics, and security indicators
Security & audit trails: Specialized loggers for authentication events, suspicious activity, and CRUD operations
Performance aggregation: Automatic rollup of logs into time-windowed metrics with percentiles
Admin UI enhancement: Rebuilt System Logs tab with search, correlation tracing, security events, and performance analytics

🔗 Related Issues

#300

🔧 Changes Made

Core Implementation

Correlation ID Infrastructure

New utility module (mcpgateway/utils/correlation_id.py): ContextVar-based correlation ID storage for async-safe request tracking across the entire request lifecycle
New middleware (mcpgateway/middleware/correlation_id.py): HTTP middleware for X-Correlation-ID header extraction, validation, generation, and injection into responses
Enhanced logging (mcpgateway/services/logging_service.py): CorrelationIdJsonFormatter for automatic correlation ID injection into JSON logs with OpenTelemetry trace context
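
To illustrate why ContextVar-based storage is async-safe, here is a minimal sketch using only the standard library. The function names (generate_correlation_id, set_correlation_id, get_correlation_id) are assumptions for illustration and may not match what mcpgateway/utils/correlation_id.py actually exposes.

```python
import asyncio
import uuid
from contextvars import ContextVar
from typing import Optional

# Hypothetical names; the real utility module's API may differ.
_correlation_id: ContextVar[Optional[str]] = ContextVar("correlation_id", default=None)


def generate_correlation_id() -> str:
    """Return a new 32-character hex identifier."""
    return uuid.uuid4().hex


def set_correlation_id(value: str) -> None:
    _correlation_id.set(value)


def get_correlation_id() -> Optional[str]:
    return _correlation_id.get()


async def handle_request(name: str) -> tuple[str, Optional[str]]:
    # Each asyncio task runs in its own copy of the context, so this set()
    # is not visible to other tasks running concurrently.
    set_correlation_id(f"req-{name}")
    await asyncio.sleep(0)  # yield so the two tasks interleave
    return name, get_correlation_id()


async def main() -> list[tuple[str, Optional[str]]]:
    return await asyncio.gather(handle_request("a"), handle_request("b"))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Both simulated requests read back their own ID even though they interleave, which is the property the request lifecycle relies on.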

Structured Logging & Observability

New structured logger (mcpgateway/services/structured_logger.py): Central logging facade that persists to database (StructuredLogEntry) with enriched metadata (user, component, operation type, duration)
New log aggregator (mcpgateway/services/log_aggregator.py): Aggregates structured logs into PerformanceMetric windows with percentiles (p50/p95/p99) and error rates
New security logger (mcpgateway/services/security_logger.py): Specialized logger for authentication attempts, suspicious activity, and threat scoring
New audit trail service (mcpgateway/services/audit_trail_service.py): CRUD operation tracking with change sets, data classification, and review flags
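
The rollup performed by the log aggregator can be sketched roughly as follows. This is an illustration only: the nearest-rank percentile method, the WindowStats type, and the "duration_ms" / "is_error" entry keys are assumptions, not the actual StructuredLogEntry or PerformanceMetric schema.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    count: int
    error_rate: float
    p50_ms: float
    p95_ms: float
    p99_ms: float


def percentile(sorted_values: list[float], pct: int) -> float:
    """Nearest-rank percentile over an already sorted, non-empty list."""
    # Integer ceiling of pct * n / 100, clamped to at least rank 1.
    rank = max(1, (pct * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def aggregate_window(entries: list[dict]) -> WindowStats:
    """Summarise the log entries that fall into one time window.

    Each entry is assumed to carry "duration_ms" and an "is_error" flag.
    """
    if not entries:
        return WindowStats(0, 0.0, 0.0, 0.0, 0.0)
    durations = sorted(e["duration_ms"] for e in entries)
    errors = sum(1 for e in entries if e["is_error"])
    return WindowStats(
        count=len(entries),
        error_rate=errors / len(entries),
        p50_ms=percentile(durations, 50),
        p95_ms=percentile(durations, 95),
        p99_ms=percentile(durations, 99),
    )
```

A real aggregator would first bucket entries by timestamp into windows of metrics_aggregation_window_minutes and then call something like aggregate_window once per bucket.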

API & Admin UI

New log search router (mcpgateway/routers/log_search.py): RESTful endpoints for log search, correlation tracing, security events, audit trails, and performance metrics
Enhanced Admin UI (mcpgateway/static/admin.js, mcpgateway/templates/admin.html): System Logs tab rebuilt with quick actions, correlation trace modal, unified timeline view, and dynamic filters

Database Schema

New Alembic migration (mcpgateway/alembic/versions/k5e6f7g8h9i0_add_structured_logging_tables.py): Creates 4 new tables:
- structured_log_entries: Comprehensive log storage with correlation IDs, user context, performance data, security indicators
- performance_metrics: Time-windowed aggregations with percentile calculations
- security_events: Threat analysis, failed attempt tracking, alert management
- audit_trails: CRUD tracking with change detection and compliance metadata
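
As a rough picture of what "change detection" for audit_trails could look like, the sketch below diffs the before and after state of a record. The compute_changes helper and the {"old": ..., "new": ...} shape are hypothetical; the actual column layout is defined by the migration.

```python
from typing import Any


def compute_changes(old: dict[str, Any], new: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Return only the fields whose values differ between two snapshots."""
    changes: dict[str, dict[str, Any]] = {}
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before != after:
            # A missing key is reported as None on the side where it is absent.
            changes[key] = {"old": before, "new": after}
    return changes
```

Storing only the changed fields keeps each audit row small while still letting a reviewer see exactly what an update touched.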

⚙️ Configuration

New Settings in config.py:

Correlation ID Settings (4 new fields):
- correlation_id_enabled: Enable/disable correlation ID tracking (default: True)
- correlation_id_header: Configurable header name (default: X-Correlation-ID)
- correlation_id_preserve: Preserve client-provided IDs (default: True)
- correlation_id_response_header: Echo correlation ID in responses (default: True)
Structured Logging Settings (3 new fields):
- structured_logging_enabled: Enable JSON logging with DB persistence (default: True)
- structured_logging_database_enabled: Persist logs to database (default: True)
- structured_logging_external_enabled: Send to external systems (default: False)
Performance Tracking Settings (6 new fields):
- performance_tracking_enabled: Enable performance metrics (default: True)
- performance_threshold_*_ms: Alert thresholds for database queries, tool invocations, resource reads, HTTP requests
- performance_degradation_multiplier: Alert threshold as a multiple of the baseline (default: 1.5)
Security Logging Settings (4 new fields):
- security_logging_enabled: Enable security event logging (default: True)
- security_failed_auth_threshold: Failed attempts before high severity (default: 5)
- security_threat_score_alert: Threat score alert threshold (default: 0.7)
- security_rate_limit_window_minutes: Rate limit check window (default: 5)
Metrics Aggregation Settings (4 new fields):
- metrics_aggregation_enabled: Enable automatic log aggregation (default: True)
- metrics_aggregation_backfill_hours: Historical data to backfill on startup (default: 6)
- metrics_aggregation_window_minutes: Aggregation window size (default: 5)
- metrics_aggregation_auto_start: Auto-run aggregation loop (default: False)
Log Search Settings (2 new fields):
- log_search_max_results: Maximum results per query (default: 1000)
- log_retention_days: Days to retain logs in database (default: 30)

Updated .env.example:

Added 4 new active Correlation ID settings (CORRELATION_ID_ENABLED, CORRELATION_ID_HEADER, CORRELATION_ID_PRESERVE, CORRELATION_ID_RESPONSE_HEADER)
Added 17 new commented examples for Structured Logging, Performance Tracking, Security Logging, Metrics Aggregation, and Log Search settings
All 21 settings are fully documented in config.py with Pydantic Field definitions and defaults
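
For readers unfamiliar with how the four Correlation ID variables map onto settings, here is a standard-library sketch. The PR states that config.py uses Pydantic Field definitions; the dataclass, the load_settings helper, and the boolean parsing rule below are stand-ins chosen so the example has no dependencies.

```python
import os
from dataclasses import dataclass
from typing import Mapping

_TRUTHY = {"1", "true", "yes", "on"}  # assumed parsing rule, not from the PR


def _env_bool(env: Mapping[str, str], key: str, default: bool) -> bool:
    raw = env.get(key)
    return default if raw is None else raw.strip().lower() in _TRUTHY


@dataclass(frozen=True)
class CorrelationIdSettings:
    # Field names and defaults mirror the list in the Configuration section.
    correlation_id_enabled: bool = True
    correlation_id_header: str = "X-Correlation-ID"
    correlation_id_preserve: bool = True
    correlation_id_response_header: bool = True


def load_settings(env: Mapping[str, str] = os.environ) -> CorrelationIdSettings:
    """Build settings from environment variables, falling back to defaults."""
    return CorrelationIdSettings(
        correlation_id_enabled=_env_bool(env, "CORRELATION_ID_ENABLED", True),
        correlation_id_header=env.get("CORRELATION_ID_HEADER", "X-Correlation-ID"),
        correlation_id_preserve=_env_bool(env, "CORRELATION_ID_PRESERVE", True),
        correlation_id_response_header=_env_bool(env, "CORRELATION_ID_RESPONSE_HEADER", True),
    )
```

With an empty environment this yields the documented defaults; setting CORRELATION_ID_PRESERVE=false would make the gateway ignore client-supplied IDs.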

🔌 Integration Points

Middleware Stack (main.py):

Registered CorrelationIDMiddleware after RequestLoggingMiddleware (execution order: RequestLogging → CorrelationID → Auth → Observability)
Added background tasks for metrics aggregation backfill + continuous loop when metrics_aggregation_auto_start=True
Included log_search router when structured_logging_enabled=True
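
The extract, validate, generate, and inject behaviour of the middleware can be sketched as a dependency-free ASGI middleware. This is not the PR's CorrelationIDMiddleware: the class name, the validation pattern, and the health_app / call helpers are assumptions made so the example runs without FastAPI installed.

```python
import asyncio
import re
import uuid
from contextvars import ContextVar

correlation_id_var: ContextVar = ContextVar("correlation_id", default=None)
HEADER = b"x-correlation-id"  # ASGI delivers header names as lowercase bytes
_VALID = re.compile(r"[A-Za-z0-9_\-]{1,64}")  # assumed validation rule


class CorrelationIdSketchMiddleware:
    """Reuse a valid client-supplied ID, otherwise mint a new one."""

    def __init__(self, app, preserve: bool = True, echo: bool = True):
        self.app = app
        self.preserve = preserve
        self.echo = echo

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        incoming = dict(scope.get("headers", [])).get(HEADER, b"").decode("latin-1")
        if self.preserve and _VALID.fullmatch(incoming):
            cid = incoming
        else:
            cid = uuid.uuid4().hex

        async def send_with_header(message):
            # Add the ID to the response headers before they are sent.
            if self.echo and message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append((HEADER, cid.encode("latin-1")))
                message = {**message, "headers": headers}
            await send(message)

        token = correlation_id_var.set(cid)
        try:
            await self.app(scope, receive, send_with_header)
        finally:
            correlation_id_var.reset(token)


async def health_app(scope, receive, send):
    """Stand-in downstream app that returns a fixed JSON body."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b'{"status":"healthy"}'})


async def call(app, headers):
    """Drive one fake HTTP request through the app and return response headers."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app({"type": "http", "headers": headers}, receive, send)
    return dict(sent[0]["headers"])
```

Calling asyncio.run(call(CorrelationIdSketchMiddleware(health_app), [])) returns response headers containing a freshly generated x-correlation-id, while a request that already carries a well-formed ID gets the same value echoed back.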

Authentication & Security:

auth.py: Enhanced JWT validation with correlation ID context
middleware/auth_middleware.py: AuthContextMiddleware now logs successful/failed authentication attempts via SecurityLogger
middleware/http_auth_middleware.py: Unified correlation ID usage across plugin auth hooks
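
One plausible reading of how security_failed_auth_threshold and security_rate_limit_window_minutes interact is a sliding-window counter per client. The sketch below is speculative: FailedAuthTracker, the "MEDIUM" / "HIGH" labels, and the choice to escalate when the count reaches (rather than exceeds) the threshold are all assumptions, not SecurityLogger's actual behaviour.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta, timezone


class FailedAuthTracker:
    """Count failed logins per client inside a sliding time window."""

    def __init__(self, threshold: int = 5, window_minutes: int = 5):
        self.threshold = threshold
        self.window = timedelta(minutes=window_minutes)
        self._failures: dict[str, deque] = defaultdict(deque)

    def record_failure(self, client_ip: str, at: datetime) -> str:
        """Record one failed attempt and return the resulting severity."""
        attempts = self._failures[client_ip]
        attempts.append(at)
        # Drop attempts that have fallen out of the sliding window.
        while attempts and at - attempts[0] > self.window:
            attempts.popleft()
        return "HIGH" if len(attempts) >= self.threshold else "MEDIUM"


if __name__ == "__main__":
    tracker = FailedAuthTracker()
    start = datetime(2025, 11, 27, 15, 0, tzinfo=timezone.utc)
    for i in range(5):
        print(tracker.record_failure("203.0.113.7", start + timedelta(seconds=10 * i)))
```

With the defaults of 5 attempts in 5 minutes, the fifth rapid failure from one address is the first to be reported as high severity, and the count resets naturally once older attempts age out.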

Service Layer:

services/tool_service.py: Integrated correlation ID fallback chain and structured logging for tool invocations
services/resource_service.py: Added user context and audit logging for resource operations
services/prompt_service.py: Enhanced with structured logging and performance tracking
services/server_service.py: Integrated audit trails for server lifecycle events
services/gateway_service.py: Added correlation ID propagation for federated requests
services/a2a_service.py: Added correlation ID and user context to agent invocations
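
The "fallback chain" and the propagation to federated requests mentioned above might look something like the following. The lookup order (explicit argument, then ContextVar, then inbound header, then a newly generated ID) and both helper names are assumptions; the PR does not spell out the actual order.

```python
import uuid
from contextvars import ContextVar
from typing import Mapping, Optional

correlation_id_var: ContextVar = ContextVar("correlation_id", default=None)


def resolve_correlation_id(
    explicit: Optional[str] = None,
    request_headers: Optional[Mapping[str, str]] = None,
    header_name: str = "X-Correlation-ID",
) -> str:
    """Try each source in order; generate a new ID only as a last resort."""
    if explicit:
        return explicit
    from_context = correlation_id_var.get()
    if from_context:
        return from_context
    if request_headers:
        # HTTP header names are case-insensitive.
        lowered = {k.lower(): v for k, v in request_headers.items()}
        from_header = lowered.get(header_name.lower())
        if from_header:
            return from_header
    return uuid.uuid4().hex


def outbound_headers(
    base: Optional[Mapping[str, str]] = None,
    header_name: str = "X-Correlation-ID",
) -> dict[str, str]:
    """Copy base headers and attach the current correlation ID for a peer call."""
    headers = dict(base or {})
    headers[header_name] = resolve_correlation_id()
    return headers
```

A service making a federated call would pass outbound_headers(...) to its HTTP client so the downstream gateway logs under the same ID.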

Observability:

observability.py: Auto-inject correlation_id into OpenTelemetry spans as request.id attribute
middleware/request_logging_middleware.py: Gateway boundary logging (request_started/completed) with correlation IDs, user resolution, and duration tracking
admin.py: Plugin marketplace endpoints emit structured logs + audit trails for compliance

📁 New Files

mcpgateway/middleware/correlation_id.py – FastAPI middleware that extracts/preserves correlation IDs and injects them into responses
mcpgateway/utils/correlation_id.py – ContextVar utilities for generating, validating, and retrieving correlation IDs across async scopes
mcpgateway/services/structured_logger.py – Central structured logging facade that writes to JSON, DB, and optional external sinks
mcpgateway/services/log_aggregator.py – Aggregates StructuredLogEntry rows into PerformanceMetric windows and exposes helper APIs
mcpgateway/services/security_logger.py – Specialized logger for auth/suspicious events, computing threat scores and security audit entries
mcpgateway/services/audit_trail_service.py – Shared audit trail writer that records CRUD/data-access operations with change tracking
mcpgateway/routers/log_search.py – FastAPI router exposing /api/logs/search, /trace, /security-events, /audit-trails, /performance-metrics endpoints
mcpgateway/alembic/versions/k5e6f7g8h9i0_add_structured_logging_tables.py – Migration that creates structured_log_entries, performance_metrics, security_events, and audit_trails tables plus supporting indexes

Example Usage

curl -v http://localhost:4444/health

Full Response:

*   Trying 127.0.0.1:4444...
* Connected to localhost (127.0.0.1) port 4444 (#0)
> GET /health HTTP/1.1
> Host: localhost:4444
> User-Agent: curl/7.81.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< date: Thu, 27 Nov 2025 15:00:29 GMT
< server: uvicorn
< content-length: 20
< content-type: application/json
< x-content-type-options: nosniff
< x-frame-options: DENY
< x-xss-protection: 0
< x-download-options: noopen
< referrer-policy: strict-origin-when-cross-origin
< content-security-policy: default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval' https://cdnjs.cloudflare.com https://cdn.tailwindcss.com https://cdn.jsdelivr.net https://unpkg.com; style-src 'self' 'unsafe-inline' https://cdnjs.cloudflare.com https://cdn.jsdelivr.net; img-src 'self' data: https:; font-src 'self' data: https://cdnjs.cloudflare.com; connect-src 'self' ws: wss: https:; frame-ancestors 'none';
< x-correlation-id: 6930e1f1a8b84beb904e18594bbf15dd
<
* Connection #0 to host localhost left intact
{"status":"healthy"}

Response header: x-correlation-id: 6930e1f1a8b84beb904e18594bbf15dd
Server logs: {"request_id": "6930e1f1a8b84beb904e18594bbf15dd", ...}
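
The way the same ID ends up in the server logs can be sketched with a small logging.Formatter subclass. This is a simplified stand-in for CorrelationIdJsonFormatter: the class name, the build_logger helper, and every payload field other than request_id are assumptions, and OpenTelemetry trace context is left out.

```python
import io
import json
import logging
from contextvars import ContextVar

correlation_id_var: ContextVar = ContextVar("correlation_id", default=None)


class JsonSketchFormatter(logging.Formatter):
    """Render each record as JSON and attach the current correlation ID."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Read at format time, so the value belongs to the active request.
            "request_id": correlation_id_var.get(),
        }
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("correlation_sketch")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonSketchFormatter())
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    log = build_logger(buffer)
    correlation_id_var.set("6930e1f1a8b84beb904e18594bbf15dd")
    log.info("request_completed")
    print(buffer.getvalue().strip())
```

Because the formatter reads the ContextVar itself, call sites do not need to pass the ID explicitly on every log statement.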

Correlation trace in Admin UI:

Navigate to Admin UI → System Logs tab
Click a correlation ID to trace it
Alternatively, enter a correlation ID or paste one from the search box
View unified timeline with all logs, security events, audit trails, and performance metrics for that request

Nov 14 '25 11:11 shoummu1
