Proposal: Add AI-driven Network Anomaly Detection Plugin and OpenTelemetry Export for Pixie
Background
Pixie provides deep, eBPF-based visibility into Kubernetes clusters, automatically capturing network and application telemetry without manual instrumentation. Although Pixie offers powerful query and visualization capabilities (via PxL and Vizier), it currently lacks a built-in mechanism for automated anomaly detection and for OpenTelemetry-native export of detected network irregularities.
This limits the ability of operators to detect and correlate real-time operational anomalies (such as unexpected service-to-service communication, latency spikes, or throughput drops) directly within Pixie’s observability workflow or external telemetry pipelines.
Problem Statement
Existing open-source tools like Zeek or Suricata perform deep packet inspection but are not optimized for the dynamic, container-based nature of cloud-native microservices. Pixie already provides visibility at scale but does not yet offer AI-assisted detection or direct integration of detected anomalies with the OpenTelemetry ecosystem.
Proposed Solution
Introduce a lightweight, optional plugin for Pixie that performs operational anomaly detection on network traffic metrics and exports the results through OpenTelemetry.
- AI-driven Anomaly Detection Layer
  - Implement a Pixie plugin or PxL script extension that computes simple streaming anomaly scores on traffic metrics (latency, request rate, error rate, byte count).
  - Techniques: EWMA, robust z-scores, Isolation Forest, or simple autoencoders (depending on available library support and compute limits).
  - Tag anomalies with metadata such as `service_a`, `service_b`, `namespace`, and `anomaly.score`.
- OpenTelemetry Export Integration
  - Extend Pixie’s existing OpenTelemetry export capabilities to include these anomaly events as `metrics` or `logs`.
  - Allow configuration of anomaly thresholds and export frequency via Pixie’s plugin interface.
- Example Output

      - name: px.anomaly.network.latency_spike
        attributes:
          src_service: checkout
          dst_service: payment
          namespace: production
          anomaly.score: 0.94
        timestamp: 2025-11-11T12:00:00
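
To make the scoring step concrete, here is a minimal Python sketch of one of the listed techniques (EWMA), assuming per-edge latency samples arrive one at a time. It is not Pixie or PxL API: `EwmaScorer`, `to_event`, the `alpha`, `warmup`, and `min_std` defaults, the `z / (1 + z)` squashing, and the `THRESHOLD` value are all hypothetical choices for illustration. The event dictionary only mirrors the shape of the example output above.

```python
import math
from dataclasses import dataclass


@dataclass
class EwmaScorer:
    """Streaming scorer built on an exponentially weighted mean and variance."""
    alpha: float = 0.1     # smoothing factor (hypothetical default)
    warmup: int = 5        # samples to observe before producing scores
    min_std: float = 1e-6  # floor so a perfectly flat baseline does not divide by zero
    mean: float = 0.0
    var: float = 0.0
    count: int = 0

    def update(self, x: float) -> float:
        """Score x against the current baseline, then fold x into the baseline.

        Returns a value in [0, 1); always 0.0 during warm-up.
        """
        score = 0.0
        if self.count >= self.warmup:
            std = max(math.sqrt(self.var), self.min_std)
            z = abs(x - self.mean) / std
            score = z / (1.0 + z)  # squash the z-score into [0, 1)
        if self.count == 0:
            self.mean = x
        else:
            delta = x - self.mean
            self.mean += self.alpha * delta
            self.var = (1.0 - self.alpha) * (self.var + self.alpha * delta * delta)
        self.count += 1
        return score


def to_event(src: str, dst: str, namespace: str, score: float, timestamp: str) -> dict:
    """Build a dict shaped like the proposal's example output (assumed layout)."""
    return {
        "name": "px.anomaly.network.latency_spike",
        "attributes": {
            "src_service": src,
            "dst_service": dst,
            "namespace": namespace,
            "anomaly.score": round(score, 3),
        },
        "timestamp": timestamp,
    }


# Hypothetical latency samples (ms) for one service pair; the last one is a spike.
latencies_ms = [20.0, 22.0, 19.0, 21.0, 20.0, 23.0, 21.0, 250.0]
THRESHOLD = 0.9  # illustrative; in the proposal this would be configurable

scorer = EwmaScorer()
scores = [scorer.update(v) for v in latencies_ms]
events = [
    to_event("checkout", "payment", "production", s, "2025-11-11T12:00:00")
    for s in scores
    if s >= THRESHOLD
]
```

Scoring each sample before folding it into the baseline keeps a spike from inflating its own reference statistics. A robust z-score variant would replace the mean and standard deviation with a rolling median and MAD, at the cost of keeping a window of recent samples.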
Benefits
- Enables real-time operational anomaly detection without additional instrumentation.
- Bridges Pixie’s in-cluster visibility with the broader OpenTelemetry and AIOps ecosystem.
- Provides actionable alerts and insights directly in the Pixie UI and external dashboards (Grafana, Datadog, etc.).
Scope & Alignment
- Keeps focus on observability and performance analysis, not security or intrusion detection.
- Aligns with the goal of improving AI-driven insights in Pixie’s roadmap.
- Can be developed as an independent plugin, avoiding changes to Pixie’s core.