
Proposal: Add AI-driven Network Anomaly Detection Plugin and OpenTelemetry Export for Pixie

Open vatankh opened this issue 2 months ago • 0 comments

Background

Pixie provides deep, eBPF-based visibility into Kubernetes clusters, automatically capturing network and application telemetry without manual instrumentation. Pixie offers powerful query and visualization capabilities (PxL scripts executed by Vizier and rendered in the Live UI), but it currently has no built-in mechanism for automated anomaly detection and no OpenTelemetry-native export of detected network irregularities.

This limits the ability of operators to detect and correlate real-time operational anomalies (such as unexpected service-to-service communication, latency spikes, or throughput drops) directly within Pixie’s observability workflow or external telemetry pipelines.

Problem Statement

Existing open-source tools like Zeek or Suricata perform deep packet inspection but are not optimized for the dynamic, container-based nature of cloud-native microservices. Pixie already solves visibility at scale but does not yet provide AI-assisted detection or direct integration with the OpenTelemetry ecosystem.

Proposed Solution

Introduce a lightweight, optional plugin for Pixie that performs operational anomaly detection on network traffic metrics and exports the results through OpenTelemetry.

  1. AI-driven Anomaly Detection Layer

    • Implement a Pixie plugin or PxL script extension that computes simple streaming anomaly scores on traffic metrics (latency, request rate, error rate, byte count).
    • Techniques: EWMA, robust z-scores, Isolation Forest, or simple autoencoders (depending on available library support and compute limits).
    • Tag anomalies with metadata such as src_service, dst_service, namespace, and anomaly.score.
  2. OpenTelemetry Export Integration

    • Extend Pixie’s existing OpenTelemetry export capabilities to include these anomaly events as metrics or logs.
    • Allow configuration of anomaly thresholds and export frequency via Pixie’s plugin interface.
  3. Example Output

    - name: px.anomaly.network.latency_spike
      attributes:
        src_service: checkout
        dst_service: payment
        namespace: production
        anomaly.score: 0.94
      timestamp: 2025-11-11T12:00:00
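
As a rough illustration of how the detection layer could produce an event of this shape, here is a minimal Python sketch of the EWMA option from step 1. It is not Pixie code and does not use any Pixie or OpenTelemetry API: the names `EwmaScorer`, `detect`, and `to_event`, the smoothing factor, the warm-up length, the score squashing, and the 0.8 threshold are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class EwmaScorer:
    """Streaming EWMA mean/variance tracker (hypothetical helper)."""
    alpha: float = 0.1   # smoothing factor (assumed value)
    warmup: int = 5      # samples observed before any score is reported
    mean: float = 0.0
    var: float = 0.0
    count: int = 0

    def update(self, x: float) -> float:
        """Score x in [0, 1) against the current state, then absorb it."""
        self.count += 1
        if self.count == 1:
            self.mean = x
            return 0.0
        diff = x - self.mean
        std = math.sqrt(self.var)
        # z-score relative to the state before this sample is folded in
        z = abs(diff) / std if (self.count > self.warmup and std > 0) else 0.0
        # incremental EWMA updates for mean and variance
        self.mean += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        # squash the z-score into [0, 1); z = 3 maps to 0.5 (assumed mapping)
        return z / (z + 3.0)


def detect(samples, threshold=0.8):
    """Return (index, score) pairs for samples scoring at or above threshold."""
    scorer = EwmaScorer()
    hits = []
    for i, x in enumerate(samples):
        score = scorer.update(x)
        if score >= threshold:
            hits.append((i, score))
    return hits


def to_event(src, dst, namespace, score, ts):
    """Shape a score into a dict mirroring the example output above."""
    return {
        "name": "px.anomaly.network.latency_spike",
        "attributes": {
            "src_service": src,
            "dst_service": dst,
            "namespace": namespace,
            "anomaly.score": round(score, 2),
        },
        "timestamp": ts.isoformat(),
    }


# Made-up latency samples (ms): steady traffic followed by one spike.
latencies_ms = [20, 21, 19, 20, 22, 21, 20, 19, 21, 20, 200]
ts = datetime(2025, 11, 11, 12, 0, 0, tzinfo=timezone.utc)
events = [to_event("checkout", "payment", "production", score, ts)
          for _, score in detect(latencies_ms)]
for event in events:
    print(event)
```

Scoring each sample against the state before it is absorbed keeps a large spike from inflating its own baseline. In a real plugin the threshold and smoothing factor would presumably come from the configuration described in step 2, and the event would be handed to Pixie's OpenTelemetry export rather than printed.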
    
    

Benefits

  • Enables real-time operational anomaly detection without additional instrumentation.

  • Bridges Pixie’s in-cluster visibility with the broader OpenTelemetry and AIOps ecosystem.

  • Provides actionable alerts and insights directly in the Pixie UI and external dashboards (Grafana, Datadog, etc.).

Scope & Alignment

  • Keeps focus on observability and performance analysis, not security or intrusion detection.

  • Aligns with the goal of improving AI-driven insights in Pixie’s roadmap.

  • Can be developed as an independent plugin, avoiding changes to Pixie’s core.

vatankh, Nov 11 '25 10:11