
Collect Metrics

Open LionelJouin opened this issue 1 year ago • 2 comments

Is your feature request related to a problem? Please describe.

Add metrics to Meridio to improve observability. Here are some slides: https://docs.google.com/presentation/d/1yuiDj7H4NZTea7dJAKPK4SBvkHtZyWrn5HisNe1shuI and the deployment instructions for the OpenTelemetry/Prometheus/Grafana stack: https://gist.github.com/LionelJouin/cfa15a569f1f23d8a84d43dc73b5f373

Describe the solution you'd like

Interface metrics in stateless-lb-frontend / Proxy / TAPA

<interface.metric>: rx_packets, tx_packets, rx_bytes, tx_bytes, rx_errors, tx_errors, rx_dropped, tx_dropped

  • Name: meridio.interface.<interface.metric>
  • Description: <interface.metric> metrics for the network interface
  • Type: Counter
  • Value: <interface.metric>
  • Attributes:
    • Pod Name
    • Trench
    • Conduit (optional)
    • Attractor (optional)
    • Interface Name
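A minimal sketch of how the interface counters could be collected, assuming a Linux pod where each <interface.metric> is read from /sys/class/net/<iface>/statistics/<metric>. The `CounterPoint` type, the `collectInterfaceCounters` helper and the attribute keys (`interface.name`, `pod.name`, `trench`) are hypothetical; a real implementation would pass these values to an OpenTelemetry observable counter callback rather than return them.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// interfaceMetrics are the <interface.metric> values listed in the issue. On
// Linux each one is also a file under /sys/class/net/<iface>/statistics/.
var interfaceMetrics = []string{
	"rx_packets", "tx_packets", "rx_bytes", "tx_bytes",
	"rx_errors", "tx_errors", "rx_dropped", "tx_dropped",
}

// CounterPoint is a hypothetical, exporter-agnostic sample of one counter.
type CounterPoint struct {
	Name  string
	Value uint64
	Attrs map[string]string
}

// readFileFunc abstracts file access (os.ReadFile in production) so the
// collector can be exercised without a real /sys tree.
type readFileFunc func(path string) ([]byte, error)

// collectInterfaceCounters reads the counters of one interface and returns one
// meridio.interface.<metric> point per counter. common carries the attributes
// shared by every point (pod name, trench, optional conduit and attractor).
func collectInterfaceCounters(read readFileFunc, sysClassNet, iface string, common map[string]string) ([]CounterPoint, error) {
	points := make([]CounterPoint, 0, len(interfaceMetrics))
	for _, m := range interfaceMetrics {
		raw, err := read(filepath.Join(sysClassNet, iface, "statistics", m))
		if err != nil {
			return nil, fmt.Errorf("reading %s of %s: %w", m, iface, err)
		}
		// Each file holds a decimal integer followed by a newline.
		v, err := strconv.ParseUint(strings.TrimSpace(string(raw)), 10, 64)
		if err != nil {
			return nil, fmt.Errorf("parsing %s of %s: %w", m, iface, err)
		}
		attrs := map[string]string{"interface.name": iface}
		for k, val := range common {
			attrs[k] = val
		}
		points = append(points, CounterPoint{Name: "meridio.interface." + m, Value: v, Attrs: attrs})
	}
	return points, nil
}

func main() {
	// Hypothetical attribute values; in a pod they could come from the
	// Meridio configuration and the Kubernetes Downward API.
	common := map[string]string{"pod.name": os.Getenv("HOSTNAME"), "trench": "trench-a"}
	points, err := collectInterfaceCounters(os.ReadFile, "/sys/class/net", "lo", common)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range points {
		fmt.Println(p.Name, p.Value, p.Attrs)
	}
}
```

Since the kernel values are already cumulative, reporting them as-is from an observable counter avoids keeping any state in the collector.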

Interface status in stateless-lb-frontend / Proxy / TAPA ?

  • Name: meridio.interface.status
  • Description: Network interface status
  • Type: Gauge (Health Metric)
  • Value: Status of the interface (1 if up, 0 if down)
  • Attributes:
    • Pod Name
    • Trench
    • Conduit (optional)
    • Attractor (optional)
    • Interface Name
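A possible sketch of the status gauge using Go's standard `net` package. It assumes that "up" means the administrative IFF_UP flag (`net.FlagUp`); whether the operational state (carrier) should also be taken into account is not decided in the issue. The helper names `interfaceStatus` and `observeStatus` are hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// interfaceStatus maps interface flags to the meridio.interface.status value
// proposed in the issue: 1 if the interface is up, 0 if it is down.
func interfaceStatus(flags net.Flags) int64 {
	if flags&net.FlagUp != 0 {
		return 1
	}
	return 0
}

// observeStatus looks up an interface by name. A missing interface (for
// example one that was removed) is reported as down instead of being skipped.
func observeStatus(name string) int64 {
	iface, err := net.InterfaceByName(name)
	if err != nil {
		return 0
	}
	return interfaceStatus(iface.Flags)
}

func main() {
	for _, name := range []string{"lo", "does-not-exist"} {
		fmt.Printf("meridio.interface.status{interface=%q} %d\n", name, observeStatus(name))
	}
}
```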

Stream status in conduit instance

  • Name: meridio.conduit.stream.status
  • Description: Stream status in the conduit instance
  • Type: Gauge (Health Metric)
  • Value: Status of the stream (1 if configured)
  • Attributes:
    • Pod Name
    • Trench
    • Conduit
    • Stream
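A minimal sketch of the stream status gauge, assuming the list of configured streams is available in the conduit instance. Duplicates are merged because an observable gauge callback should not report the same attribute set twice in one collection; streams that are no longer configured are simply not reported. `StreamKey`, `GaugePoint`, `streamStatusPoints` and the attribute keys are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// StreamKey identifies a configured stream with the attributes listed in the
// issue (pod name, trench, conduit, stream). The type is hypothetical.
type StreamKey struct {
	Pod, Trench, Conduit, Stream string
}

// GaugePoint is one observation of a gauge together with its attributes.
type GaugePoint struct {
	Name  string
	Value int64
	Attrs map[string]string
}

// streamLess orders keys field by field so the output is deterministic.
func streamLess(a, b StreamKey) bool {
	if a.Trench != b.Trench {
		return a.Trench < b.Trench
	}
	if a.Conduit != b.Conduit {
		return a.Conduit < b.Conduit
	}
	if a.Stream != b.Stream {
		return a.Stream < b.Stream
	}
	return a.Pod < b.Pod
}

// streamStatusPoints returns one meridio.conduit.stream.status point with
// value 1 per distinct configured stream.
func streamStatusPoints(configured []StreamKey) []GaugePoint {
	seen := make(map[StreamKey]struct{}, len(configured))
	keys := make([]StreamKey, 0, len(configured))
	for _, k := range configured {
		if _, dup := seen[k]; !dup {
			seen[k] = struct{}{}
			keys = append(keys, k)
		}
	}
	sort.Slice(keys, func(i, j int) bool { return streamLess(keys[i], keys[j]) })
	points := make([]GaugePoint, 0, len(keys))
	for _, k := range keys {
		points = append(points, GaugePoint{
			Name:  "meridio.conduit.stream.status",
			Value: 1,
			Attrs: map[string]string{"pod.name": k.Pod, "trench": k.Trench, "conduit": k.Conduit, "stream": k.Stream},
		})
	}
	return points
}

func main() {
	pts := streamStatusPoints([]StreamKey{
		{Pod: "proxy-abc", Trench: "trench-a", Conduit: "conduit-a", Stream: "stream-b"},
		{Pod: "proxy-abc", Trench: "trench-a", Conduit: "conduit-a", Stream: "stream-a"},
	})
	for _, p := range pts {
		fmt.Println(p.Name, p.Attrs["stream"], p.Value)
	}
}
```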

Flows configured in conduit instance

  • Name: meridio.conduit.stream.flow.status
  • Description: Flow status in the conduit instance
  • Type: Gauge (Health Metric)
  • Value: Status of the flow (1 if configured) (Counter with nfqlb matches_count instead?)
  • Attributes:
    • Pod Name
    • Trench
    • Conduit
    • Stream
    • Flow
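If the flow metric became a Counter based on nfqlb's matches_count, the raw value may go backwards when a flow is re-created or nfqlb restarts. The sketch below assumes (this is not confirmed in the issue) that matches_count is a cumulative per-flow integer that restarts from 0 in those cases, and turns successive readings into a monotonic value. `flowCounter` and `Observe` are hypothetical names. Exporting the raw value and relying on the backend's reset handling (Prometheus `rate()` and `increase()` already account for counter resets) would be a simpler alternative.

```go
package main

import "fmt"

// flowCounter turns raw, possibly-resetting match counts into a monotonic
// cumulative value per flow.
type flowCounter struct {
	last  map[string]uint64 // last raw reading per flow
	total map[string]uint64 // monotonic total exported per flow
}

func newFlowCounter() *flowCounter {
	return &flowCounter{last: map[string]uint64{}, total: map[string]uint64{}}
}

// Observe records a raw matches_count reading for a flow and returns the
// cumulative value to export.
func (c *flowCounter) Observe(flow string, raw uint64) uint64 {
	prev, seen := c.last[flow]
	switch {
	case !seen:
		c.total[flow] = raw
	case raw >= prev:
		c.total[flow] += raw - prev
	default:
		// The reading went backwards: treat it as a reset and add the
		// matches counted since the reset.
		c.total[flow] += raw
	}
	c.last[flow] = raw
	return c.total[flow]
}

func main() {
	c := newFlowCounter()
	for _, raw := range []uint64{10, 15, 3, 8} {
		fmt.Println(c.Observe("flow-a", raw)) // prints 10, 15, 18, 23
	}
}
```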

Targets configured in conduit instance

  • Name: meridio.conduit.stream.target.status
  • Description: Target status in the conduit instance
  • Type: Gauge (Health Metric)
  • Value: Status of the target (1 if configured, 0 if pending) (Counter with nftables prerouting on fwmark instead?)
  • Attributes:
    • Pod Name
    • Trench
    • Conduit
    • Stream
    • Target (identifier + IPs)
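
Example of the nftables alternative, counting the packets that carry a target's fwmark:

```shell
# Separate table and chain that only hold the metrics rules
nft add table inet meridio-metrics
nft add chain inet meridio-metrics target-hits { type filter hook postrouting priority 0 \; }
# Count packets marked with the fwmark of one target (0x13dc in this example)
nft add rule inet meridio-metrics target-hits meta mark 0x13dc counter
nft list chain inet meridio-metrics target-hits
```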
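A sketch of how the counters created by the nft commands above could be read back. It runs `nft list chain inet meridio-metrics target-hits` and assumes each rule is printed as "... mark <value> counter packets <n> bytes <m>"; the exact text layout of nft output is an assumption, which is why the parser only looks at those tokens. `TargetHits` and `parseTargetHits` are hypothetical names.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// TargetHits is the nftables counter of one fwmark rule.
type TargetHits struct {
	Packets, Bytes uint64
}

// parseTargetHits scans nft list output and returns the counters keyed by
// fwmark. Lines without both a numeric mark and a packet count are ignored.
func parseTargetHits(output string) (map[uint32]TargetHits, error) {
	hits := make(map[uint32]TargetHits)
	for _, line := range strings.Split(output, "\n") {
		f := strings.Fields(line)
		var mark uint32
		var h TargetHits
		haveMark, havePackets := false, false
		for i := 0; i+1 < len(f); i++ {
			var err error
			switch f[i] {
			case "mark":
				// Base 0 accepts 0x13dc as well as a zero-padded 0x000013dc;
				// non-numeric tokens (e.g. "mark set") are skipped.
				if m, perr := strconv.ParseUint(f[i+1], 0, 32); perr == nil {
					mark, haveMark = uint32(m), true
				}
			case "packets":
				h.Packets, err = strconv.ParseUint(f[i+1], 10, 64)
				havePackets = err == nil
			case "bytes":
				h.Bytes, err = strconv.ParseUint(f[i+1], 10, 64)
			}
			if err != nil {
				return nil, fmt.Errorf("parsing %q: %w", line, err)
			}
		}
		if haveMark && havePackets {
			hits[mark] = h
		}
	}
	return hits, nil
}

func main() {
	out, err := exec.Command("nft", "list", "chain", "inet", "meridio-metrics", "target-hits").Output()
	if err != nil {
		fmt.Println("nft failed:", err)
		return
	}
	hits, err := parseTargetHits(string(out))
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	for mark, h := range hits {
		fmt.Printf("fwmark 0x%x: %d packets, %d bytes\n", mark, h.Packets, h.Bytes)
	}
}
```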

Target connectivity in conduit instance

  • Name: meridio.conduit.target.connectivity.status
  • Description: Connectivity to the target from the conduit instance
  • Type: Gauge (UpDownCounter instead?)
  • Value: Ping round-trip time in ms (-1 if no reply)
  • Attributes:
    • Pod Name
    • Trench
    • Conduit
    • IP
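A minimal sketch of the value encoding (round-trip time in ms, -1 if no reply). ICMP ping usually needs raw sockets or extra privileges, so `probeTCP` swaps in a TCP connect time measurement as a stand-in; this is not what the issue specifies, and the helper names, target address and port are hypothetical. Since latency is a sampled measurement rather than an additive quantity, a Gauge seems a better fit than an UpDownCounter.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// connectivityValue converts a probe result into the proposed value:
// round-trip time in milliseconds, or -1 when the target did not reply.
func connectivityValue(rtt time.Duration, replied bool) float64 {
	if !replied {
		return -1
	}
	return float64(rtt) / float64(time.Millisecond)
}

// probeTCP is a stand-in for ICMP ping: it measures how long a TCP connection
// to addr takes. Note that a refused connection is treated as "no reply" here,
// even though it proves the target IP is reachable.
func probeTCP(addr string, timeout time.Duration) (time.Duration, bool) {
	start := time.Now()
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return 0, false
	}
	rtt := time.Since(start)
	conn.Close()
	return rtt, true
}

func main() {
	// Hypothetical target IP and port.
	rtt, ok := probeTCP("192.0.2.10:80", time.Second)
	fmt.Printf("meridio.conduit.target.connectivity.status{ip=%q} %.3f\n", "192.0.2.10", connectivityValue(rtt, ok))
}
```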

Gateways configured in attractor instance

  • Name: meridio.attractor.gateway.status
  • Description: Gateway status in the attractor instance
  • Type: Gauge (Health Metric)
  • Value: Status of the gateway (1 if running, 0 if failing)
  • Attributes:
    • Pod Name
    • Trench
    • Attractor
    • Gateway
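A sketch of how the gateway status could be derived, assuming (hypothetically) that a gateway is backed by one or more routing sessions, for example a BGP session plus a BFD session, whose state strings are read from the routing daemon. The gateway is reported as running (1) only if every session is "up", and as failing (0) otherwise, including when no session is known. `gatewayStatus`, `gatewayStatuses` and the gateway names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// gatewayStatus returns the meridio.attractor.gateway.status value for one
// gateway given the states of the sessions backing it.
func gatewayStatus(sessionStates []string) int64 {
	if len(sessionStates) == 0 {
		return 0
	}
	for _, s := range sessionStates {
		if !strings.EqualFold(s, "up") {
			return 0
		}
	}
	return 1
}

// gatewayStatuses evaluates every gateway of an attractor; keys are gateway
// names, values the states of their sessions. Names are returned sorted.
func gatewayStatuses(sessions map[string][]string) ([]string, map[string]int64) {
	names := make([]string, 0, len(sessions))
	status := make(map[string]int64, len(sessions))
	for name, states := range sessions {
		names = append(names, name)
		status[name] = gatewayStatus(states)
	}
	sort.Strings(names)
	return names, status
}

func main() {
	names, status := gatewayStatuses(map[string][]string{
		"gateway-v4": {"up", "up"},
		"gateway-v6": {"up", "down"},
	})
	for _, n := range names {
		fmt.Printf("meridio.attractor.gateway.status{gateway=%q} %d\n", n, status[n])
	}
}
```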

Describe alternatives you've considered /

Additional context

https://opentelemetry.io/docs/specs/otel/metrics/semantic_conventions/
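With the OpenTelemetry/Prometheus/Grafana stack, the dotted names above are typically translated when exported to Prometheus: characters that are not valid in classic Prometheus metric names are replaced with "_", and monotonic counters get a "_total" suffix. The simplified sketch below (hypothetical `promName` helper; unit suffixes, colons and names starting with a digit are not handled) shows roughly what names to expect in Grafana queries, for example `rate(meridio_interface_rx_packets_total[5m])`. The exact behaviour depends on the exporter version and its configuration.

```go
package main

import (
	"fmt"
	"strings"
)

// promName sketches the classic OpenTelemetry-to-Prometheus name translation
// for the metric names in this issue: every character that is not a letter,
// digit or underscore becomes '_', and monotonic counters get "_total".
func promName(otelName string, monotonicCounter bool) string {
	var b strings.Builder
	for _, r := range otelName {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9', r == '_':
			b.WriteRune(r)
		default:
			b.WriteByte('_')
		}
	}
	name := b.String()
	if monotonicCounter && !strings.HasSuffix(name, "_total") {
		name += "_total"
	}
	return name
}

func main() {
	fmt.Println(promName("meridio.interface.rx_packets", true))
	fmt.Println(promName("meridio.attractor.gateway.status", false))
}
```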

LionelJouin, May 23 '23 13:05