Sample configurations for "netprobe" tap and policy
The purpose of this ticket is to discuss how to structure the new netprobe tap and handler. Here are a few concepts to keep in mind when reviewing it:
- the netprobe tap provides the basic facilities to run network tests, but should only include host-specific configuration settings and default configuration overrides
- the netprobe input is where the network tests are defined and configured, as well as any default configuration overrides
- the netprobe handler is where the metrics to be measured and collected are configured
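One way to picture the "default configuration overrides" layering is a lookup chain: built-in defaults, then the tap's overrides, then the input's settings, with the most specific layer winning. The Python sketch below assumes exactly that precedence; BUILTIN_DEFAULTS, its values, and resolve_config are hypothetical and are not pktvisor's actual defaults or code.

```python
from collections import ChainMap

# Hypothetical built-in defaults; the real values are not specified in this ticket.
BUILTIN_DEFAULTS = {
    "interval_msec": 5000,
    "timeout_msec": 2000,
    "packets_per_test": 1,
}


def resolve_config(tap_overrides: dict, input_config: dict) -> dict:
    """Hypothetical helper: input settings beat tap overrides, which beat defaults.

    ChainMap looks keys up in the first mapping that contains them, so the
    order of the arguments below encodes the precedence.
    """
    return dict(ChainMap(input_config, tap_overrides, BUILTIN_DEFAULTS))


if __name__ == "__main__":
    tap = {"timeout_msec": 1500}  # host-level override set on the tap
    inp = {"test_type": "ping", "interval_msec": 2000}  # test-level settings on the input
    print(resolve_config(tap, inp))
```

Keeping the merge as a pure function means the effective config for any test can be logged or inspected without running the probe.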
Here is a sample configuration for the netprobe tap:
version: "1.0"
visor:
  taps:
    default_netprobe:
      input_type: netprobe
      config:
        maximum_concurrent_tests: 10
        ip_source_binding: 127.0.0.1
      tags:
        virtual: true
        vhost: 1
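A plausible reading of maximum_concurrent_tests is a cap on how many tests the tap keeps in flight at once. The asyncio sketch below assumes that meaning and enforces it with a semaphore; run_test is a hypothetical stand-in for a real probe and only sleeps.

```python
import asyncio

MAXIMUM_CONCURRENT_TESTS = 10  # mirrors the tap config value above


async def run_test(name: str, sem: asyncio.Semaphore, stats: dict) -> str:
    # Hypothetical probe: wait for a free slot, track concurrency, pretend to work.
    async with sem:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stand-in for sending packets and awaiting replies
        stats["active"] -= 1
    return name


async def run_all(names: list, limit: int = MAXIMUM_CONCURRENT_TESTS) -> dict:
    """Run every test, never allowing more than `limit` of them at the same time."""
    sem = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    done = await asyncio.gather(*(run_test(n, sem, stats) for n in names))
    return {"completed": list(done), "peak": stats["peak"]}


if __name__ == "__main__":
    result = asyncio.run(run_all([f"test_{i}" for i in range(25)]))
    print(result["peak"])  # bounded by MAXIMUM_CONCURRENT_TESTS
```

Because the limit lives on the tap, it would apply across every policy attached to that tap, which fits the idea that the tap carries host-specific settings.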
Here is a sample policy for the netprobe input and handler:
version: "1.0"
visor:
  policies:
    basic_ping_policy:
      kind: collection
      description: "basic PING netprobe policy"
      input:
        tap: default_netprobe
        input_type: netprobe
        config:
          test_type: ping
          interval_msec: 2000
          timeout_msec: 1000
          packets_per_test: 10
          packets_interval_msec: 25
          packet_payload_size: 56
          disable_scout_packet: false
          disable_integrity_check: false
          targets:
            test_1_name:
              target: foo.bar
            test_2_name:
              target: 10.0.0.1
              tos: EF
      handlers:
        config:
          num_periods: 2 # default is 5
        modules:
          default_ping:
            type: netprobe
            config:
              latency_units: msec
            metric_groups:
              enable:
                - quantiles
                - dns_resolution
              disable:
                - jitter
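As a sanity check on the input timing parameters, the sketch below computes when each packet of one test is sent and when the last reply window closes. It assumes packets go out every packets_interval_msec from the start of the test, that each packet's timeout_msec window starts at its own send time, and it ignores the scout packet. Under those assumptions the last of 10 packets leaves at 225 ms and its window closes at 1225 ms, inside the 2000 ms interval, so consecutive tests to the same target would not overlap. PingInputConfig and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PingInputConfig:
    # Field names mirror the policy above; their exact semantics are assumed.
    interval_msec: int = 2000
    timeout_msec: int = 1000
    packets_per_test: int = 10
    packets_interval_msec: int = 25


def send_offsets(cfg: PingInputConfig) -> list:
    """Millisecond offsets, from test start, at which each packet is sent."""
    return [i * cfg.packets_interval_msec for i in range(cfg.packets_per_test)]


def last_reply_deadline(cfg: PingInputConfig) -> int:
    """Latest offset at which a reply still counts (last send + timeout)."""
    offsets = send_offsets(cfg)
    if not offsets:
        raise ValueError("packets_per_test must be >= 1")
    return offsets[-1] + cfg.timeout_msec


def fits_in_interval(cfg: PingInputConfig) -> bool:
    """True if one test finishes before the next one is scheduled to start."""
    return last_reply_deadline(cfg) < cfg.interval_msec


if __name__ == "__main__":
    cfg = PingInputConfig()
    print(send_offsets(cfg)[-1], last_reply_deadline(cfg), fits_in_interval(cfg))
    # prints: 225 1225 True
```

A check like this could run at policy load time to reject configurations where a test cannot finish within its own interval.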
This looks great, and fits within our current architecture and data model. I think it means the netprobe input would expose events per probe type, such as ping_received, ping_failed, etc., which the netprobe handler can attach to and calculate metrics from.
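A minimal Python sketch of that event model follows, assuming the input exposes per-probe-type events named ping_received and ping_failed and the handler registers callbacks on them. The NetProbeInput and PingHandler classes, the callback signatures, and the loss calculation are hypothetical illustrations, not pktvisor's actual C++ API.

```python
from collections import defaultdict
from typing import Callable


class NetProbeInput:
    """Hypothetical input that fans probe events out to subscribed callbacks."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def connect(self, event: str, callback: Callable) -> None:
        self._subscribers[event].append(callback)

    def emit(self, event: str, *args) -> None:
        # Events with no subscribers are silently ignored.
        for callback in self._subscribers[event]:
            callback(*args)


class PingHandler:
    """Hypothetical handler that turns ping events into per-target metrics."""

    def __init__(self, probe_input: NetProbeInput) -> None:
        self.rtts_msec = defaultdict(list)
        self.failures = defaultdict(int)
        probe_input.connect("ping_received", self.on_received)
        probe_input.connect("ping_failed", self.on_failed)

    def on_received(self, target: str, rtt_msec: float) -> None:
        self.rtts_msec[target].append(rtt_msec)

    def on_failed(self, target: str) -> None:
        self.failures[target] += 1

    def packet_loss(self, target: str) -> float:
        """Fraction of probes to `target` that failed (0.0 if none were seen)."""
        sent = len(self.rtts_msec[target]) + self.failures[target]
        return self.failures[target] / sent if sent else 0.0


if __name__ == "__main__":
    probe = NetProbeInput()
    handler = PingHandler(probe)
    probe.emit("ping_received", "test_1_name", 12.5)
    probe.emit("ping_received", "test_1_name", 14.0)
    probe.emit("ping_failed", "test_1_name")
    print(handler.packet_loss("test_1_name"))
```

The input knows nothing about metrics here; metric groups such as quantiles or jitter would be computed entirely on the handler side from the stored samples.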
On ping in particular, Wikipedia says: "The payload may include a timestamp indicating the time of transmission and a sequence number, which are not found in this example. This allows ping to compute the round trip time in a stateless manner without needing to record the time of transmission of each packet." This would simplify our tracking even further.
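The stateless approach in that quote can be sketched like this: the sender writes its transmit timestamp and a sequence number into the echo payload, and because the reply echoes the payload back unchanged, the RTT can be computed from the reply alone. The layout (big-endian 8-byte nanosecond timestamp plus 4-byte sequence number, zero-padded to packet_payload_size) and the function names are assumptions for illustration, not the wire format used by ping or pktvisor.

```python
import struct
import time
from typing import Optional, Tuple

# Assumed layout: big-endian uint64 send time (ns) followed by uint32 sequence number.
HEADER = struct.Struct("!QI")
PACKET_PAYLOAD_SIZE = 56  # mirrors packet_payload_size in the policy above


def build_payload(seq: int, sent_ns: Optional[int] = None) -> bytes:
    """Pack the send timestamp and sequence number, zero-padded to the payload size."""
    if sent_ns is None:
        sent_ns = time.monotonic_ns()
    header = HEADER.pack(sent_ns, seq)
    return header + bytes(PACKET_PAYLOAD_SIZE - len(header))


def rtt_from_reply(payload: bytes, received_ns: Optional[int] = None) -> Tuple[int, float]:
    """Return (sequence number, RTT in msec) using only the echoed payload."""
    if received_ns is None:
        received_ns = time.monotonic_ns()
    sent_ns, seq = HEADER.unpack_from(payload)
    return seq, (received_ns - sent_ns) / 1_000_000


if __name__ == "__main__":
    payload = build_payload(seq=7, sent_ns=1_000_000_000)
    seq, rtt = rtt_from_reply(payload, received_ns=1_012_500_000)
    print(len(payload), seq, rtt)  # prints: 56 7 12.5
```

Since the same host both sends the request and receives the reply, a local monotonic clock is enough; no clock synchronization with the target is needed.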
I may have some minor suggested tweaks to the actual config variable names, but the spirit here looks good.
One important difference from our discussion (that I failed to highlight) is that the test_type is not defined at the tap level in this sample, but rather at the input level. I think it's more intuitive for a user to define the test type when configuring the actual tests than to have them link the tests to the tap of the appropriate type. For discussion.
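With test_type on the input, the tap can stay type-agnostic and each input picks its probe implementation from its own config. A minimal sketch of that dispatch is below; the PROBE_TYPES registry, the probe classes, and the error for unknown types are hypothetical, and only ping appears in the sample policy (the "tcp" entry is purely illustrative).

```python
from typing import Protocol


class Probe(Protocol):
    def describe(self) -> str: ...


class PingProbe:
    def __init__(self, target: str) -> None:
        self.target = target

    def describe(self) -> str:
        return f"ping {self.target}"


class TcpProbe:
    # Hypothetical second test type, for illustration only.
    def __init__(self, target: str) -> None:
        self.target = target

    def describe(self) -> str:
        return f"tcp {self.target}"


PROBE_TYPES = {"ping": PingProbe, "tcp": TcpProbe}


def build_probes(input_config: dict, targets: dict) -> list:
    """Pick the probe class from the input's test_type; the tap is never consulted."""
    test_type = input_config.get("test_type")
    try:
        probe_cls = PROBE_TYPES[test_type]
    except KeyError:
        raise ValueError(f"unsupported test_type: {test_type!r}") from None
    return [probe_cls(spec["target"]) for spec in targets.values()]


if __name__ == "__main__":
    targets = {"test_1_name": {"target": "foo.bar"}, "test_2_name": {"target": "10.0.0.1"}}
    for probe in build_probes({"test_type": "ping"}, targets):
        print(probe.describe())
```

One consequence of this design is that a single tap could serve inputs of different test types, with validation of test_type happening when the policy is applied.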