cloudprober: Add a single run interface

This is similar to https://github.com/cloudprober/cloudprober/issues/23, or at least requires the same sort of primitives.

Cloudprober is designed to run in continuous mode, but it would be nice to be able to run it just once (or a given number of times, like ping's -c flag) and generate a report or a pass/fail signal.

Jan 21 '25 17:01 manugarg

For certain probe types, e.g. PING and UDP, it will be more difficult to support a single run interface, but for others, e.g. HTTP, DNS, GRPC, TCP, and EXTERNAL, it should be doable. This issue tracks implementing such an interface for the latter probe types.

Jan 21 '25 17:01 manugarg

Gave more thought to the single run interface at the binary level and below:

Interface

- Binary flags: --single_mode, --count
- The current flow is cmd/cloudprober → cloudprober.Start → prober.Start → probe.Start. In single mode it will be cmd/cloudprober → cloudprober.Run(count) → prober.Run(count) → probe.Run(count).
- Probes will need to support a Run interface. We can add another interface called ProbeWithRun.
- In single run mode, we won't write metrics to the shared data channel; instead, we'll just return EventMetrics. After the run, we'll summarize the EventMetrics and return the output in both JSON and human-readable text formats.
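The count-then-summarize flow above might look roughly like the sketch below. It is self-contained, so a hypothetical `Attempt` type stands in for cloudprober's `metrics.EventMetrics`, and `runN`, `Summary`, and the field names are illustrative only. The cumulative `total`/`success`/`latency` counters are meant to resemble cloudprober's usual probe metrics, but the exact shape is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Attempt is a hypothetical stand-in for one probe attempt's result; in
// cloudprober this data would come from metrics.EventMetrics.
type Attempt struct {
	Success bool
	Latency time.Duration
}

// Summary is a hypothetical aggregate produced after a single-mode run.
type Summary struct {
	Name    string `json:"name"`
	Dst     string `json:"dst"`
	Total   int    `json:"total"`
	Success int    `json:"success"`
	Latency int64  `json:"latency"` // cumulative latency of successful attempts, in microseconds
}

// runN calls attempt count times (like ping -c) and aggregates the results.
func runN(name, dst string, count int, attempt func() Attempt) Summary {
	s := Summary{Name: name, Dst: dst}
	for i := 0; i < count; i++ {
		a := attempt()
		s.Total++
		if a.Success {
			s.Success++
			s.Latency += a.Latency.Microseconds()
		}
	}
	return s
}

// Text renders the summary in a human-readable form.
func (s Summary) Text() string {
	return fmt.Sprintf("%s -> %s: %d/%d succeeded, cumulative latency %dus",
		s.Name, s.Dst, s.Success, s.Total, s.Latency)
}

func main() {
	// Fake attempt that fails every third call.
	n := 0
	fake := func() Attempt {
		n++
		return Attempt{Success: n%3 != 0, Latency: 10 * time.Millisecond}
	}
	s := runN("probe1", "target-1", 5, fake)
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b)) // JSON output
	fmt.Println(s.Text())  // human-readable output
}
```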

Comments

- The Run interface will be hard for ping and UDP-style probes, but that's okay; we don't have to build them all at once.

Jan 23 '25 23:01 manugarg

The Probe interface, with Run added, will look like this:

type Probe interface {
	Init(name string, opts *options.Options) error
	Start(ctx context.Context, dataChan chan *metrics.EventMetrics)
	Run(ctx context.Context) (bool, []byte, error) // success/failure, JSON-formatted metrics, error
}
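To illustrate the proposed Run signature, here is a sketch of a stripped-down TCP probe that implements only Run (Init and Start are omitted so it compiles on its own). The `tcpProbe` and `runResult` types are hypothetical. It also assumes one possible reading of the return values: a failed connection is reported as `false` rather than as an error, and the error return is reserved for problems such as failing to encode the output.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net"
	"time"
)

// tcpProbe is a hypothetical probe that implements only the proposed Run method.
type tcpProbe struct {
	name   string
	target string // host:port
}

// runResult is an assumed shape for the JSON metrics returned by Run.
type runResult struct {
	Name    string `json:"name"`
	Dst     string `json:"dst"`
	Success int    `json:"success"`
	Total   int    `json:"total"`
	Latency int64  `json:"latency"` // microseconds
}

// Run makes one TCP connection attempt and returns success/failure plus
// JSON-encoded metrics. A failed dial is a probe failure, not an error.
func (p *tcpProbe) Run(ctx context.Context) (bool, []byte, error) {
	res := runResult{Name: p.name, Dst: p.target, Total: 1}
	var d net.Dialer
	start := time.Now()
	conn, err := d.DialContext(ctx, "tcp", p.target)
	if err == nil {
		conn.Close()
		res.Success = 1
		res.Latency = time.Since(start).Microseconds()
	}
	out, jerr := json.Marshal(res)
	if jerr != nil {
		return false, nil, jerr
	}
	return res.Success == 1, out, nil
}

// demo probes a local listener so the example needs no external network access.
func demo() (bool, []byte, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, nil, err
	}
	defer ln.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	p := &tcpProbe{name: "probe1", target: ln.Addr().String()}
	return p.Run(ctx)
}

func main() {
	ok, out, err := demo()
	fmt.Println(ok, err)
	fmt.Println(string(out))
}
```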

The older interface (for probes without Run) will look like this:

type ProbeWithoutRun interface {
	Init(name string, opts *options.Options) error
	Start(ctx context.Context, dataChan chan *metrics.EventMetrics)
}
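One way the prober could tell the two interfaces apart is a Go type assertion: probes that implement Run can be used in single run mode, and the rest are rejected. In the sketch below, `Options` and `EventMetrics` are hypothetical stand-ins for `options.Options` and `metrics.EventMetrics`, `runOnce` is an illustrative helper name, and defining `Probe` by embedding `ProbeWithoutRun` is a design choice of this sketch.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Hypothetical stand-ins so the sketch compiles on its own.
type Options struct{}
type EventMetrics struct{}

// ProbeWithoutRun mirrors the older interface.
type ProbeWithoutRun interface {
	Init(name string, opts *Options) error
	Start(ctx context.Context, dataChan chan *EventMetrics)
}

// Probe adds the proposed single-run method on top of the older interface.
type Probe interface {
	ProbeWithoutRun
	Run(ctx context.Context) (bool, []byte, error)
}

var errNoRun = errors.New("probe does not support single run mode")

// runOnce uses a type assertion to check whether p implements Run.
func runOnce(ctx context.Context, p ProbeWithoutRun) (bool, []byte, error) {
	rp, ok := p.(Probe)
	if !ok {
		return false, nil, errNoRun
	}
	return rp.Run(ctx)
}

// oldProbe only implements the older interface.
type oldProbe struct{}

func (oldProbe) Init(string, *Options) error { return nil }
func (oldProbe) Start(context.Context, chan *EventMetrics) {}

// newProbe reuses oldProbe's methods and adds Run.
type newProbe struct{ oldProbe }

func (newProbe) Run(context.Context) (bool, []byte, error) {
	return true, []byte(`{"success":1,"total":1}`), nil
}

func main() {
	ctx := context.Background()
	for _, p := range []ProbeWithoutRun{oldProbe{}, newProbe{}} {
		ok, out, err := runOnce(ctx, p)
		fmt.Println(ok, string(out), err)
	}
}
```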

Jan 31 '25 01:01 manugarg

I think the final result format is going to be interesting here. Do we return metrics at the end, or just success/fail and perhaps the overall duration?

Or maybe both? We can do the following:

- Mark the exit status as successful if all probes succeeded.
- Allow the caller to parse the metrics output to get more details, e.g. we can return metrics in JSON format:
```
[
  {
    "name": "probe1",
    "dst": "target-1",
    "success": ...,
    "total": ...,
    "latency": ...
  },
  {...}
]
```
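A rough sketch of combining the two ideas: encode per-probe, per-target results as a JSON list with the field names above, and derive the process exit status from whether every probe succeeded. The value types (counts, latency in microseconds), the `ProbeResult` and `exitCode` names, and treating zero attempts as a failure are all assumptions; the sample values in `main` are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ProbeResult follows the field names sketched above; value types are assumed.
type ProbeResult struct {
	Name    string `json:"name"`
	Dst     string `json:"dst"`
	Success int64  `json:"success"`
	Total   int64  `json:"total"`
	Latency int64  `json:"latency"` // assumed: cumulative latency in microseconds
}

// allSucceeded reports whether every attempt for every probe/target succeeded.
// A result with zero attempts counts as a failure.
func allSucceeded(results []ProbeResult) bool {
	for _, r := range results {
		if r.Total == 0 || r.Success != r.Total {
			return false
		}
	}
	return true
}

// exitCode maps the overall outcome to a process exit status.
func exitCode(results []ProbeResult) int {
	if allSucceeded(results) {
		return 0
	}
	return 1
}

func main() {
	// Made-up sample values for illustration only.
	results := []ProbeResult{
		{Name: "probe1", Dst: "target-1", Success: 3, Total: 3, Latency: 4500},
		{Name: "probe1", Dst: "target-2", Success: 2, Total: 3, Latency: 3100},
	}
	out, err := json.MarshalIndent(results, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	// A real binary would call os.Exit(exitCode(results)) here.
	fmt.Println("exit status:", exitCode(results))
}
```

Keeping the exit status coarse (0 or 1) leaves scripts a simple pass/fail check, while the JSON carries the per-target detail.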

Feb 23 '25 00:02 manugarg

One use of this functionality could be running cloudprober to verify a deployment. It may be useful to extend it with things like:

- Wait for a minimum of 3 consecutive successes, for up to 5 minutes.
- Override the probe interval and timeout on the command line.
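The "3 consecutive successes within 5 minutes" idea could be sketched as a loop that resets its streak on any failure and gives up when an overall deadline expires, with the interval and per-attempt timeout passed in (e.g. from command-line overrides). All function names here are hypothetical, and the `attempt` callback stands in for a single probe run.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitForConsecutive runs attempt once per interval until it has seen `want`
// consecutive successes or ctx is done. Each attempt gets its own timeout.
func waitForConsecutive(ctx context.Context, want int, interval, timeout time.Duration,
	attempt func(context.Context) bool) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	streak := 0
	for {
		actx, cancel := context.WithTimeout(ctx, timeout)
		ok := attempt(actx)
		cancel()
		if ok {
			streak++
			if streak >= want {
				return nil
			}
		} else {
			streak = 0 // any failure resets the streak
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("gave up with %d consecutive successes: %w", streak, ctx.Err())
		case <-ticker.C:
		}
	}
}

// verifyDeployment bounds the whole wait by maxWait (e.g. 5 minutes).
func verifyDeployment(want int, interval, timeout, maxWait time.Duration,
	attempt func(context.Context) bool) error {
	ctx, cancel := context.WithTimeout(context.Background(), maxWait)
	defer cancel()
	return waitForConsecutive(ctx, want, interval, timeout, attempt)
}

// flakyAttempt returns a fake attempt that fails its first failFirst calls and
// succeeds afterwards, plus a pointer to its call counter.
func flakyAttempt(failFirst int) (func(context.Context) bool, *int) {
	n := 0
	return func(context.Context) bool { n++; return n > failFirst }, &n
}

func main() {
	attempt, calls := flakyAttempt(2)
	err := verifyDeployment(3, 10*time.Millisecond, time.Second, 5*time.Minute, attempt)
	fmt.Println("verified:", err == nil, "attempts:", *calls)
}
```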

Mar 21 '25 17:03 manugarg

Added a first draft of the single run interface in #1081.

Aug 05 '25 23:08 manugarg