Add batching support for high-volume runtime monitoring
Summary
Adds configurable batching support to the remote sink to handle high-volume runtime monitoring scenarios efficiently.
Problem
Runtime monitoring of syscalls was too slow when sending each event individually to the monitoring system. This created performance bottlenecks when monitoring high-frequency syscalls in production environments.
Solution
Added batching functionality with a configurable batch_interval parameter:
- New batch_interval config parameter allows collecting syscalls over a specified time period (e.g., "5500ms")
- Messages are collected in memory and flushed together at the interval boundary
- Maintains SOCK_SEQPACKET message boundaries by sending each batched message individually during flush
- Reduces syscall overhead while preserving message segmentation required by monitoring receivers
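The collect-then-flush behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the batcher type, its add/flush/run methods, and the send callback are hypothetical names, and send stands in for a single write on the SOCK_SEQPACKET socket.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// batcher is a hypothetical type used only for illustration; it is not
// the type added by this PR.
type batcher struct {
	mu      sync.Mutex
	pending [][]byte
	// send is assumed to perform one write per call, e.g. on a
	// SOCK_SEQPACKET socket, so each call yields one discrete message.
	send func(msg []byte) error
}

// add queues one serialized event in memory.
func (b *batcher) add(msg []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.pending = append(b.pending, msg)
}

// flush takes the queued messages and sends each one with its own send
// call, so message boundaries are preserved. It returns the number of
// messages sent; on error, the remaining messages of that batch are dropped.
func (b *batcher) flush() (int, error) {
	b.mu.Lock()
	batch := b.pending
	b.pending = nil
	b.mu.Unlock()

	for i, msg := range batch {
		if err := b.send(msg); err != nil {
			return i, err
		}
	}
	return len(batch), nil
}

// run flushes once per interval until stop is closed, then flushes a
// final time so queued messages are not lost on shutdown.
func (b *batcher) run(interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			b.flush()
		case <-stop:
			b.flush()
			return
		}
	}
}

func main() {
	var sent []string
	b := &batcher{send: func(msg []byte) error {
		sent = append(sent, string(msg))
		return nil
	}}
	b.add([]byte("open"))
	b.add([]byte("write"))
	n, err := b.flush()
	fmt.Println(n, err, sent)
}
```

Swapping the pending slice out under the lock keeps add from blocking while a flush is writing to the socket.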
Configuration Example
{
  "name": "remote",
  "config": {
    "endpoint": "/tmp/gvisor_events.sock",
    "retries": 3,
    "batch_interval": "5500ms"
  }
}
Impact
This significantly improves performance for runtime monitoring systems that need to process high volumes of syscall events by:
- Reducing the number of flush operations from one-per-event to one-per-interval
- Maintaining compatibility with monitoring tools that require discrete messages
- Allowing tunable performance via the batch_interval parameter
Tested with syscall monitoring including write syscalls and other high-frequency events.
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).
View this failed invocation of the CLA check for more information.
For the most up to date status, view the checks section at the bottom of the pull request.
cc @fvoznika
@vivaan-gupta-databricks could you also sign the CLA?