vector
vector copied to clipboard
New `rolling` transform
It would be nice to have a native transform which can perform rolling window calculations on events fields.
Use cases
One simple and common potential use case is combining it with fixed source (https://github.com/timberio/vector/issues/1317) to generate a sequence of events with increasing counter.
A more complex use case is doing simple smoothing or counting in Vector before sending the data to sinks, for example, computing moving averages with subsequent sampling.
Configuration
It would take a field and then calculate one of the following functions on the values of this field:
count- count number of events in which the source field was present;sum- sum values of the source field;average- calculate average of the values of the source field;rms- calculate residual mean square (sqrt(average(x^2) - average(x)^2)).
The rolling window could take one of the following types
expanding- the window contains all timeseries points from the beginning;exponential- the window weights timeseries points with exponential decay;rectangular- the window uses either fixed number of points or fixed time interval.
The size of the window could be specified using one of the following configuration options:
window_points- specifies size of the window as number of events;window_duration- specifies size of the window as time interval.
It should be possible to specify additional parameter window_field, which would:
- for
window_pointsuse number of times the fieldwindow_fieldoccured instead of total number of messages; - for
window_durationuse the fieldwindow_fieldas timestamp using which the duration is determined.
Additionally, there should be a Boolean parameter persistent, which would specify should the state of the window function stored on disk or not.
Example (common)
[transforms.rolling]
type = "rolling"
function = "count"
target_field = "count"
Example (advanced)
[transforms.rolling]
type = "rolling"
function = "average" # other values are `sum`, `average`, `rms`
window_type = "expanding" # other values are `exponential` and `rectangular`
window_time = 3
persistent = true # false by default
field = "message" # source field, optional for `count` function
target_field = "message_count" # optional if `field` is specified
drop_field = false # optional
References
Inspired by the rolling window API in Pandas.
Nice, I really like this idea, would have lots of uses.
Agree, this would be very useful!
I think the primary challenge is that the Transform API currently only lets you produce an output event in response to an input event. This makes it hard to do things like produce outputs on a regular interval, since you never know how long it will be before your next input event.
When I last thought about this, the best idea I had to was introduce a new type of transform internally that had an API more similiar to that of sources (i.e. it gets passed an output channel). This would require a bit of work in topology building, but would give lots of flexibility.
Blocked by #1598.
@binarylogic Hi Ben! I found the issue #1598 has been closed, is there any plan to implement this transform function? I post an issue #9231 which could be solved by this proposal. Many thanks!
Would love to see this as well.
Telegraf had some support but it has a simple sequential pipeline (can't define how calculation flows or combines), it would be awesome if Vector could support such aggregations.
Looks like a duplicate of https://github.com/vectordotdev/vector/issues/3668
👍 I think #3668 is slightly different in it is about aggregating metric events. This issue is about generating metrics from logs.
@jszwedko Yeah but we do really need them both. Aggregate should work on logs as well as metrics. They are JSONs afterall
Additional point of comparison: https://docs.fluentbit.io/manual/stream-processing/getting-started/fluent-bit-sql