vector
vector copied to clipboard
Metrics Stop Prematurely at Program Shutdown
A note for the community
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Problem
When I run Vector from the command line with a single data source of demo data for a few seconds, then terminate the process with ctrl-C, the values for Metrics stop being updated prematurely while events are still written.
This is my first ever Issue with Vector, so if I missed something really obvious or more troubleshooting needs to be done on my end please do let me know! :-)
Configuration
[sources.in]
type = "internal_metrics"
scrape_interval_secs = 1
[sources.in_demo]
type = "demo_logs"
format = "apache_common"
interval = 1
#interval = 5
#interval = 10
[transforms.tf]
inputs = [ "in" ]
type = "metric_to_log"
[sinks.json]
inputs = ["tf"]
type = "file"
path = "output-metrics.json"
encoding.codec = "json"
[sinks.out_demo]
inputs = ["in_demo"]
type = "file"
path = "output-demo.json"
encoding.codec = "json"
Version
vector 0.23.3 (x86_64-unknown-linux-musl af8c9e1 2022-08-10)
Debug Output
https://gist.github.com/dmuth/736ce3e8df596de6eff41376a0896556
Example Data
N/A
Additional Context
From a sample run (vector -c ./internal-metrics.toml
), with interval = 1
:
$ wc -l output-demo.json
8 output-demo.json
$ cat output-metrics.json |jq -C '. | select(.name == "events_in_total" and .tags.component_name == "in_demo")
[...]
{
"counter": {
"value": 7
},
"kind": "absolute",
"name": "events_in_total",
"namespace": "vector",
"tags": {
"component_id": "in_demo",
"component_kind": "source",
"component_name": "in_demo",
"component_type": "demo_logs"
},
"timestamp": "2023-01-11T20:52:38.202574632Z"
}
Expected value for the metrics count would be 8, matching the number of events in the output file.
When I set interval = 5
and run Vector for 6 seconds, here's what I see:
$ wc -l output-demo.json
3 output-demo.json
$ cat output-metrics.json |jq -C '. | select(.name == "events_in_total" and .tags.component_name == "in_demo")
[...]
{
"counter": {
"value": 2
},
"kind": "absolute",
"name": "events_in_total",
"namespace": "vector",
"tags": {
"component_id": "in_demo",
"component_kind": "source",
"component_name": "in_demo",
"component_type": "demo_logs"
},
"timestamp": "2023-01-11T20:54:29.671992619Z"
}
Watching the execution in real-time provides a clue: when I hit ctrl-C about 6 seconds in, Vector runs for another 4 seconds or so, and the third event is written to the output file. This suggests that when ctrl-C is received, metrics immediately stop being generated, buffers are flushed/pending events are processed, and then the Vector process exits.
References
None that I could find after doing a few searches here.
Thank you for your time, and all the effort that's gone into Vector so far!
-- Doug
Hey @dmuth - looks legit! Thanks for the thorough report. I suspect what's happening (basically what you suggested) is all of the components attempt to finish and shutdown at the same time, and the internal_metrics
source happens to do it faster than the demo_logs
source.
I can envision a tiered shutdown where the "actual event streams" are finished, then the "internal event streams", and then finally cleanly shutdown - to not lose any of the telemetry. It could get complicated identifying/separating components into tiers though.
@spencergilbert as an ad-hoc solution instead of implementing "true" tiering, I could suggest just dropping "internal_metrics" source as the last one. Not a pretty solution but should work :) Is it worth it? I don't know, honestly.
This likely affects the internal_logs
source as well as both this and internal_metrics
will shut down right away when receiving the signal.
Closed by https://github.com/vectordotdev/vector/pull/17741