`vector` refuses to start when connectivity to one/any external service is not working
A note for the community
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Problem
When running vector on RHEL9, the file /usr/lib/systemd/system/vector.service contains the line
ExecStartPre=/usr/bin/vector validate
This means vector refuses to start if an external connection is unavailable at startup, instead of starting and then retrying the connection, which is what it does when a connection goes down after a successful start.
In our config, I tried removing
healthchecks:
  require_healthy: true
and removing this from every sink
healthcheck:
  enabled: true
but vector validate still fails, so the service refuses to start.
I would suggest that the ExecStartPre=/usr/bin/vector validate line in the vector.service file have either --no-environment or --skip-healthchecks added, so that vector starts up and retries the external connection once running, which is what it would do if the connection had failed during normal operation.
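Until the packaged unit changes, the suggestion above can be tried locally with a systemd drop-in instead of editing /usr/lib/systemd/system/vector.service. This is a minimal sketch, not an official fix: the function name write_vector_dropin and the file name override.conf are made up for illustration, and whether --skip-healthchecks is enough depends on the sink.

```shell
# Hypothetical helper: writes a systemd drop-in that replaces the packaged
# ExecStartPre with one that skips health checks during validation.
# $1 is the drop-in directory, e.g. /etc/systemd/system/vector.service.d
write_vector_dropin() {
  dropin_dir="$1"
  mkdir -p "$dropin_dir" || return 1
  cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
# An empty assignment clears the ExecStartPre list inherited from the packaged unit.
ExecStartPre=
ExecStartPre=/usr/bin/vector validate --skip-healthchecks
EOF
}
```

As root, this would be applied with write_vector_dropin /etc/systemd/system/vector.service.d, followed by systemctl daemon-reload and systemctl restart vector. A drop-in survives package upgrades, which an edit to the file under /usr/lib would not.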
Because we use vector to run other data migration services, in this case aggregating metrics, having them all fail because one (or more) is not working is not really a useful mode of operation.
Configuration
api:
  enabled: true
  address: 127.0.0.1:8686
expire_metrics_secs: 300
healthchecks:
  require_healthy: true
sources:
  vector_metrics:
    type: internal_metrics
  services_metrics:
    type: prometheus_scrape
    scrape_interval_secs: 15
    scrape_timeout_secs: 2
    endpoints:
      - "http://127.0.0.1:9200/metrics"
      - "http://127.0.0.1:9167/metrics"
  dnstap:
    type: dnstap
    socket_path: /var/lib/vector/dnstap.sock
    socket_file_mode: 0o777
    mode: unix
    multithreaded: true
  relay_blocks:
    type: vector
    address: 10.17.252.114:9001
sinks:
  output_my_prom:
    type: prometheus_exporter
    address: 172.17.252.114:9100
    inputs:
      - vector_metrics
      - services_metrics
  vector_dnstap:
    inputs: [ dnstap ]
    type: vector
    address: "<hostname>:9000"
    buffer:
      max_size: 2684354880
      type: "disk"
      when_full: "drop_newest"
    healthcheck:
      enabled: true
    tls:
      enabled: true
      ca_file: /etc/vector/pems/myCA.pem
      key_file: /etc/vector/pems/vector.pem
      crt_file: /etc/vector/pems/vector.pem
      key_pass: "****"
      verify_certificate: true
      verify_hostname: true
  vector_relay_blocks:
    inputs: [ relay_blocks ]
    type: vector
    address: "<hostname>:9001"
    buffer:
      max_size: 2684354880
      type: "disk"
      when_full: "drop_newest"
    healthcheck:
      enabled: true
    tls:
      enabled: true
      ca_file: /etc/vector/pems/myCA.pem
      key_file: /etc/vector/pems/vector.pem
      crt_file: /etc/vector/pems/vector.pem
      key_pass: "****"
      verify_certificate: true
      verify_hostname: true
Version
vector 0.37.0 (x86_64-unknown-linux-gnu c1da408 2024-03-26 13:41:34.870460047)
Debug Output
# vector validate
√ Loaded ["/etc/vector/vector.yaml"]
√ Component configuration
2024-04-26T11:02:06.910152Z ERROR vector::topology::builder: msg="Healthcheck failed." error=Request failed: status: Unavailable, message: "error trying to connect: error:0A000086:SSL routines:(unknown function):certificate verify failed:ssl/statem/statem_clnt.c:2092:: unable to get local issuer certificate", details: [], metadata: MetadataMap { headers: {} } component_kind="sink" component_type="vector" component_id=vector_relay_blocks
x Health check for "vector_relay_blocks" failed: Request failed: status: Unavailable, message: "error trying to connect: error:0A000086:SSL routines:(unknown function):certificate verify failed:ssl/statem/statem_clnt.c:2092:: unable to get local issuer certificate", details: [], metadata: MetadataMap { headers: {} }
2024-04-26T11:02:07.010816Z ERROR vector::topology::builder: msg="Healthcheck failed." error=Request failed: status: Unavailable, message: "error trying to connect: error:0A000086:SSL routines:(unknown function):certificate verify failed:ssl/statem/statem_clnt.c:2092:: unable to get local issuer certificate", details: [], metadata: MetadataMap { headers: {} } component_kind="sink" component_type="vector" component_id=vector_dnstap
x Health check for "vector_dnstap" failed: Request failed: status: Unavailable, message: "error trying to connect: error:0A000086:SSL routines:(unknown function):certificate verify failed:ssl/statem/statem_clnt.c:2092:: unable to get local issuer certificate", details: [], metadata: MetadataMap { headers: {} }
√ Health check "output_my_prom"
Example Data
No response
Additional Context
No response
References
No response
You talk about vector validate, but with the amqp sink I observe that neither validate nor vector itself can start when the sink does not respond correctly on its port. I can see that vector validate behaves slightly differently with and without --skip-healthchecks when a dummy port is open, but in both cases it fails with exit code 78.
$ . test-vector.sh
+ vector validate
√ Loaded ["/etc/vector/vector.yaml"]
Component errors
----------------
x Sink "rabbitmq": creating amqp producer failed: IO error: Connection refused (os error 111)
+ echo exited 78
exited 78
+ vector validate --skip-healthchecks
√ Loaded ["/etc/vector/vector.yaml"]
Component errors
----------------
x Sink "rabbitmq": creating amqp producer failed: IO error: Connection refused (os error 111)
+ echo exited 78
exited 78
+ vector validate
+ nc -lk -p 5673 127.0.0.1
√ Loaded ["/etc/vector/vector.yaml"]
2024-06-05T10:51:38.322879Z ERROR lapin::io_loop: Socket was readable but we read 0. This usually means that the connection is half closed this mark it as broken
2024-06-05T10:51:38.322964Z ERROR lapin::io_loop: error doing IO error=IOError(Kind(ConnectionAborted))
2024-06-05T10:51:38.323042Z ERROR lapin::channels: Connection error error=IO error: connection aborted
AMQP
Component errors
----------------
x Sink "rabbitmq": creating amqp producer failed: IO error: connection aborted
+ echo exited 78
exited 78
+ vector validate --skip-healthchecks
√ Loaded ["/etc/vector/vector.yaml"]
Component errors
----------------
x Sink "rabbitmq": creating amqp producer failed: IO error: Connection refused (os error 111)
+ echo exited 78
exited 78
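For anyone who wants to repeat this comparison, the trace above can be approximated with a small helper that runs a command and echoes its exit status. This is only a sketch: run_and_report is a hypothetical name, and it is not the contents of the test-vector.sh used above.

```shell
# Hypothetical helper, not the original test-vector.sh.
# Runs the given command, prints "exited <status>", and returns that status.
run_and_report() {
  status=0
  "$@" || status=$?
  echo "exited $status"
  return "$status"
}

# Example calls (assume vector is installed and configured):
#   run_and_report vector validate
#   run_and_report vector validate --skip-healthchecks
```

Exit status 78 is EX_CONFIG in the sysexits.h convention, which fits validate reporting a component error rather than a failed health check.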