
TLS certificates not reloaded on config reload

Open abh opened this issue 3 years ago • 15 comments

Vector is using an old certificate (long since updated on disk). The TLS renewal process touches the vector config to have it reloaded.

2022-01-17T07:02:29.400274Z ERROR sink{component_kind="sink" component_id=queries component_type=http component_name=queries}:request{request_id=881481}: vector::sinks::util::sink: Response failed. response=Response { status: 495, version: HTTP/1.1, headers: {"content-type": "text/html", "cache-control": "no-cache", "content-length": "107"}, body: b"<html><body><h1>495 SSL Certificate Error</h1>\r\nAn invalid certificate has been provided.\r\n</body></html>\r\n" }
2022-01-17T07:02:30.342604Z  INFO vector::config::watcher: Configuration file changed.
2022-01-17T07:02:30.343668Z  INFO vector::topology::running: Running healthchecks.
2022-01-17T07:02:30.343735Z  INFO vector: Vector has reloaded. path=[File("/etc/vector/vector.toml", Some(Toml))]

However the TLS certs aren't reloaded:

2022-01-17T07:02:30.392418Z ERROR sink{component_kind="sink" component_id=queries component_type=http component_name=queries}:request{request_id=881482}: vector::sinks::util::retries: Not retriable; dropping the request. reason="response status: 495 <unknown status code>"
2022-01-17T07:02:30.392471Z ERROR sink{component_kind="sink" component_id=queries component_type=http component_name=queries}:request{request_id=881482}: vector::sinks::util::sink: Response failed. response=Response { status: 495, version: HTTP/1.1, headers: {"content-type": "text/html", "cache-control": "no-cache", "content-length": "107"}, body: b"<html><body><h1>495 SSL Certificate Error</h1>\r\nAn invalid certificate has been provided.\r\n</body></html>\r\n" }

Restarting vector instead of touching / reloading the configuration works, but it's a bit crude (our certs are only valid for a few days at a time).

abh avatar Jan 17 '22 07:01 abh

I opine that --watch-config should monitor for updates to certificate files as well as part of this.

jszwedko avatar Dec 29 '22 22:12 jszwedko

I've also run into this problem with short-lived certificates. kill -HUP $PID doesn't seem to be enough for Vector to pick up the new certificates; it needs a full restart instead.

w4 avatar Apr 24 '23 10:04 w4

Hi, to mitigate this problem, I have installed Reloader and configured it to restart vector if TLS-secrets are updated.

minimax75 avatar May 11 '23 08:05 minimax75

We are also hitting this problem with short-lived certificates. Would be nice if it can be implemented in one of the next versions

schmitz-chris avatar May 31 '23 12:05 schmitz-chris

Hi, to mitigate this problem, I have installed Reloader and configured it to restart vector if TLS-secrets are updated.

In my system vault-agent is generating the certs and I configured it to run a script that reloads vector after updating the certs. Still would be nicer if Vector considered them part of the config when --watch-config is enabled. :-)
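A minimal sketch of that kind of workaround, assuming the certificate files can simply be polled and hashed. The function names and the `restart` callback are hypothetical; nothing here is part of Vector or vault-agent.

```python
import hashlib
from pathlib import Path
from typing import Callable, Iterable, Optional


def fingerprint(paths: Iterable[str]) -> str:
    """Return one SHA-256 digest covering the contents of all given files.

    read_bytes() follows symlinks, so a swapped symlink target changes the digest.
    """
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(path.encode())
        digest.update(b"\0")
        digest.update(Path(path).read_bytes())
    return digest.hexdigest()


def restart_if_changed(
    paths: Iterable[str],
    previous: Optional[str],
    restart: Callable[[], None],
) -> str:
    """Call restart() when the fingerprint differs from the previous one.

    The first call (previous=None) only records the baseline.
    """
    current = fingerprint(list(paths))
    if previous is not None and current != previous:
        restart()
    return current
```

A caller would invoke `restart_if_changed` on a timer and keep the returned fingerprint for the next round. What `restart` does depends on the deployment (a service-manager restart, for example) and is left to the caller here, since a plain reload is reported above as not being enough.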

abh avatar Jun 01 '23 05:06 abh

Hi, any update on this?

schmitz-chris avatar Jul 26 '23 11:07 schmitz-chris

Hi, any update on this?

Not yet unfortunately. The workaround is to restart rather than reload Vector.

jszwedko avatar Jul 26 '23 13:07 jszwedko

Any update on this? This is really valuable with short-lived certificates

cah-michel-bitar avatar Feb 06 '25 11:02 cah-michel-bitar

There was some activity in: https://github.com/vectordotdev/vector/issues/17283

pront avatar Feb 06 '25 20:02 pront

See discussion here for remaining items before we can consider this issue resolved: https://github.com/vectordotdev/vector/pull/22539#discussion_r1991590321

pront avatar Mar 17 '25 18:03 pront

Still having the same issue with 0.47.0. Certificates are renewed every two days by Vault. After the certificates are refreshed, SIGHUP is sent to Vector (Vector is also started with the watch-config parameter). However, when writing to Loki (via nginx for mTLS), the old client certificate is used. This is the message from nginx: [info] 7#7: *53294 client SSL certificate verify error: (10:certificate has expired) while reading client request headers

I manually sent SIGHUP to Vector to rule out a misconfiguration in the Nomad job. I also checked the client certificate to confirm that it is valid (yes, until May 29th) and that it was renewed shortly before the problems started (yes, that was the case). Still, nginx kept complaining that the client certificate had expired.

I then restarted Vector, which resolved the problem.

dev756 avatar May 27 '25 08:05 dev756

We're also still seeing issues with 0.47.0.

We run vector on Kubernetes, and I believe the main issue there is that when a TLS key is mounted into a container through a secret, k8s actually mounts the key under <mount_path>/..<timestamp>/tls.key and creates a symlink from <mount_path>/tls.key to it. As far as I can tell, the current implementation doesn't handle symlinks.

This results in logs that look like this:

2025-06-04T11:47:36.554683Z  INFO vector::app: Log level is enabled. level="info"
2025-06-04T11:47:36.555553Z  INFO vector::app: Loading configs. paths=["/etc/vector"]
2025-06-04T11:47:36.556920Z  INFO vector::app: Starting watcher. paths=["/etc/vector"]
2025-06-04T11:47:36.556942Z  INFO vector::app: Components to watch. paths=[ComponentConfig { config_paths: ["/var/lib/tls/..2025_06_04_11_47_21.489208874/client.crt", "/var/lib/tls/..2025_06_04_11_47_21.489208874/client.key"], component_key: ComponentKey { id: "http_push" } }]
2025-06-04T11:47:36.556952Z  INFO vector::config::watcher: Creating configuration file watcher.
2025-06-04T11:47:36.557614Z  INFO vector::config::watcher: Watching configuration files.
2025-06-04T11:47:36.611824Z  INFO vector::topology::running: Running healthchecks.
2025-06-04T11:47:36.611883Z  INFO vector::topology::builder: Healthcheck passed.
2025-06-04T11:47:36.611891Z  INFO vector::topology::builder: Healthcheck passed.
2025-06-04T11:47:36.611908Z  INFO vector::topology::builder: Healthcheck passed.
2025-06-04T11:47:36.611915Z  INFO vector: Vector has started. debug="false" version="0.47.0" arch="x86_64" revision="3d5af22 2025-05-20 13:53:41.638057046"
...
2025-06-04T11:47:38.447739Z ERROR vector::config::watcher: Failed to read files to watch. error=No path was found.
...
2025-06-04T11:47:48.447934Z  INFO vector::config::watcher: Creating configuration file watcher.
2025-06-04T11:47:48.449251Z ERROR vector::config::watcher: Failed to create file watcher. error=No path was found.
...
2025-06-04T11:47:58.449327Z  INFO vector::config::watcher: Creating configuration file watcher.
2025-06-04T11:47:58.450728Z ERROR vector::config::watcher: Failed to create file watcher. error=No path was found.
...

Notice that we're actually watching /var/lib/tls/..2025_06_04_11_47_21.489208874/client.crt. When Kubernetes updates the certificate, it mounts it to a new directory, updates the symlink, and deletes the old directory. The config watcher, however, keeps watching the old path and then starts failing because that file doesn't exist anymore.
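To illustrate the difference, here is a rough Python sketch (not Vector's Rust implementation) of a poll-based check that keeps the configured path as the thing being watched and re-resolves the symlink on every poll, instead of resolving it once at startup. The names `snapshot` and `poll` are hypothetical.

```python
import os
from typing import Optional, Tuple

# (resolved target path, mtime in nanoseconds, size in bytes)
Snapshot = Tuple[str, int, int]


def snapshot(configured_path: str) -> Optional[Snapshot]:
    """Resolve the configured path right now and describe the file behind it."""
    try:
        target = os.path.realpath(configured_path)
        st = os.stat(target)
    except OSError:
        return None  # path is missing, e.g. in the middle of an update
    return (target, st.st_mtime_ns, st.st_size)


def poll(configured_path: str, last: Optional[Snapshot]) -> Tuple[bool, Optional[Snapshot]]:
    """Return (changed, state). The first call with last=None reports a change."""
    current = snapshot(configured_path)
    if current is None:
        # Keep the old state and try again on the next poll rather than failing.
        return False, last
    return current != last, current
```

Because the symlink is resolved again on each call, replacing the link target and deleting the old directory shows up as a change rather than as a permanent "No path was found" style failure.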

On a similar note, I'm fairly sure the current implementation also doesn't handle changing the TLS certificate path in the vector config, as the config_paths to watch are extracted once and never updated.
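As a sketch of what "updating the watched paths" could look like, the Python below recomputes the set of TLS file paths from a parsed config on every reload and diffs it against the previous set. This is an illustration only, not Vector's code: the helper names are hypothetical, the config is assumed to be a plain dict, and the `tls` keys (`ca_file`, `crt_file`, `key_file`) are assumed to hold filesystem paths.

```python
from typing import Dict, Set, Tuple

# Assumed names of the TLS options that point at files.
TLS_PATH_KEYS = ("ca_file", "crt_file", "key_file")


def tls_paths(config: Dict) -> Set[str]:
    """Collect every TLS file path referenced by sources and sinks."""
    paths: Set[str] = set()
    for section in ("sources", "sinks"):
        for component in config.get(section, {}).values():
            tls = component.get("tls", {})
            paths.update(tls[key] for key in TLS_PATH_KEYS if key in tls)
    return paths


def diff_watch_set(old: Set[str], new: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (paths to start watching, paths to stop watching)."""
    return new - old, old - new
```

Running `diff_watch_set(tls_paths(old_config), tls_paths(new_config))` after each successful reload would give the watcher the paths to add and drop, so a changed certificate path in the config does not leave it watching a stale file.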

glrf avatar Jun 04 '25 12:06 glrf

Thanks @dev756 and @glrf for the feedback. Symlink support and watched path updates will need to be implemented.

Also, slightly related enhancement request: https://github.com/vectordotdev/vector/issues/23082

pront avatar Jun 10 '25 14:06 pront

Hi, I'm seeing the same issue when TLS certificates are configured on any of the sources (Vector 0.47.0-1 on Ubuntu). This may have been fixed for the sinks by #22539, but it's still broken for sources.

zajdee avatar Oct 24 '25 12:10 zajdee