Adjust healthcheck for `prometheus_remote_write`

Open jszwedko opened this issue 4 years ago • 10 comments

Broken off of https://github.com/timberio/vector/pull/8269#pullrequestreview-706306727

The prometheus_remote_write sink currently makes a GET request to the endpoint for the health check. This fails for Prometheus itself as its healthcheck URL is /-/healthy. The remote write protocol does not describe a health check mechanism so it is unlikely that there is an HTTP request that would work for all applications that support the remote write protocol. Instead, I think we may want to just check for connectivity.
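A minimal sketch of what "just check for connectivity" could mean, assuming a plain TCP connect is an acceptable probe. The helper name `is_reachable` is hypothetical and this is not Vector's actual healthcheck code. It only shows that the port accepts connections, not that the remote accepts remote-write requests.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Hypothetical helper: returns true if a TCP connection to `host_port`
/// (e.g. "localhost:9090") can be opened within `timeout`.
fn is_reachable(host_port: &str, timeout: Duration) -> bool {
    // Resolve the address; a resolution failure counts as unreachable.
    let addrs = match host_port.to_socket_addrs() {
        Ok(addrs) => addrs,
        Err(_) => return false,
    };
    // Succeed as soon as any resolved address accepts the connection.
    for addr in addrs {
        if TcpStream::connect_timeout(&addr, timeout).is_ok() {
            return true;
        }
    }
    false
}
```

Something like `is_reachable("localhost:9090", Duration::from_secs(2))` would then stand in for the GET request.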

cc/ @bruceg in case you know something I don't.

jszwedko avatar Jul 14 '21 14:07 jszwedko

I don't think we currently do this anywhere, but we could expose more configuration for healthcheck and provide a default (/-/healthy) and allow the user to change it if whatever target they're using has a different endpoint?

We'd want to check that other targets do have different endpoints, but I could see this being useful in other sinks as well (loki with Grafana Cloud not exposing the healthcheck endpoint).

spencergilbert avatar Jul 14 '21 14:07 spencergilbert

@jszwedko that matches my knowledge as well. The remote write protocol only has one action, and that is to write metrics. We could possibly submit a request with zero events in it, but I have no idea if that wouldn't cause an error due to being empty. We would have to play with that to determine if it is viable.

bruceg avatar Jul 20 '21 00:07 bruceg

I noted in Slack as well, but there will be differences in the endpoints per remote_write target - we may want to create a healthcheck.endpoint, I imagine that could be useful for other sinks as well

spencergilbert avatar Jul 20 '21 01:07 spencergilbert

@spencergilbert @jszwedko @bruceg from your point of view, is it OK to add custom healthcheck.endpoint support to prometheus_remote_write? I think I can try to implement it. It would definitely be useful for users.

zamazan4ik avatar Sep 11 '22 18:09 zamazan4ik

Talking with @bruceg, I think we'd support a PR for that; it should also close https://github.com/vectordotdev/vector/issues/13890. I think healthcheck.endpoint is a suitable name for the option, and it could either default to None to reuse the endpoint configuration, or have a reasonable default value.

spencergilbert avatar Sep 12 '22 16:09 spencergilbert

Another followup, talking with @jszwedko - perhaps healthcheck.path could be better if we're expecting the base of the address to stay the same and just vary the path, as it would save the user repeating most of the endpoint option.

Would be curious what your opinion is as a user @zamazan4ik.

spencergilbert avatar Sep 12 '22 18:09 spencergilbert

> Another followup, talking with @jszwedko - perhaps healthcheck.path could be better if we're expecting the base of the address to stay the same and just vary the path, as it would save the user repeating most of the endpoint option.

That is an interesting question. On the one hand, specifying only a path avoids duplicating the base address and prevents configuration errors like "changed the remote URI but forgot to change the corresponding healthcheck URI". On the other hand, real-life configurations can be quite tricky. E.g. the remote URI and the corresponding healthcheck URI can be located on different servers behind different reverse proxies. In that case we could not configure it if only the path part of the healthcheck URI can be changed. Or a user may even want to run their own dedicated health server that implements custom logic for computing health.

Since I have not heard of such custom setups with a different healthcheck URI before, I guess we can start with the healthcheck.path approach. Even if it is less flexible, it reduces the chance of misconfiguration. Later, if users request more flexibility, we can think about it and add an additional option or refactor the existing one.
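To make the healthcheck.path idea concrete, here is a rough sketch of how the healthcheck URL could be derived from the existing endpoint option. The function name `healthcheck_url` is hypothetical, `healthcheck.path` is only the option name proposed in this thread, and the string handling is deliberately simplified (it ignores query strings, for example). It is not how Vector resolves URLs.

```rust
/// Hypothetical helper: builds the healthcheck URL from the sink's `endpoint`
/// and an optional `healthcheck.path` override (a proposed option, not an existing one).
fn healthcheck_url(endpoint: &str, path: Option<&str>) -> String {
    let path = match path {
        Some(path) => path,
        // No override: fall back to the write endpoint itself.
        None => return endpoint.to_string(),
    };
    // Skip past "scheme://" if present, so the search for the path
    // starts after the authority (host and port).
    let authority_start = endpoint.find("://").map_or(0, |i| i + 3);
    // The base ends at the first '/' after the authority, or at the end of the string.
    let base_end = endpoint[authority_start..]
        .find('/')
        .map_or(endpoint.len(), |i| authority_start + i);
    let base = &endpoint[..base_end];
    // Join base and path with exactly one '/' between them.
    format!("{}/{}", base, path.trim_start_matches('/'))
}
```

With an endpoint of `http://localhost:9090/api/v1/write` and a path of `/-/healthy`, this yields `http://localhost:9090/-/healthy`, so the user only has to spell out the part that differs.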

zamazan4ik avatar Sep 13 '22 00:09 zamazan4ik

This issue still persists. Are there any alternatives or hacky fixes?

Prajwalprakash3722 avatar Apr 18 '24 09:04 Prajwalprakash3722

> This issue still persists. Are there any alternatives or hacky fixes?

AFAIK, no fixes yet in this field.

zamazan4ik avatar Apr 20 '24 12:04 zamazan4ik

This feature would make Vector one of the best tools for pushing metrics for edge/IoT and other uses with limited drive space. Prometheus agent doesn't really function properly in that scenario.

akuzia avatar Oct 06 '24 18:10 akuzia