thanos
thanos copied to clipboard
.*: Apply webrouteprefix on debug and metric endpoints
- [x] I added CHANGELOG entry for this change.
- [ ] Change is not relevant to the end user.
Fixes https://github.com/thanos-io/thanos/issues/2727
Changes
This is a continuation of the work started by @ranjithkumar007 in https://github.com/thanos-io/thanos/pull/2524.
I've completed the rebase, fixed prefix handling in Thanos Store, added prefix handling logic to Thanos Compact (the flags were there but ignored, previously), and added additional testing around prefix behaviors in e2e tests.
Related issues:
- https://github.com/thanos-io/thanos/issues/2486
- https://github.com/thanos-io/thanos/issues/1700
- Note: @bwplotka advocated for keeping the prefix off of these endpoints in this issue, but I agree with @FUSAKLA's opinion that it's making a decision for the user.
- I think it's simpler to serve the whole app as specified via the command line flags, and users may block endpoints that they do not want others to access (same as if they don't prefix the routes).
- Prometheus prefixes all endpoints (including /metrics and /debug/*), so this PR makes Thanos consistent with that.
- https://github.com/thanos-io/thanos/issues/3260
Essentially, it prefixes the debug and metrics endpoints with the value of --web.route-prefix if available.
For those like me who have Thanos components behind a reverse proxy (e.g. https://example.com/thanos/query) then this will break their access to some endpoints (e.g. /metrics, /-/ready and /-/healthy) if they hard coded around it like we did (see below).
Bad:
# This will break
location = /thanos/query/metrics {
proxy_pass http://127.0.0.1:10902/metrics;
}
Good:
# This will work
location /thanos/query/ {
proxy_pass http://127.0.0.1:10902/thanos/query/;
}
Verification
Verified with local tests and deployment of a build to our own servers to verify that the endpoints are served as expected both with and without --web.external-prefix (and thereby --web.route-prefix) being set.
Additionally, added more e2e tests to verify this works as expected on the components that support setting a prefix.
I've rerun the failed e2e test TestCompactWithStoreGateway without issue locally without any changes, so I think that might just be flaky. I've been unable to make TestRule succeed, but I don't see anything that I changed that would've affected it.
Thank you @wbh1 for fixing this.
It LGTM, but I also have that feeling about "can't we do this smarter?". Still, looks good and thank you!
I agree that there's likely a better way to do it that involves more refactoring -- maybe with including a webConfig for all components? Something to be addressed in a future PR, perhaps 😛
https://github.com/thanos-io/thanos/blob/83419bc5e3c5f667410a04c1c9920e27c3779162/cmd/thanos/config.go#L156-L161
Hey @wiardvanrij - anything else that you need for this?
Hey @wiardvanrij - anything else that you need for this?
No, still LGTM. I will ping others to get this reviewed/merged :) Thanks for picking this up!
It LGTM, but I also have that feeling about "can't we do this smarter?". Still, looks good and thank you!
I agree that there's likely a better way to do it that involves more refactoring -- maybe with including a
webConfigfor all components? Something to be addressed in a future PR, perhaps 😛https://github.com/thanos-io/thanos/blob/83419bc5e3c5f667410a04c1c9920e27c3779162/cmd/thanos/config.go#L156-L161
I think this makes sense. Would you be interested to write an issue for it and then perhaps implement it? I'm asking for an issue so that it can be (re)viewed before you work a lot on it.
The code change looks good. But this is really a breaking change and might impact our users. cc @thanos-io/thanos-maintainers for review.
Resurfacing this discussion @yeya24 😄 Could you ping again?
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Still relevant. Needs review.
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Still relevant. Needs review.
Will take a look soon, sorry for the delay. Could you please rebase this?
Alright, @GiedriusS -- this should be ready for review now.
I've added additional e2e testing for the components that actually support the prefix (Compact, Query, Rule, and Store), which helped catch bugs in how Store and Compact were handling route prefixes so thank you for that 😁.
Unrelated to the merging of this PR, but related to the topic of it:
In working on this PR, I've realized how each component largely handles the setting up of its routing configuration which makes changes like this difficult. Which leads me to want to make an effort in the future to consolidate the HTTP server/routing configuration for all components to ensure they're all consistent with the way they serve their routes.
Part of this work is already done in how some components use the webConfig struct, and also the prometheus/common/route package (example in Store). This leads to a lot of repeated code (see the similarities in Rule and Query).
Is this an idea that should go through the proposal process? Looking for next steps before starting any effort towards it.
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Not stale. Just need to get around to rebasing it again.
Sorry for the delay, I was on vacation. Will take a look in a bit & respond to you
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.