thanos icon indicating copy to clipboard operation
thanos copied to clipboard

.*: Apply webrouteprefix on debug and metric endpoints

Open wbh1 opened this issue 4 years ago • 18 comments
trafficstars

  • [x] I added CHANGELOG entry for this change.
  • [ ] Change is not relevant to the end user.

Fixes https://github.com/thanos-io/thanos/issues/2727

Changes

This is a continuation of the work started by @ranjithkumar007 in https://github.com/thanos-io/thanos/pull/2524.

I've completed the rebase, fixed prefix handling in Thanos Store, added prefix handling logic to Thanos Compact (the flags were there but ignored, previously), and added additional testing around prefix behaviors in e2e tests.

Related issues:

  • https://github.com/thanos-io/thanos/issues/2486
  • https://github.com/thanos-io/thanos/issues/1700
    • Note: @bwplotka advocated for keeping the prefix off of these endpoints in this issue, but I agree with @FUSAKLA's opinion that it's making a decision for the user.
    • I think it's simpler to serve the whole app as specified via the command line flags, and users may block endpoints that they do not want others to access (same as if they don't prefix the routes).
    • Prometheus prefixes all endpoints (including /metrics and /debug/*), so this PR makes Thanos consistent with that.
  • https://github.com/thanos-io/thanos/issues/3260

Essentially, it prefixes the debug and metrics endpoints with the value of --web.route-prefix if available.

For those like me who have Thanos components behind a reverse proxy (e.g. https://example.com/thanos/query) then this will break their access to some endpoints (e.g. /metrics, /-/ready and /-/healthy) if they hard coded around it like we did (see below).

Bad:

   # This will break
    location = /thanos/query/metrics {
       proxy_pass http://127.0.0.1:10902/metrics;
    }

Good:

   # This will work
    location /thanos/query/ {
        proxy_pass http://127.0.0.1:10902/thanos/query/;
    }

Verification

Verified with local tests and deployment of a build to our own servers to verify that the endpoints are served as expected both with and without --web.external-prefix (and thereby --web.route-prefix) being set.

Additionally, added more e2e tests to verify this works as expected on the components that support setting a prefix.

wbh1 avatar Jul 22 '21 18:07 wbh1

I've rerun the failed e2e test TestCompactWithStoreGateway without issue locally without any changes, so I think that might just be flaky. I've been unable to make TestRule succeed, but I don't see anything that I changed that would've affected it.

wbh1 avatar Jul 22 '21 21:07 wbh1

Thank you @wbh1 for fixing this.

ranjithkumar007 avatar Jul 23 '21 05:07 ranjithkumar007

It LGTM, but I also have that feeling about "can't we do this smarter?". Still, looks good and thank you!

I agree that there's likely a better way to do it that involves more refactoring -- maybe with including a webConfig for all components? Something to be addressed in a future PR, perhaps 😛

https://github.com/thanos-io/thanos/blob/83419bc5e3c5f667410a04c1c9920e27c3779162/cmd/thanos/config.go#L156-L161

wbh1 avatar Jul 28 '21 13:07 wbh1

Hey @wiardvanrij - anything else that you need for this?

wbh1 avatar Aug 04 '21 15:08 wbh1

Hey @wiardvanrij - anything else that you need for this?

No, still LGTM. I will ping others to get this reviewed/merged :) Thanks for picking this up!

wiardvanrij avatar Aug 04 '21 19:08 wiardvanrij

It LGTM, but I also have that feeling about "can't we do this smarter?". Still, looks good and thank you!

I agree that there's likely a better way to do it that involves more refactoring -- maybe with including a webConfig for all components? Something to be addressed in a future PR, perhaps 😛

https://github.com/thanos-io/thanos/blob/83419bc5e3c5f667410a04c1c9920e27c3779162/cmd/thanos/config.go#L156-L161

I think this makes sense. Would you be interested to write an issue for it and then perhaps implement it? I'm asking for an issue so that it can be (re)viewed before you work a lot on it.

wiardvanrij avatar Aug 04 '21 19:08 wiardvanrij

The code change looks good. But this is really a breaking change and might impact our users. cc @thanos-io/thanos-maintainers for review.

yeya24 avatar Aug 05 '21 06:08 yeya24

Resurfacing this discussion @yeya24 😄 Could you ping again?

wbh1 avatar Sep 10 '21 17:09 wbh1

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot] avatar Nov 25 '21 21:11 stale[bot]

Still relevant. Needs review.

wbh1 avatar Nov 25 '21 22:11 wbh1

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot] avatar Mar 02 '22 10:03 stale[bot]

Still relevant. Needs review.

wbh1 avatar Mar 14 '22 13:03 wbh1

Will take a look soon, sorry for the delay. Could you please rebase this?

GiedriusS avatar Apr 20 '22 12:04 GiedriusS

Alright, @GiedriusS -- this should be ready for review now.

I've added additional e2e testing for the components that actually support the prefix (Compact, Query, Rule, and Store), which helped catch bugs in how Store and Compact were handling route prefixes so thank you for that 😁.


Unrelated to the merging of this PR, but related to the topic of it:

In working on this PR, I've realized how each component largely handles the setting up of its routing configuration which makes changes like this difficult. Which leads me to want to make an effort in the future to consolidate the HTTP server/routing configuration for all components to ensure they're all consistent with the way they serve their routes.

Part of this work is already done in how some components use the webConfig struct, and also the prometheus/common/route package (example in Store). This leads to a lot of repeated code (see the similarities in Rule and Query).

Is this an idea that should go through the proposal process? Looking for next steps before starting any effort towards it.

wbh1 avatar May 09 '22 14:05 wbh1

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot] avatar Jul 10 '22 11:07 stale[bot]

Not stale. Just need to get around to rebasing it again.

wbh1 avatar Jul 11 '22 16:07 wbh1

Sorry for the delay, I was on vacation. Will take a look in a bit & respond to you

GiedriusS avatar Jul 11 '22 17:07 GiedriusS

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot] avatar Sep 21 '22 00:09 stale[bot]