loki
loki copied to clipboard
promtail/server: Disable profiling by default
What this PR does / why we need it:
Add a server.debug option (that defaults to false) to enable the
/debug/fgprof endpoint in Promtail.
Which issue(s) this PR fixes: Fixes #5005
Checklist
- [x] Documentation added
- [ ] Tests updated
- [x] Add an entry in the
CHANGELOG.mdabout the changes.
Given that profiling is always on at the moment and we make use of it ourselves I think it might be better to default to enabled and allow disabling via the config option you're adding here.
So I can rename the option. Personally, I prefer simply profiling, as the _enabled suffix doesn't really carry much information. But I don't feel too strongly about it, so let me know what you prefer.
As for the default value, I'm surprised that there seems to be a majority of people arguing to keep it enabled by default. It seems very counter-intuitive to me that a software would run with debugging endpoints enabled by default. This is almost certainly not what most people want to do in production, and the principle of least surprise should apply here. For use cases where it's desirable to have them enabled, it can be done explicitly.
Finally, I can modify this PR to also disable profiling in Loki, but I don't know about all the other Cortex or Tempo projects that may be affected as well.
As for the default value, I'm surprised that there seems to be a majority of people arguing to keep it enabled by default. It seems very counter-intuitive to me that a software would run with debugging endpoints enabled by default. This is almost certainly not what most people want to do in production, and the principle of least surprise should apply here. For use cases where it's desirable to have them enabled, it can be done explicitly.
IMO, when running a distributed system like Loki there are issues that occur rarely, and having to explicitly enable instrumentation and then wait for that issue to pop up again in order to have the information you'll need in order to debug and diagnose is counter-intuitive.
I think there is a bit of difference between server applications like Loki and Tempo, and promtail which is styled as an agent. You know you need to protect your servers from unwanted incoming calls; this may not be obvious for an agent.
BTW I am a maintainer of weaveworks/common and can look at a PR to make profiling optional in the “server” package there.
I think there is a bit of difference between server applications like Loki and Tempo, and promtail which is styled as an agent. You know you need to protect your servers from unwanted incoming calls; this may not be obvious for an agent.
That's very true! In our case, Loki is behind an authenticating proxy, so the /debug/* endpoints are not accessible anyway, but promtail is deployed pretty much everywhere so it's quite impractical to restrict access to those endpoints systematically.
BTW I am a maintainer of
weaveworks/commonand can look at a PR to make profiling optional in the “server” package there.
Are you saying you'd be open to PR submissions, or did you want to take a stab at it yourself?
Are you saying you'd be open to PR submissions, or did you want to take a stab at it yourself?
The former.
Somehow during all of this I missed that this PR is for promtail.
I would still say that we should keep the default as profiling on for other components (loki microservices) but feel less strongly about it for promtail itself.
Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.
We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.
Stalebots are also emotionless and cruel and can close issues which are still very relevant.
If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.
We regularly sort for closed issues which have a stale label sorted by thumbs up.
We may also:
- Mark issues as
revivableif we think it's a valid issue but isn't something we are likely to prioritize in the future (the issue will still remain closed). - Add a
keepalivelabel to silence the stalebot if the issue is very common/popular/important.
We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task, our sincere apologies if you find yourself at the mercy of the stalebot.
I think with an additional commit to fix the changelog conflict we could merge this?
@cstyan I rebased on main, so the conflict should be fixed now.
Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.
We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.
Stalebots are also emotionless and cruel and can close issues which are still very relevant.
If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.
We regularly sort for closed issues which have a stale label sorted by thumbs up.
We may also:
- Mark issues as
revivableif we think it's a valid issue but isn't something we are likely to prioritize in the future (the issue will still remain closed). - Add a
keepalivelabel to silence the stalebot if the issue is very common/popular/important.
We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task, our sincere apologies if you find yourself at the mercy of the stalebot.
please keep it open
I've rebased again; could someone merge it, or is there anything else?
@chaudum @cyriltovena Could you take a look? It would be great to get this merged soon.
Seems that we agreed that we want to disable the profiling endpoints by default, so basically good to go.
However, regarding the configuration parameter naming, I would want to change it to profiling_enabled, with the _enabled suffix. I'm not sure if we are 100% consistent, but most of the config options that enable/disable stuff have the suffix as well.
Then I'm happy to merge the PR :)
@BenoitKnecht any updates? Would you be able to make those last few changes requested by @chaudum? Thanks!
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki
Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.
+ ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0%
@trevorwhitney Sorry for the huge delay. I finally got around to make the requested changes. Let me know if there's anything else.
I've rebased once again. Can we get this merged before it diverges again?
./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki
Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.
+ ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0.1%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0%
@trevorwhitney There's been two releases of Loki since you merged this into main, but somehow this change isn't part of either of them. Is there something I need to do to get it pulled into the next release?
Two patch releases. It will be in the next minor release.
@BenoitKnecht FYI https://github.com/grafana/loki/issues/8797, looking like the end of next week, or early the week after for release.
@BenoitKnecht FYI #8797, looking like the end of next week, or early the week after for release.
Hey @trevorwhitney, that's great news, thanks for letting me know!