Don't scrape pods via HTTP if the activator in path.
When the activator is in the routing path, scraping the pods is kinda useless as their information is effectively zeroed out, because all requests are considered "proxied", thus the activator metrics take precedence. We can use that to make our system more efficient in these cases!
I propose two steps to go about this:
- Never scrape if TBC=-1. The activator will always be in path in this case, so we never need to scrape really.
- Start/Stop scraping if activator is in path or not. This is a little more tricky as we need to make sure that we start scraping timely once we notice the activator will be taken off the path.
As the PodAutoscaler owns all of these decisions, this should be doable for us. As it's a good-first-issue, I'd love to see it land in two separate pieces as laid out though.
Note: I'm happy to mentor first-time contributors here. The changes needed should be fairly small in terms of number of files to touch.
@markusthoemmes: This request has been marked as suitable for new contributors.
Please ensure the request meets the requirements listed here.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-good-first-issue command.
In response to this:
When the activator is in the routing path, scraping the pods is kinda useless as their information is effectively zeroed out, because all requests are considered "proxied", thus the activator metrics take precedence. We can use that to make our system more efficient in these cases!
I propose two steps to go about this:
- Never scrape if TBC=-1. The activator will always be in path in this case, so we never need to scrape really.
- Start/Stop scraping if activator is in path or not. This is a little more tricky as we need to make sure that we start scraping timely once we notice the activator will be taken off the path.
As the PodAutoscaler owns all of these decisions, this should be doable for us. As it's a good-first-issue, I'd love to see it land in two separate pieces as laid out though.
Note: I'm happy to mentor first-time contributors here. The changes needed should be fairly small in terms of number of files to touch.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I'd like to give it a try to look into Serving internals a bit more. /assign dsimansk
/cc @taragu BTW, this is also one of the ways you can improve GSD, if we extend the activator reporting capabilities (not necessarily a good idea, but something to consider).
The rest of this is not a good first issue anymore and requires a fair bit of caution. I'm happy to help people through it, but it's much more involved. Also kinda blocked on #8377 to refactor the scraper in a way to make controlling it's lifetime more predictable.
This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.
/lifecycle frozen
Probably still want this
/triage accepted
/assign
Related PR: https://github.com/knative/serving/pull/13027
If there are no ready activators but ready pod endpoints requests will bypass the activator even if TBC=-1
/unassign @nader-ziada