Add Envoy stat_prefix support for HTTPProxy
This pull request addresses issue #4637. I'm not sure this is a perfect implementation, but I thought receiving feedback would help me implement it more properly.
Currently it adds a field in the envoy stanza of the Contour configuration which, if set, will also be set in the Envoy route config.
Another possible implementation would be to add a field to the HTTPProxy spec, allowing users to set different StatPrefixes per route.
As a third option, we could set <resource.name>_<resource.kind>_<resource.namespace> as the stat_prefix!
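For concreteness, here is a minimal Go sketch of that third option; buildStatPrefix is a hypothetical helper for illustration, not Contour's actual code:

```go
package main

import "fmt"

// buildStatPrefix is a hypothetical helper illustrating the third option:
// joining the originating resource's name, kind, and namespace into a
// single stat_prefix for the Envoy route.
func buildStatPrefix(name, kind, namespace string) string {
	return fmt.Sprintf("%s_%s_%s", name, kind, namespace)
}

func main() {
	// e.g. an HTTPProxy named "echo" in the "default" namespace
	fmt.Println(buildStatPrefix("echo", "httpproxy", "default")) // echo_httpproxy_default
}
```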
Hi @therealak12! Welcome to our community and thank you for opening your first Pull Request. Someone will review it soon. Thank you for committing to making Contour better. You can also join us on our mailing list and in our channel in the Kubernetes Slack Workspace.
Here is a demo of the current implementation. There were two HTTPProxies in the cluster at the time, with hosts example.com and example.com.ir.
I prefer this to xref: https://github.com/projectcontour/contour/pull/5539
Do we think users will need to configure this per-route stat prefix? Is there any value in making this configurable? If not, we can probably get away with just adding a boolean flag and configuring the stat prefix using some details of the Envoy route (hashing the route match parameters, for example) to ensure uniqueness/the ability to differentiate between different routes.
In general this seems like it achieves what #4637 requires: an opt-in ability to signal Envoy to collect per-route upstream stats.
This PR does something like what you've explained.
I think it would be useful to modify that PR with a generated stat prefix that differentiates between different Envoy routes (maybe in addition to or instead of the existing stat prefix that differentiates between originating resources) and test it out with a couple of different HTTPProxies/Ingresses/HTTPRoutes with multiple route rules.
(I would guess that using different stat prefixes could possibly expand the memory footprint that was analysed in https://github.com/projectcontour/contour/issues/4637 as well, but would have to confirm to be sure.)
cc @izturn @wilsonwu from #4637: is the goal to see stats for different route matches (e.g. the upstream stats for requests to /foo vs. /bar)? If so, I don't think the existing PRs do that.
@sunjayBhatia Does it look good to you?
Please don't worry about the prefix section for now. It is just a temporary thing that I put in place to make the PoC work. I will improve the String() method if it is accepted.
These are the stats for an HTTPProxy with the following spec:

```yaml
routes:
- conditions:
  - prefix: /etc
  permitInsecure: true
  services:
  - name: python-http-server
    port: 8000
- conditions:
  - prefix: /tmp
  permitInsecure: true
  services:
  - name: python-http-server
    port: 8000
virtualhost:
  fqdn: some-random-path.apps.private.okd4.ts-2.staging-snappcloud.io
```
It sets StatPrefix to a string built like this:

```go
fmt.Sprintf("%s_%s_%s_%s", httpRoute.GetNamespace(), "httproute", httpRoute.GetName(), r.PathMatchCondition.String())
```
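For illustration, a self-contained sketch of what that format produces; the namespace, name, and path-match values below are made up, and the pathMatch string merely stands in for whatever r.PathMatchCondition.String() returns:

```go
package main

import "fmt"

func main() {
	// Hypothetical values standing in for httpRoute.GetNamespace(),
	// httpRoute.GetName(), and r.PathMatchCondition.String().
	namespace := "default"
	name := "my-route"
	pathMatch := "prefix: /tmp"

	fmt.Println(fmt.Sprintf("%s_%s_%s_%s", namespace, "httproute", name, pathMatch))
	// Output: default_httproute_my-route_prefix: /tmp
}
```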
This seems more usable and now does separate the stats for different route rules.
We'll have to figure out a stat prefix we can generate that can express uniqueness but can also be used by an operator to figure out what route they're looking at (e.g. not just an opaque hash).
Would love to get @izturn's and @wilsonwu's thoughts since they originally requested this.
We had a conversation about the hash, which I've replied to here: https://kubernetes.slack.com/archives/C8XRH2R4J/p1689159578352139?thread_ts=1688670204.661949&cid=C8XRH2R4J
It seems the 60-character limit can be safely removed now, and then we'll have the full hostname available in the stats.
One point to add here: it's not that trivial to include the matchCondition in the stats.
There are various cases such as regex, exact, and prefix match. To distinguish between two routes, one having exactMatch set to /sample and the other having prefix set to /sample, we should include the type of the match condition in the stat too.
Besides, there are query and header match conditions as well.
A possible format for the stat_prefix would be:
namespace_ingressResourceType_ingressResourceName_matchType_matchValue
where matchType can be one of regex, prefix, or exact.
Although this supports all 3 match types, it doesn't account for header or query matches; trying to put those in the prefix makes it too complex and arguably useless.
It would be nice if we agreed on a format before going on.
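To make the proposed format concrete, here is a rough Go sketch; the MatchCondition type and helper are invented for illustration, not Contour's internal types:

```go
package main

import "fmt"

// MatchCondition is a simplified, hypothetical stand-in for Contour's
// internal path match representation.
type MatchCondition struct {
	Type  string // "regex", "prefix", or "exact"
	Value string
}

// statPrefix builds namespace_ingressResourceType_ingressResourceName_matchType_matchValue.
func statPrefix(namespace, resourceType, resourceName string, m MatchCondition) string {
	return fmt.Sprintf("%s_%s_%s_%s_%s", namespace, resourceType, resourceName, m.Type, m.Value)
}

func main() {
	exact := MatchCondition{Type: "exact", Value: "/sample"}
	prefix := MatchCondition{Type: "prefix", Value: "/sample"}

	// Including the match type distinguishes two routes that share a path value.
	fmt.Println(statPrefix("default", "httpproxy", "echo", exact))  // default_httpproxy_echo_exact_/sample
	fmt.Println(statPrefix("default", "httpproxy", "echo", prefix)) // default_httpproxy_echo_prefix_/sample
}
```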
> It's not that trivial to include the matchCondition in the stats.

Yep, agreed, especially as there can be multiple match conditions in a route, and these conditions can be complex to serialize into a string that isn't incredibly long.
Some notes for discussion:
- We could generate a name for each Envoy Route (based on the originating k8s resource name/route index, or a hash based on match conditions; a rough sketch of the hash idea follows this list), which we do not do today, and also use it as the stat prefix. Users could then look this name up in their HTTPProxy/Ingress or in the Envoy config dump to see which stats line up with what. (See the name field on the Envoy Route.)
- This sort of goes against Contour's goal of abstracting Envoy-isms away from the user, but we could make the Route stat prefix configurable in HTTPProxy (which means Ingress/Gateway API HTTPRoute wouldn't have this option).
- It really depends on what users will actually find useful: do we need the granularity of per-Route-rule stats, or is it enough to know which HTTPProxy/Ingress/HTTPRoute stats may come from?
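For the hash idea in the first bullet, a rough sketch of deriving a short, stable identifier from serialized match conditions (purely illustrative; the serialization format is an assumption):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// matchConditionsID derives a short, stable identifier from a serialized
// set of route match conditions so the stat prefix stays bounded in length.
func matchConditionsID(serializedMatch string) string {
	sum := sha256.Sum256([]byte(serializedMatch))
	return hex.EncodeToString(sum[:])[:8] // 8 hex chars: short but usually enough to differentiate
}

func main() {
	fmt.Println(matchConditionsID("prefix:/foo;header:x-env=prod"))
	fmt.Println(matchConditionsID("prefix:/bar"))
}
```

The trade-off this bullet raises still applies: a hash like this is unique but opaque, so an operator can't read the route off the stat name without a lookup table.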
@therealak12, thx for your PR, I am OK with it
Can you clarify a bit more, @izturn: how do you intend to use the new stats? Specifically around the last bullet here: https://github.com/projectcontour/contour/pull/5535#issuecomment-1638789265
Sorry for the late reply. I think per-route stats are enough for our current situation; if there are use cases for per-virtualhost stats, having the metrics system calculate them is OK.
@sunjayBhatia Sorry for the late reply. As @wilsonwu said, I think per-route stats are enough for our use case.
As @sunjayBhatia pointed out above, per-route metrics sit at the highest level of the granularity pyramid:

> do we need the granularity of per-Route rule stats

So let's discuss whether per-virtualhost metrics are enough or whether we should agree on a format for presenting per-route metrics. In our use case, we are happy with including only the prefix from matchConditions. In general, there are several matchConditions that can differentiate routes; including all of them results in excessively lengthy metric names.
Your valuable comment will be crucial in reaching the best decision. Thanks in advance.
@wilsonwu @izturn
Thanks for this information. In our use case, we run Envoy as an API gateway, so users pay more attention to the API. As in my earlier reply, I think that if we have per-route metrics, the per-virtualhost and per-Envoy metrics can be calculated from them; that said, supporting per-virtualhost metrics directly is better.
@sunjayBhatia Based on the discussion, it seems providing per-virtual-host metrics is enough. I'd like to hear your thoughts on this.
To summarize:
- It looks like the "per route" metrics that are important to @wilsonwu/@izturn are in fact actually "per vhost", so aggregating request metrics needs to be done with that in mind when making up the stat prefix.
- The new stats will be rooted at vhost.<virtual host name>.route.<stat_prefix>, so a static stat_prefix seems sufficient since we don't need to add any further granularity (illustrated in the sketch after this list).
- To be conservative and make this an additive change, the way users opt in to these new stats could be a boolean, or we could let users configure the route stat_prefix; we can discuss the merits of either approach, but I'm leaning towards a bool opt-in to require less validation etc.
- Furthermore, is it sufficient to have this as a global opt-in, or should it be per HTTPProxy? I would argue for a global config since this is an operator-level configuration that could affect Envoy's performance.
- As an aside, if we did want to add some additional granularity we could (for example, generate the stat prefix based on the resource that originated the Route), and users who don't care about that additional detail could aggregate the statistics themselves (though we would have to think about that UX in more detail).
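As a rough illustration of where the new stats would live, assuming a vhost named example.com and a stat_prefix of default_echo (both values made up; the specific stat names are examples of Envoy's upstream request stats):

```go
package main

import "fmt"

func main() {
	vhost := "example.com"       // virtual host stat name (assumed)
	statPrefix := "default_echo" // generated route stat_prefix (assumed)

	// With stat_prefix set on a route, Envoy roots the route-scoped
	// upstream stats under the virtual host's stat tree, e.g.:
	for _, stat := range []string{"upstream_rq_total", "upstream_rq_time"} {
		fmt.Printf("vhost.%s.route.%s.%s\n", vhost, statPrefix, stat)
	}
}
```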
The Contour project currently lacks enough contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:
- After 14d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, the PR is closed
You can:
- Mark this PR as fresh by commenting or pushing a commit
- Close this PR
- Offer to help out with triage
Please send feedback to the #contour channel in the Kubernetes Slack
hey @therealak12, thx 4 ur work, any progress?
Hey. I'm busy these days. I'll update this thread soon.
We need to agree on some design decisions first.
- I agree with having a boolean to enable/disable the stat_prefix. However, I think a global one is better than adding a per-resource field, since it fits our requirements better. This is the first thing to decide.
- Also, I think including the name of the resource in the stat_prefix is fine and better than not doing that. This is the second decision.
If these decisions are accepted, I'll update this PR or create a new one.
- Agree on the global enable/disable for stat_prefix.
- As for including a resource name or not, I think the current path is OK; there has been no more demand for this requirement, so I am fine with it.
So I've just (force) pushed a new commit. If the enableStatPrefix configuration is true, it sets stat_prefix to <resource_namespace>_<resource_name>.
This actually resembles the idea implemented in PR #5539, but as that PR was closed, I preferred to rewrite the changes in this PR.
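A minimal sketch of the behavior described above, assuming a hypothetical enableStatPrefix flag (names are illustrative, not the exact code in the commit):

```go
package main

import "fmt"

// routeStatPrefix returns the stat_prefix for an Envoy route, or "" when the
// (hypothetical) enableStatPrefix configuration is false; an empty prefix
// leaves the new per-route stats disabled.
func routeStatPrefix(enableStatPrefix bool, resourceNamespace, resourceName string) string {
	if !enableStatPrefix {
		return ""
	}
	return fmt.Sprintf("%s_%s", resourceNamespace, resourceName)
}

func main() {
	fmt.Println(routeStatPrefix(true, "default", "echo"))  // default_echo
	fmt.Println(routeStatPrefix(false, "default", "echo")) // (empty)
}
```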
@therealak12 any progress?
This is actually ready for review. I'll resolve merge conflicts soon.
Codecov Report
Attention: Patch coverage is 20.00000% with 20 lines in your changes missing coverage. Please review.
Project coverage is 78.62%. Comparing base (89d2856) to head (fdf3ab4). Report is 227 commits behind head on main.
Additional details and impacted files
```
@@             Coverage Diff             @@
##             main    #5535      +/-   ##
==========================================
- Coverage   78.70%   78.62%   -0.08%
==========================================
  Files         138      138
  Lines       19717    19742      +25
==========================================
+ Hits        15518    15523       +5
- Misses       3895     3909      +14
- Partials      304      310       +6
```
| Files | Coverage Δ | |
|---|---|---|
| cmd/contour/servecontext.go | 83.88% <100.00%> (+0.03%) | :arrow_up: |
| internal/contourconfig/contourconfiguration.go | 98.19% <100.00%> (+0.01%) | :arrow_up: |
| internal/dag/dag.go | 98.70% <ø> (ø) | |
| pkg/config/parameters.go | 86.43% <100.00%> (+0.04%) | :arrow_up: |
| cmd/contour/serve.go | 19.62% <50.00%> (+0.12%) | :arrow_up: |
| internal/dag/ingress_processor.go | 96.44% <0.00%> (-1.16%) | :arrow_down: |
| internal/envoy/v3/route.go | 80.17% <0.00%> (-0.31%) | :arrow_down: |
| internal/dag/gatewayapi_processor.go | 93.38% <0.00%> (-0.36%) | :arrow_down: |
| internal/dag/httpproxy_processor.go | 91.54% <0.00%> (-0.38%) | :arrow_down: |
It's ready now. @izturn