contour
Contour support for Envoy's stats per route
Please describe the problem you have: In https://github.com/envoyproxy/envoy/issues/3351, @stevesloka suggested that Envoy expose metrics per vhost. This feature has now been released in Envoy v1.23 as route-stat-prefix (PR: https://github.com/envoyproxy/envoy/pull/21302). Should we support it too?
Is there any plan to support this feature?
This seems reasonable to support, with a few considerations:
- document which stats will be enabled (https://www.envoyproxy.io/docs/envoy/v1.23.0/configuration/http/http_filters/router_filter#config-http-filters-router-vcluster-stats)
- we should document the resource impact this will have on each instance of Envoy. From the Envoy docs: "We do not recommend setting up a stat prefix for every application endpoint. This is both not easily maintainable and statistics use a non-trivial amount of memory (approximately 1KiB per route)."
- do a bench/load test to see whether it has a noticeable impact, given the way Contour programs routes, etc.
- this will mean
- consider if this should be an opt-in feature
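To get a feel for the resource-impact item above, the "approximately 1KiB per route" figure quoted from the Envoy docs can be turned into a back-of-the-envelope estimate. This is a minimal sketch that assumes the per-route cost scales linearly with the number of routes that have a stat prefix; the route counts are illustrative, and the function and constant names are hypothetical.

```go
package main

import "fmt"

// approxBytesPerRoute is the rough per-route figure quoted in the Envoy docs
// for route stat prefixes ("approximately 1KiB per route"). It is an
// order-of-magnitude value, not a measurement of Contour's output.
const approxBytesPerRoute = 1024

// estimateStatsMiB returns the approximate extra memory, in MiB, if every one
// of n routes is configured with a stat prefix (assumes linear scaling).
func estimateStatsMiB(n int) float64 {
	return float64(n*approxBytesPerRoute) / (1024 * 1024)
}

func main() {
	// Route counts are illustrative, not taken from a real cluster.
	for _, n := range []int{1_000, 10_000, 100_000} {
		fmt.Printf("%7d routes -> ~%.1f MiB\n", n, estimateStatsMiB(n))
	}
}
```

A real bench/load test is still needed, since the docs' figure covers route stats only and says nothing about how Contour generates routes.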
We would definitely welcome community contributions to help speed this up; otherwise, we currently have this prioritized for 1.24.0.
The Contour project currently lacks enough contributors to adequately respond to all Issues.
This bot triages Issues according to the following rules:
- After 60d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, the Issue is closed
You can:
- Mark this Issue as fresh by commenting
- Close this Issue
- Offer to help out with triage
Please send feedback to the #contour channel in the Kubernetes Slack
We are planning to run tests for this feature and will post an update later.
Here are the load test results.
We ran two rounds of load tests:
- 10k routes without vhost metrics
- 10k routes with vhost metrics
All tests ran against a single Envoy v1.23 instance with 4 CPU cores and 4 GB of memory.
Test results:
Round 1 (without vhost metrics):
- At startup, Envoy used about 2% CPU and 6% memory (approximately 250 MB).
- After sending requests to the 10k routes at random, Envoy used about 400% CPU and 8% memory (approximately 350 MB).
Round 2 (with vhost metrics):
- At startup, Envoy used about 2% to 3% CPU and 7.5% memory (approximately 330 MB).
- After sending requests to the 10k routes at random, Envoy used about 400% CPU and 10% memory (approximately 450 MB).
I think the vhost metrics mainly increase Envoy's memory usage; CPU usage under load was the same in both rounds, so load performance looks fine.
I hope this test helps you make a decision.
Merry Christmas, everyone! If any further discussion is needed, let's continue.
Sorry for the lack of responses on this one, @wilsonwu. I will try to look at this again soon!
Thanks, Sunjay. If the test results are acceptable, we can move on to some design work.
Hi all, let's keep this going. @sunjayBhatia, any update on this?
@wilsonwu I'm going to add this to the 1.26 milestone for now and will plan to look at it after 1.25 is released at the end of this month.
Good to hear that; we will start contributing to it.
Since this feature has not been implemented yet, is there an alternative way to monitor the aggregated traffic of an HTTPProxy/Ingress in Contour? Envoy metrics show the traffic to each backend pod, and I can't see an easy way to relate them to a specific HTTPProxy/Ingress object, especially when multiple HTTPProxy/Ingress objects point to the same service/pods.
@wilsonwu, sorry this is so late, but when running the experiment above, did you use a static stat prefix for all routes associated with a virtual host, or something similar to what is described here: https://github.com/projectcontour/contour/pull/5535#issuecomment-1634646647 ? Naively, I think a static stat prefix would have less resource impact but would also not offer the granularity needed to differentiate the stats between different routes on a route/upstream.
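The granularity question above can be illustrated by counting how many separate counter series one virtual host would produce under each scheme. This is a minimal sketch that assumes the route stats namespace `vhost.<virtual host name>.route.<route_stat_prefix>.` described in the Envoy router filter docs and a stat named `upstream_rq_2xx`; the prefix value "shared", the path-derived naming scheme, the virtual host name, and all helper names are hypothetical and not Contour's actual design.

```go
package main

import (
	"fmt"
	"strings"
)

// routeStatName builds a name in the route stats namespace
// vhost.<virtual host name>.route.<route_stat_prefix>.<stat>.
// Assumption: this layout, per the Envoy router filter docs.
func routeStatName(vhost, prefix, stat string) string {
	return strings.Join([]string{"vhost", vhost, "route", prefix, stat}, ".")
}

// staticPrefix gives every route in the virtual host the same prefix.
// The value "shared" is hypothetical.
func staticPrefix(string) string { return "shared" }

// perRoutePrefix is a hypothetical scheme that derives a prefix from the
// route's path match, e.g. "/api/v1" -> "api_v1" and "/" -> "root".
func perRoutePrefix(path string) string {
	p := strings.Trim(strings.ReplaceAll(path, "/", "_"), "_")
	if p == "" {
		return "root"
	}
	return p
}

// distinctSeries counts how many separate upstream_rq_2xx counters the routes
// of one virtual host would produce under the given prefix scheme.
func distinctSeries(vhost string, paths []string, prefixFor func(string) string) int {
	seen := make(map[string]struct{})
	for _, p := range paths {
		seen[routeStatName(vhost, prefixFor(p), "upstream_rq_2xx")] = struct{}{}
	}
	return len(seen)
}

func main() {
	paths := []string{"/", "/api", "/api/v1", "/static"}
	fmt.Println("static prefix:   ", distinctSeries("example_com", paths, staticPrefix))
	fmt.Println("per-route prefix:", distinctSeries("example_com", paths, perRoutePrefix))
}
```

With a static prefix, all four routes collapse into one series, so per-route traffic cannot be told apart; with a per-route prefix each route gets its own series. A path-only scheme like this one would still merge routes that share a path but differ in header matches, so a real design would need a richer key.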
Sorry for the late reply; I've already commented in the PR.
Per the ongoing discussion on the related PR, it looks like this will slip to 1.27.0.