[Feature Discussion] Ways to extend or customize Envoy: WASM vs. Lua vs. External Processing vs. Go filter
This is a summary and discussion of the ways to extend or customize Envoy functionality. Let's compare the four methods below. We won't go into implementation details here, just compare the options and choose which one(s) we want Contour to support.
Comparison:
**WASM** (ref. https://github.com/projectcontour/contour/issues/4276)
- Security: Medium. Envoy runs WASM in a sandbox, so faults in a module are isolated and the Envoy main process is not affected.
- Extensibility: High. Users can implement any logic in WASM and run it at any stage of Envoy's processing.
- Performance: Low. According to an Istio expert, WASM reaches only about 50% of the performance of native Envoy logic.
- Other: WASM increases the chance of NACKs, because loading a module depends on both the WASM file and the configuration that references it, which adds failure points.
**Lua** (ref. https://github.com/projectcontour/contour/issues/3006)
- Security: Low. A badly written script can affect Envoy itself.
- Extensibility: High. As with WASM, users can implement any logic in Lua.
- Performance: Medium/Low. Istio usage experience suggests Lua performs better than WASM, but for gateway use cases the performance is still not good.
**External Processing** (ref. https://github.com/projectcontour/contour/issues/5038)
- Security: High. As with ExtAuthz and global rate limiting, extending Envoy via external processing runs no user logic inside Envoy; Envoy only consumes responses from an external service (like an auth server), so even if that service has problems, Envoy itself keeps working.
- Extensibility: Low. External processing must follow the protocol's standard interface; anything outside those rules may not be possible. A minimal server sketch follows this comparison.
- Performance: High. The external service can be scaled with more replicas and resources to improve throughput.
**Go filter**
- Security: Low. As I understand it, the extended filter code is injected into the Envoy process, so any problem in it affects Envoy.
- Extensibility: High. As with WASM and Lua, we can write whatever logic we want.
- Performance: High. Go should outperform WASM and Lua, but this still needs testing.
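To make the external processing option concrete, below is a minimal sketch of an `ext_proc` gRPC server in Go. This is an illustration under assumptions, not a proposed Contour component: it assumes the go-control-plane generated stubs in `github.com/envoyproxy/go-control-plane/envoy/service/ext_proc/v3`, an arbitrary listen port, a made-up header name, and Envoy's default processing mode (header phases only).

```go
// Minimal ext_proc server sketch. Assumes go-control-plane stubs; the
// port (9002) and the x-ext-proc header are arbitrary examples.
package main

import (
	"io"
	"log"
	"net"

	corev3 "github.com/envoyproxy/go-control-plane/envoy/config/core/v3"
	extprocv3 "github.com/envoyproxy/go-control-plane/envoy/service/ext_proc/v3"
	"google.golang.org/grpc"
)

type extProcServer struct {
	extprocv3.UnimplementedExternalProcessorServer
}

// Process handles the bidirectional stream Envoy opens per HTTP request.
// Each ProcessingRequest phase must be answered with the matching
// ProcessingResponse variant.
func (s *extProcServer) Process(stream extprocv3.ExternalProcessor_ProcessServer) error {
	for {
		req, err := stream.Recv()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}

		resp := &extprocv3.ProcessingResponse{}
		switch req.Request.(type) {
		case *extprocv3.ProcessingRequest_RequestHeaders:
			// Mutate the request: add a header before it is sent upstream.
			resp.Response = &extprocv3.ProcessingResponse_RequestHeaders{
				RequestHeaders: &extprocv3.HeadersResponse{
					Response: &extprocv3.CommonResponse{
						HeaderMutation: &extprocv3.HeaderMutation{
							SetHeaders: []*corev3.HeaderValueOption{{
								Header: &corev3.HeaderValue{Key: "x-ext-proc", Value: "seen"},
							}},
						},
					},
				},
			}
		case *extprocv3.ProcessingRequest_ResponseHeaders:
			// Pass response headers through unmodified.
			resp.Response = &extprocv3.ProcessingResponse_ResponseHeaders{
				ResponseHeaders: &extprocv3.HeadersResponse{},
			}
		default:
			// This sketch assumes the default processing mode, which sends
			// only header phases; body/trailer phases are not handled here.
			continue
		}
		if err := stream.Send(resp); err != nil {
			return err
		}
	}
}

func main() {
	lis, err := net.Listen("tcp", ":9002")
	if err != nil {
		log.Fatal(err)
	}
	srv := grpc.NewServer()
	extprocv3.RegisterExternalProcessorServer(srv, &extProcServer{})
	log.Fatal(srv.Serve(lis))
}
```

Envoy opens one `Process` stream per HTTP request and waits for the matching response variant for every phase it sends, so some latency is added per enabled phase; that is the network-call overhead raised later in this thread.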
Vote from me: WASM +1, External Processing +1
Thanks for adding this, a few comments:
Adding the design and implementation efforts for each of these would also be nice.
For example,
- WASM requires significant research and handling of the NACKs before we can move forward.
- ExtProc has similar patterns with rate limiting and ExtAuthz. The design effort won't be as much.
- The Go filter is still in alpha and under development.
Also, for external processing performance, we should consider the added latency the network call adds.
> Also, for external processing performance, we should consider the added latency the network call adds.

I am surprised that something involving an external call is marked higher on performance compared to in-process solutions. That being said, I think `ext_proc` is the most straightforward way to add Envoy extensibility, given that we already have the design patterns for it.
The comparison between WASM and Go is harder for me. Both filters have limited use cases and are not "stable". I wonder if we can take the approach of going forward with the `ext_proc` filter and then graduating to WASM/Go once we have more data points from the Envoy community and a better understanding of the latency guarantees that we need. If people adopt it and find it to be slow, we can start designing and thinking about in-process methods.
@skriss / @sunjayBhatia what are we looking for to make a decision here?
@wilsonwu assuming we knew which way we decided to go forward with, would you be volunteering for the implementation? Is this a high priority item for you/your team?
Yes, currently `ext_proc` is the better way to build Contour extensions, even though it still has extensibility limitations. My team and I would like to contribute this once a decision is made.
I'm +1 to all of the above comments re: external processing (well-established design pattern, avoids thorny NACK issues, etc.). My biggest concern there is that the filter is not yet stable:

> This API feature is currently work-in-progress. API features marked as work-in-progress are not considered stable, are not covered by the threat model, are not supported by the security team, and are subject to breaking changes. Do not use this feature without understanding each of the previous points.

(ref. External Processor docs)
Does anyone have more information on the timeline for the filter moving to a more stable state?
I would think at a minimum, we could start to make progress on design, including thinking through interaction with other existing features and any security considerations, and a spike on functionality. We could also consider whether this is something that we could release in an experimental state behind a feature flag, until the upstream functionality is more stabilized.
If folks are attending KubeCon EU, please come to the Contour ContribFest - would love to have more discussions there!
> Does anyone have more information on the timeline for the filter moving to a more stable state?

I will reach out in the Envoy Slack channel and talk with the maintainers there to get more info.
+1 for exploring the `ext_proc` filter, as there seems to be a bit more in the way of guardrails and the operational mechanisms seem simpler.
> Does anyone have more information on the timeline for the filter moving to a more stable state?

Talking with the current codeowner:

> Yeah, the ext_proc API is fairly stable. The implementation is currently in alpha state. More fuzzer work needs to be done before it can be changed into a stable state.
Thanks for this info. If we build the extension feature in Contour based on the current version of `ext_proc`, do you think there are any risks?
In Envoy v1.27.0, the unstable warning has been removed: https://www.envoyproxy.io/docs/envoy/v1.27.0/api-v3/extensions/filters/http/ext_proc/v3/ext_proc.proto
I think we can move forward with this.
I have some needs to utilize something like this as well, and I'd like to help jump-start some additional work on it. @wilsonwu, are you still up for working on some of this? I'm happy to jump in too; I've been way too quiet on this project, and this seems like a (selfishly) useful place where I can help out.
@skriss @sunjayBhatia any hesitation about that approach? Or should we use a feature gate to enable it, like we've done in the past?
Thanks @wilsonwu for that update, that's great to see.
Hey @stevesloka! Personally I'm happy to move ahead with ExtProc support along the lines of how we have ExtAuthz and Global Rate Limiting implemented, given that it seems to be pretty stable on the Envoy side and we have a well-defined pattern for integrating these auxiliary services. Seems like most interested parties would be happy with that as a step forward as well. If you are willing to drive it, that would definitely help get it done sooner 😀
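For reference, here is a hedged sketch of the Envoy-side wiring, built with go-control-plane protos roughly the way Contour already emits its ExtAuthz filter configuration. The cluster name is hypothetical, not an actual Contour identifier, and the processing mode shown is just one possible choice.

```go
// Sketch: constructing the ext_proc HTTP filter config with
// go-control-plane protos. The cluster name is a hypothetical example
// modeled on Contour's ExtensionService-backed cluster naming.
package main

import (
	"fmt"

	corev3 "github.com/envoyproxy/go-control-plane/envoy/config/core/v3"
	extprocfilterv3 "github.com/envoyproxy/go-control-plane/envoy/extensions/filters/http/ext_proc/v3"
	hcmv3 "github.com/envoyproxy/go-control-plane/envoy/extensions/filters/network/http_connection_manager/v3"
	"google.golang.org/protobuf/types/known/anypb"
)

func extProcFilter() (*hcmv3.HttpFilter, error) {
	cfg := &extprocfilterv3.ExternalProcessor{
		GrpcService: &corev3.GrpcService{
			TargetSpecifier: &corev3.GrpcService_EnvoyGrpc_{
				EnvoyGrpc: &corev3.GrpcService_EnvoyGrpc{
					// Hypothetical cluster pointing at the ext_proc deployment.
					ClusterName: "extension/projectcontour/extproc",
				},
			},
		},
		// Fail closed: reject traffic if the processor is unreachable.
		FailureModeAllow: false,
		ProcessingMode: &extprocfilterv3.ProcessingMode{
			RequestHeaderMode:  extprocfilterv3.ProcessingMode_SEND,
			ResponseHeaderMode: extprocfilterv3.ProcessingMode_SKIP,
		},
	}
	typedCfg, err := anypb.New(cfg)
	if err != nil {
		return nil, err
	}
	return &hcmv3.HttpFilter{
		Name:       "envoy.filters.http.ext_proc",
		ConfigType: &hcmv3.HttpFilter_TypedConfig{TypedConfig: typedCfg},
	}, nil
}

func main() {
	f, err := extProcFilter()
	if err != nil {
		panic(err)
	}
	fmt.Println(f.Name)
}
```

`FailureModeAllow: false` gives a fail-closed posture; setting it to `true` would let traffic through when the processor is unavailable, a trade-off similar to the fail-open behavior Contour already exposes for external auth.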
Thanks @stevesloka and @skriss, happy to hear that. After an internal discussion, @izturn and I will keep contributing to this feature. I think we will have a draft design next month; let's keep an eye on it.
@stevesloka @skriss @sunjayBhatia and others, I will put up a draft design & implementation next week.
@stevesloka @skriss @sunjayBhatia PTAL
@izturn is there a link or branch to look at? Thanks!
@stevesloka #5866 #5867 #5868
Is there any plan to review these changes? My team would really appreciate this feature for our billing purposes in Contour. We can also help @izturn if needed.