flagger
Add support for Knative
This is a proof-of-concept pull request to add the canary release deployment strategy for Knative (https://github.com/fluxcd/flagger/issues/903). It's far from production-ready, but I wanted to see if there's appetite for this feature before I invest more time into completing it.
Currently, Flagger can target a Knative Service and complete the canary release process. I haven't tested rollbacks yet, although I think they should just work. The pull request still needs some extra work: adding test coverage, throwing errors when unsupported release processes are used, and adding Knative-specific Kubernetes events.
Let me know what you think!
Example
The following Canary targets a Knative Service. Once the canary has been initialised, you can start the canary release process by creating a new revision of the Service.
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: example
spec:
  targetRef:
    apiVersion: serving.knative.dev/v1
    kind: Service
    name: example
  service:
    port: 3000
  analysis:
    interval: 1m
    threshold: 10
    maxWeight: 50
    stepWeight: 5
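The analysis settings in this example translate into a simple weight schedule. Assuming the Knative integration follows Flagger's usual weighted-canary behaviour, each passing interval adds stepWeight percentage points to the canary until it reaches maxWeight, after which promotion starts; threshold is the number of failed checks tolerated before a rollback. The Go sketch below only enumerates that schedule (`canaryWeights` is a hypothetical helper, not a Flagger function). With stepWeight: 5 and maxWeight: 50 it takes 10 intervals, roughly ten minutes at interval: 1m.

```go
package main

import "fmt"

// canaryWeights lists the canary traffic percentage after each successful
// analysis interval, assuming the weight grows by stepWeight per interval
// and is capped at maxWeight before promotion.
func canaryWeights(stepWeight, maxWeight int) []int {
	if stepWeight <= 0 || maxWeight <= 0 {
		return nil
	}
	var weights []int
	for w := 0; w < maxWeight; {
		w += stepWeight
		if w > maxWeight {
			w = maxWeight // the final step never overshoots maxWeight
		}
		weights = append(weights, w)
	}
	return weights
}

func main() {
	// Values taken from the example Canary above.
	weights := canaryWeights(5, 50)
	for i, w := range weights {
		fmt.Printf("interval %2d: canary %2d%%, primary %3d%%\n", i+1, w, 100-w)
	}
	fmt.Println(len(weights), "intervals to reach maxWeight") // 10 intervals to reach maxWeight
}
```

Capping the last step means a stepWeight that does not divide maxWeight evenly (for example 30 and 50) still ends exactly at maxWeight.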
(I haven't used Flagger / Knative in a while, but) I'm very excited to see this taking shape!
Hi @tombanksme! This looks great. Is there any news on this PR?
I haven't heard anything back from the Flagger maintainers. I would be happy to finish up the PR if it's something that fluxcd would consider merging.
While I see the value added by this PR, is it not possible to use the Gateway API as a bridge to get Flagger and Knative working together? https://github.com/knative-extensions/net-gateway-api
Sorry for the delay. It looks like net-gateway-api isn't ready for production yet, according to its README.
@aryan9600
I'll help craft a detailed technical response explaining why the Gateway API approach wouldn't work for Flagger-Knative integration:
Thank you for bringing this up. As someone familiar with Knative, I'd like to clarify why using Gateway API as a bridge between Flagger and Knative wouldn't be possible, and why the current PR approach is actually the correct solution.
The net-gateway-api project has a different purpose: it's not meant to be a control interface for Knative; rather, it allows Knative to use Gateway API as its final networking output. Let me explain how Knative's networking architecture works:
- Knative uses a strictly defined traffic flow architecture where:
  - Traffic control is managed through KService resources
  - The networking layer uses KIngress resources to configure the Ingress Gateway
  - The Ingress Gateway then routes requests either to the activator or directly to Knative Service Pods
- To integrate with Knative properly, external tools must interact through the KService API; this is the only supported entry point for managing Knative services and their traffic patterns.
- The net-gateway-api project's scope is specifically about allowing Knative to output Gateway API resources instead of other ingress types: it concerns the implementation layer, not the control layer.
Therefore, while the suggestion to use Gateway API as a bridge is interesting, it wouldn't provide the deep integration needed for proper traffic management. The current PR's approach of integrating directly with KService is actually the only correct way to achieve this integration; there are no alternative approaches in the current Knative roadmap that would provide the same level of integration.
I've reviewed the implementation in the PR, and it aligns perfectly with Knative's architectural principles and control patterns.
Reference: https://knative.dev/docs/serving/architecture/#traffic-flow-and-dns
I did a quick first pass and the approach looks good. Just to confirm: both the user and Flagger own the Knative Service object, with the former owning the workload configuration and the latter owning the traffic configuration?
Furthermore, can you add some preliminary docs, or share the steps you took to test this change?
Yes, we share one Knative Service object.
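The shared-ownership model confirmed here can be pictured as two writers that each touch only their own half of the Service. The Go sketch below is a toy model, not the PR's code: `KService`, `applyUserChange` and `applyFlaggerChange` are hypothetical, and the workload template is reduced to a single string.

```go
package main

import "fmt"

// KService is a hypothetical, heavily simplified Knative Service: Template
// stands in for spec.template (owned by the user) and Traffic for
// spec.traffic (owned by Flagger). Real KService specs have far more fields.
type KService struct {
	Template string           // e.g. container image; a change creates a new revision
	Traffic  map[string]int64 // revision name -> percent
}

// applyUserChange copies only the workload half from the user's desired state.
func applyUserChange(live KService, desired KService) KService {
	live.Template = desired.Template
	return live
}

// applyFlaggerChange replaces only the traffic half, copying the map so the
// caller's map is never shared with the returned object.
func applyFlaggerChange(live KService, traffic map[string]int64) KService {
	live.Traffic = make(map[string]int64, len(traffic))
	for rev, pct := range traffic {
		live.Traffic[rev] = pct
	}
	return live
}

func main() {
	live := KService{Template: "example:v1", Traffic: map[string]int64{"example-00001": 100}}

	// The user ships a new image; traffic stays pinned to the old revision.
	live = applyUserChange(live, KService{Template: "example:v2"})
	fmt.Println(live.Template, live.Traffic["example-00001"]) // example:v2 100

	// Flagger then shifts 10% to the new revision without touching the template.
	live = applyFlaggerChange(live, map[string]int64{"example-00001": 90, "example-00002": 10})
	fmt.Println(live.Template, live.Traffic["example-00002"]) // example:v2 10
}
```

On a real cluster, one way to keep two writers from clobbering each other's fields is Kubernetes server-side apply, where each field manager owns the fields it sets; whether the PR relies on that is not stated in this thread.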
Morning folks. Thank you all for taking a look & investing your time into this. I'll allocate some time over the next couple of weeks to get this cleaned up as promised.
Re-opening after a minor Git oopsie on my end
Bump up
I would be grateful if you could take a look at this PR related to Knative when you have a spare moment. 🙏 https://github.com/fluxcd/flagger/pull/1750
Codecov Report
Attention: Patch coverage is 41.49660% with 172 lines in your changes missing coverage. Please review.
Project coverage is 39.44%. Comparing base (2c4b7a6) to head (12ee6cb). Report is 4 commits behind head on main.
@@            Coverage Diff             @@
##             main    #1682      +/-   ##
==========================================
+ Coverage   39.42%   39.44%   +0.01%
==========================================
  Files         284      287       +3
  Lines       22422    22706     +284
==========================================
+ Hits         8840     8956     +116
- Misses      12632    12777     +145
- Partials      950      973      +23