flagger icon indicating copy to clipboard operation
flagger copied to clipboard

Add support for Knative

Open tombanksme opened this issue 1 year ago • 4 comments
trafficstars

This is a proof-of-concept pull request to add the canary release deployment strategy for Knative (https://github.com/fluxcd/flagger/issues/903). It's far from production ready but I wanted to see if there's appetite for this feature before I invest more time into completing it.

Currently Flagger can target a Knative service & complete the canary release process. I've not tested rollbacks yet although I think it should just work. The pull request needs some extra work to add test coverage; throw errors when using unsupported release processes & add Knative specific Kubernetes events.

Let me know what you think!

Example

The following canary will target a Knative Service. Once the canary has been initialised you can start the canary release process by creating a new revision of the service.

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: example
spec:
  targetRef:
    apiVersion: serving.knative.dev/v1
    kind: Service
    name: example
  service:
    port: 3000
  analysis:
    interval: 1m
    threshold: 10
    maxWeight: 50
    stepWeight: 5

tombanksme avatar Jul 15 '24 12:07 tombanksme

(I haven't used flagger / knative in awhile but) I'm very excited to see this taking shape!

edude03 avatar Sep 03 '24 14:09 edude03

Hi @tombanksme! This looks great, are there any news on this PR?

arubino avatar Oct 04 '24 15:10 arubino

I haven't heard anything back from the Flagger maintainers. I would be happy to finish up the PR if it's something that fluxcd would consider merging.

tombanksme avatar Oct 04 '24 15:10 tombanksme

while i see the value added by this PR, is it not possible to use Gateway API as a bridge to get Flagger and Knative to work together: https://github.com/knative-extensions/net-gateway-api

aryan9600 avatar Oct 06 '24 09:10 aryan9600

Sorry for the delay. It looks like net-gateway-api isn't ready for production yet according to the readme

tombanksme avatar Nov 27 '24 14:11 tombanksme

@aryan9600

I'll help craft a detailed technical response explaining why the Gateway API approach wouldn't work for Flagger-Knative integration:

Thank you for bringing this up. As someone familiar with Knative, I'd like to clarify why using Gateway API as a bridge between Flagger and Knative wouldn't be possible, and why the current PR approach is actually the correct solution.

The net-gateway-api project has a different purpose - it's not meant to be an control interface for Knative, but rather it allows Knative to use Gateway API as its final networking output. Let me explain how Knative's networking architecture works:

  1. Knative uses a strictly defined traffic flow architecture where:
  • Traffic control is managed through KService resources
  • The networking layer uses KIngress resources to configure the Ingress Gateway
  • The Ingress Gateway then routes requests either to the activator or directly to Knative Service Pods

image

  1. To integrate with Knative properly, external tools must interact through the KService API - this is the only supported entry point for managing Knative services and their traffic patterns.

  2. The net-gateway-api project's scope is specifically about allowing Knative to output Gateway API resources instead of other ingress types - it's about the implementation layer, not the control layer.

Therefore, while the suggestion to use Gateway API as a bridge is interesting, it wouldn't provide the deep integration needed for proper traffic management. The current PR's approach of integrating directly with KService is actually the only correct way to achieve this integration - there are no alternative approaches in the current Knative roadmap that would provide the same level of proper integration.

I've reviewed the implementation in the PR, and it aligns perfectly with Knative's architectural principles and control patterns.

Reference: https://knative.dev/docs/serving/architecture/#traffic-flow-and-dns

kahirokunn avatar Dec 12 '24 01:12 kahirokunn

i did a quick first pass and the approach looks good. just to confirm, both the user and flagger own the Knative Service object, with the former owning the workload configuration and the latter owning the traffic configuration?

furthermore, can you add some preliminary docs? or share the steps you took to test this change?

aryan9600 avatar Jan 13 '25 10:01 aryan9600

Yes, We share one Knative Service object.

kahirokunn avatar Jan 13 '25 12:01 kahirokunn

Morning folks. Thank you all for taking a look & investing your time into this. I'll allocate some time over the next couple of weeks to get this cleaned up as promised.

tombanksme avatar Jan 14 '25 07:01 tombanksme

Re-opening after a minor Git oopsie on my end

tombanksme avatar Jan 31 '25 16:01 tombanksme

Bump up

kahirokunn avatar Feb 23 '25 03:02 kahirokunn

I would be grateful if you could take a look at this PR related to Knative when you have a spare moment. 🙏 https://github.com/fluxcd/flagger/pull/1750

kahirokunn avatar Mar 17 '25 08:03 kahirokunn

Codecov Report

Attention: Patch coverage is 41.49660% with 172 lines in your changes missing coverage. Please review.

Project coverage is 39.44%. Comparing base (2c4b7a6) to head (12ee6cb). Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
pkg/canary/knative_controller.go 32.35% 64 Missing and 5 partials :warning:
pkg/router/knative.go 53.33% 34 Missing and 8 partials :warning:
pkg/controller/scheduler_metrics.go 29.72% 20 Missing and 6 partials :warning:
pkg/canary/factory.go 0.00% 12 Missing :warning:
pkg/metrics/observers/knative.go 42.85% 8 Missing and 4 partials :warning:
pkg/router/factory.go 0.00% 5 Missing :warning:
pkg/metrics/observers/factory.go 0.00% 4 Missing :warning:
pkg/controller/controller.go 93.75% 1 Missing :warning:
pkg/controller/finalizer.go 0.00% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1682      +/-   ##
==========================================
+ Coverage   39.42%   39.44%   +0.01%     
==========================================
  Files         284      287       +3     
  Lines       22422    22706     +284     
==========================================
+ Hits         8840     8956     +116     
- Misses      12632    12777     +145     
- Partials      950      973      +23     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov-commenter avatar Mar 17 '25 19:03 codecov-commenter