feat(gateway-api): Add custom backendRef and filters support for HTTPRoute

Open kahirokunn opened this issue 11 months ago • 8 comments

Description

This PR adds support for custom backend references in Flagger's primary and canary services. This enhancement allows users to specify different routing configurations and intermediate services for primary and canary traffic, enabling more complex deployment patterns and better integration with existing infrastructure.

Key Changes

  • Added backendRef and filters to spec.service.canary and spec.service.primary
  • Updated Gateway API router to support custom backend references
  • Modified service reconciliation logic to handle custom backend configurations
  • Added support for service-specific filters
  • Maintained backward compatibility with existing configurations

Use Cases

This feature enables several important scenarios:

  1. Routing through security proxies
  2. Adding service-specific monitoring
  3. Implementing different circuit breaker configurations
  4. Supporting complex mesh architectures
  5. Applying different filtering rules for primary and canary traffic

Example Configuration

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
spec:
  service:
    primary:
      filters:
        - type: RequestHeaderModifier
          requestHeaderModifier:
            set:
              - name: x-route
                value: primary
    canary:
      backendRef:
        name: canary-proxy
        namespace: monitoring
        port: 3456
      filters:
        - type: RequestHeaderModifier
          requestHeaderModifier:
            set:
              - name: x-route
                value: canary
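
As a rough illustration, the configuration above might translate into an HTTPRoute rule along the following lines. This is a sketch, not output captured from this PR: the `my-app-primary` service name follows Flagger's usual `<name>-primary` convention, while the `apiVersion`, port 80, the weights, and the omitted `parentRefs` and `hostnames` are assumptions.

```yaml
# Hypothetical HTTPRoute for the example above (illustrative values only)
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-app
spec:
  # parentRefs and hostnames omitted; they come from spec.service.gatewayRefs and hosts
  rules:
    - backendRefs:
        # Primary: no custom backendRef, so the default primary service is assumed
        - name: my-app-primary
          port: 80        # assumed; the example does not set spec.service.port
          weight: 90      # illustrative weight during analysis
          filters:
            - type: RequestHeaderModifier
              requestHeaderModifier:
                set:
                  - name: x-route
                    value: primary
        # Canary: custom backendRef pointing at a proxy in another namespace
        - name: canary-proxy
          namespace: monitoring
          port: 3456
          weight: 10      # illustrative weight during analysis
          filters:
            - type: RequestHeaderModifier
              requestHeaderModifier:
                set:
                  - name: x-route
                    value: canary
```

Because `canary-proxy` lives in a different namespace than the route, Gateway API requires a ReferenceGrant in the `monitoring` namespace before the reference is honored.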

Breaking Changes

None. This is a backward-compatible change that maintains existing behavior when custom backend references are not specified.

Additional Context

This change also makes it possible to reference different backends for canary and primary, as shown in the attached image. For details, see the following PR: https://github.com/fluxcd/flagger/pull/1714

Issue

https://github.com/fluxcd/flagger/issues/1741

TODO

  • [ ] add finalize ReferenceGrants

kahirokunn avatar Dec 13 '24 00:12 kahirokunn

Codecov Report

Patch coverage is 32.68156% with 241 lines in your changes missing coverage. Please review.
Project coverage is 30.27%. Comparing base (12ee6cb) to head (a2c28be).
Report is 55 commits behind head on main.

Files with missing lines                               Patch %  Lines
...g/apis/gatewayapi/v1beta1/zz_generated.deepcopy.go  0.00%    87 Missing
pkg/router/gateway_api.go                              68.82%   43 Missing and 10 partials
pkg/apis/flagger/v1beta1/zz_generated.deepcopy.go      0.00%    31 Missing
...ernalversions/gatewayapi/v1beta1/referencegrant.go  0.00%    31 Missing
...ped/gatewayapi/v1beta1/fake/fake_referencegrant.go  0.00%    16 Missing
...rsioned/typed/gatewayapi/v1beta1/referencegrant.go  0.00%    9 Missing
...lient/listers/gatewayapi/v1beta1/referencegrant.go  0.00%    4 Missing
pkg/apis/gatewayapi/v1beta1/register.go                0.00%    2 Missing
.../gatewayapi/v1beta1/fake/fake_gatewayapi_client.go  0.00%    2 Missing
...oned/typed/gatewayapi/v1beta1/gatewayapi_client.go  0.00%    2 Missing
... and 2 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1742      +/-   ##
==========================================
- Coverage   39.44%   30.27%   -9.17%     
==========================================
  Files         287      291       +4     
  Lines       22706    22374     -332     
==========================================
- Hits         8956     6774    -2182     
- Misses      12777    14867    +2090     
+ Partials      973      733     -240     

codecov-commenter avatar Dec 16 '24 02:12 codecov-commenter

Dear @stefanprodan

I hope this message finds you well.

I am reaching out to request your review of an enhancement I am working on for Flagger's Gateway API support. My goal is to support the integration of Envoy Gateway with KEDA HTTPScaledObjects through this enhancement.

I would greatly appreciate your feedback and insights on this matter.

Thank you for your time and consideration.

Best regards, kahirokunn

kahirokunn avatar Dec 17 '24 04:12 kahirokunn

thank you for this PR @kahirokunn! have you tested how this change behaves when performing a canary rollout with session affinity enabled? that code also makes use of backend-specific filters, so it's important to verify that any userland configuration will not break that feature.

aryan9600 avatar Jan 13 '25 06:01 aryan9600

Thank you so much for your feedback regarding session affinity! I will do my best to verify that these changes won’t break any existing session affinity behavior. However, to avoid any misunderstanding or missing test scenarios, would you mind sharing a bit more detail on the specific cases or concerns you have in mind about backend-specific filters and userland configurations? Your insights would be really helpful, and I appreciate your cooperation.

kahirokunn avatar Jan 13 '25 07:01 kahirokunn

i'd recommend following the tutorial in the docs and seeing if the behaviour is as expected (in terms of request-response and what the HTTPRoute definition looks like)

aryan9600 avatar Feb 08 '25 15:02 aryan9600

Hello,

Following your recommendation, I walked through the tutorial in the docs. I executed the tests as described using the Canary resource defined below:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  progressDeadlineSeconds: 60
  autoscalerRef:
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    port: 9898
    targetPort: 9898
    hosts:
      - www.example.com
    gatewayRefs:
      - name: gateway
        namespace: istio-ingress
    primaryBackend:
      backendRef:
        name: hoge
        namespace: kube-system
        port: 10250
    canaryBackend:
      filters:
        - type: URLRewrite
          urlRewrite:
            hostname: www.example.com
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
    - name: error-rate
      templateRef:
        name: error-rate
        namespace: flagger-system
      thresholdRange:
        max: 1
      interval: 1m
    - name: latency
      templateRef:
        name: latency
        namespace: flagger-system
      thresholdRange:
        max: 0.5
      interval: 30s
    webhooks:
      - name: smoke-test
        type: pre-rollout
        url: http://flagger-loadtester.test/
        timeout: 15s
        metadata:
          type: bash
          cmd: "curl -sd 'anon' http://podinfo-canary.test:9898/token | grep token"
      - name: load-test
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          cmd: "hey -z 2m -q 10 -c 2 -host www.example.com http://gateway-istio.istio-ingress/"

After applying this resource, I confirmed that the following resources were created as expected:

  1. ReferenceGrant

    • The ReferenceGrant resource was created in the kube-system namespace with the proper hash annotation and owner label, granting the gateway permission to reference the primary service (hoge).
  2. HTTPRoute

    • The HTTPRoute resource in the test namespace was created with the correct configuration:
      • It has the expected hostname (www.example.com).
      • The parentRefs correctly points to the gateway in the istio-ingress namespace.
      • The rules include both backend references—one for the primary service and another (with a URL rewrite filter) for the canary—as well as the default match on the path /.

Moreover, tests have been written to validate this behavior. Given these outcomes, the request-response behavior appears to be as expected, and the generated HTTPRoute definition matches the configuration.
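
For readers unfamiliar with cross-namespace references, a ReferenceGrant matching the description above might look roughly like the sketch below. It assumes the grant allows HTTPRoutes in the `test` namespace to reference the `hoge` Service, which is how Gateway API models cross-namespace `backendRefs`. The metadata name is hypothetical, and the hash annotation and owner label are omitted because their exact keys are not shown in this thread.

```yaml
# Hypothetical ReferenceGrant for the podinfo test above (illustrative only)
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: podinfo            # hypothetical name; the generated name is not shown in the thread
  namespace: kube-system   # must live in the namespace of the referenced Service
spec:
  from:
    # Allow HTTPRoutes in the Canary's namespace to reference targets here
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      namespace: test
  to:
    # Restrict the grant to the single primary backend Service
    - group: ""
      kind: Service
      name: hoge
```

Scoping `to` to a single named Service keeps the grant as narrow as possible; omitting `name` would allow references to every Service in `kube-system`.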

kahirokunn avatar Feb 13 '25 09:02 kahirokunn

Hello 😢

kahirokunn avatar Mar 25 '25 00:03 kahirokunn

@aryan9600 Hi 👋

kahirokunn avatar Jun 10 '25 04:06 kahirokunn

Bump up

kahirokunn avatar Jun 30 '25 00:06 kahirokunn

@aryan9600 Hello 😢

kahirokunn avatar Sep 04 '25 23:09 kahirokunn

@aryan9600 CC: @stefanprodan Thank you for the review and great feedback! 🙏 I've added the documentation as requested. The logic and docs should now be complete. Looking forward to getting this merged! 🎉

kahirokunn avatar Oct 13 '25 01:10 kahirokunn

@aryan9600 CC: @stefanprodan The conflict is resolved! Functionality has been verified. Can you merge before conflicts occur again? 🙏

kahirokunn avatar Oct 16 '25 01:10 kahirokunn