ingress-nginx icon indicating copy to clipboard operation
ingress-nginx copied to clipboard

support 2 tier control over sticky_persistent sessions

Open travisghansen opened this issue 1 year ago • 5 comments

What this PR does / why we need it:

Better control over the balancing logic of sticky_persistent when clients do not have a valid cookie value (or don't support cookies for whatever reason) by replacing the currently random selection. I've implemented the ability to control the 2nd tier of balancing by using the load-balance annotation to select any of the existing balancers (excluding the sticky_* balancers to prevent infinite loops).

If load-balance annotation is not present in an otherwise sticky_persistent configuration the new features are entirely ignored and will simply fallback to the existing return self.instance:random_except(failed_upstreams).

Our particular use-case is to replace an existing F5 config using k8s+ingress-nginx which selects the 2nd tier by client IP. This PR solves that use-case generically by effectively allowing any of the existing balancers to be leveraged as the 2nd tier.

Some examples:

# using consistent hashing based on client IP
nginx.ingress.kubernetes.io/affinity: cookie
nginx.ingress.kubernetes.io/affinity-mode: persistent
nginx.ingress.kubernetes.io/load-balance: chash
nginx.ingress.kubernetes.io/upstream-hash-by: $remote_addr

# ewma
nginx.ingress.kubernetes.io/affinity: cookie
nginx.ingress.kubernetes.io/affinity-mode: persistent
nginx.ingress.kubernetes.io/load-balance: ewma

# round_robin
nginx.ingress.kubernetes.io/affinity: cookie
nginx.ingress.kubernetes.io/affinity-mode: persistent
nginx.ingress.kubernetes.io/load-balance: round_robin

Types of changes

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [X] New feature (non-breaking change which adds functionality)
  • [ ] CVE Report (Scanner found CVE and adding report)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] Documentation only

Which issue/s this PR fixes

How Has This Been Tested?

Tested with an on-prem cluster using metallb with externalTrafficPolicy set to Local to ensure consistent client IP ($remote_addr). I have removed various debugging statements but had them spread everywhere to test the various logic flows. Beyond testing the pin-by-IP logic I also tested round_robin and ewma as secondary balancers as well. On the surface all seems sane at this point.

Checklist:

  • [ ] My change requires a change to the documentation.
  • [ ] I have updated the documentation accordingly.
  • [ ] I've read the CONTRIBUTION guide
  • [ ] I have added unit and/or e2e tests to cover my changes.
  • [ ] All new and existing tests passed.

travisghansen avatar Nov 07 '23 14:11 travisghansen