ingress-nginx
                                
                                 ingress-nginx copied to clipboard
                                
                                    ingress-nginx copied to clipboard
                            
                            
                            
                        Controller: Correctly identify other pods on shutdown.
What this PR does / why we need it:
- #11087 is a serious footgun: if a user upgrades the helm chart version in-place, or adds pod labels (either via the helm chart or some other mechanism), ingress-nginx may shutdown an ingress even though there are other ingress-nginx pods running and serving traffic.
- The root cause of #11087 is that ingress-nginx re-uses the set of pod labels on the terminating pod to find other pods belonging to the same controller group. This is a bug, since there are cases where we expect the set of labels on the terminating pod to differ from the labels on other pods belonging to the same controller group (for example, during an in-place helm upgrade, the chart version label will be different on old vs. new pods).
- ingress-nginx already has an appropriate primitive to track the pods belonging to the same controller group: electionID. We expect a pod to tear down an ingress iff all pods with the same electionID have gone away (i.e. meaning there will be no future successful master election).
Types of changes
- [X] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] CVE Report (Scanner found CVE and adding report)
- [X] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation only
This is a breaking change in two respects:
- I would argue all helm chart upgrades are currently breaking changes because of #11087. This is dangerous for users until resolved.
- The PR enables this new behavior by default. I don't believe this is a breaking change for users, but would like other eyes to validate my assumption.
Which issue/s this PR fixes
- fixes #1877[^1]
- fixes #4774[^1]
- fixes #7047[^1]
- fixes #11087
How Has This Been Tested?
- I have added unit tests to the helm chartto validate the new label is defined.
- I have added unit tests to the status.gocomponent to validate the pod finding behavior.
Checklist:
- [X] My change requires a change to the documentation.
- [X] I have updated the documentation accordingly.
- [X] I've read the CONTRIBUTION guide
- [X] I have added unit and/or e2e tests to cover my changes.
- [X] All new and existing tests passed.
[^1]: See: https://github.com/kubernetes/ingress-nginx/issues/11087#issuecomment-2712118775