gloo
gloo copied to clipboard
Gloo validating webhook blocks deletion of unrelated secrets
Gloo Edge Product
Open Source
Gloo Edge Version
v1.15.16
Kubernetes Version
v1.27.3
Describe the bug
When an upstream with a bad state exists in cluster no secret can be deleted - even secrets that are not referenced by Gloo Edge resources.
Expected Behavior
We need to be able to delete secrets that are not referenced by Gloo Edge resources.
Steps to reproduce the bug
- Create a kind cluster:
kind create cluster --image kindest/node:v1.27.3
- Install Gloo Edge chart with below values:
cat <<EOF | helm upgrade --install gloo gloo/gloo --version 1.15.16 -n gloo-system --create-namespace --wait --values -
gatewayProxies:
gatewayProxy:
service:
type: NodePort
httpPort: 31500
httpsPort: 32500
httpNodePort: 31500
httpsNodePort: 32500
gateway:
validation:
allowWarnings: true
alwaysAcceptResources: false
failurePolicy: Fail
EOF
- Create an upstream with non-exisitng service:
cat <<EOF | kubectl create -f -
apiVersion: gloo.solo.io/v1
kind: Upstream
metadata:
name: my-us
namespace: gloo-system
spec:
kube:
serviceName: my-svc
serviceNamespace: gloo-system
servicePort: 18081
serviceSpec:
grpc: {}
EOF
- Create a virtualService that refers the previously created upstream:
cat <<EOF | kubectl create -f -
apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
name: my-vs
spec:
virtualHost:
domains:
- valid.local
options:
routes:
- matchers:
- prefix: /
routeAction:
single:
upstream:
name: my-us
namespace: gloo-system
EOF
Upstream status should have a warning:
status:
statuses:
gloo-system:
reason: "warning: \n 1 error occurred:\n\t* Upstream name:\"my-us\" namespace:\"gloo-system\"
references the service \"my-svc\" which does not exist in namespace \"gloo-system\"\n\n"
reportedBy: gloo
state: Warning
- Edit the default settings resource(gloo-system/default) and change
spec.gateway.validation.allowWarnings
tofalse
. - Create some secret:
kubectl create secret generic db-user-pass \
--from-literal=username=admin \
--from-literal=password='S!B\*d$zDsb='
- Try to delete previously created secret, deletion will fail:
kubectl delete secret db-user-pass
Error from server: admission webhook "gloo.gloo-system.svc" denied the request: resource incompatible with current Gloo snapshot: [failed validating the deletion of resource namespace: default name: db-user-pass: validating *v1.Secret name:"db-user-pass" namespace:"default": Route Warning: InvalidDestinationWarning. Reason: *v1.Upstream { default.my-us } not found]
- Edit the default settings resource(gloo-system/default) and change
spec.gateway.validation.allowWarnings
back totrue
. - Try to delete previously created secret, deletion will now succeed:
kubectl delete secret db-user-pass
secret "db-user-pass" deleted
Additional Environment Detail
No response
Additional Context
No response
Per https://docs.solo.io/gloo-edge/latest/reference/helm_chart_values/open_source_helm_chart_values/, you can find an API that we expose which may be relevant: gateway.validation.webhook.skipDeleteValidationResources[]
. From the docs, this field:
resource types in this list will not use webhook valdaition for DELETEs. Use ‘’ to skip validation for all resources. Valid values are ‘virtualservices’, ‘routetables’,‘upstreams’, ‘secrets’, ‘ratelimitconfigs’, and ‘’. Invalid values will be accepted but will not be used.
I admit that this is an imperfect solution, but I wanted to share it to see if it is a temporary solution.
I think this is issue is adjacent to https://github.com/solo-io/solo-projects/issues/5343. They are both centered around some of the challenges of cleaning up resources, especially when strict validation is enabled.
@shahar-h is the temporary solution that I linked above (to ignore secret deletion) and first step you can take?
Thanks, @sam-heilbron we'll wait for the permanent solution as we're anyway using allowWarnings: true
for other purposes. Did you start working on a solution? If yes, is there an ETA?
I am un-assigning myself as I work on other priorities. I have created an internal Slab document describing the issue, possible solutions, and gathered thoughts from the team.
Approach we will take:
- When 'allow_warnings' is false, there are warnings, and we deleting a secret, we will take the following steps:
- When validating, we will collect all warnings/errors instead of stopping at the first error
- If errors are found, we will re-run validation on the original snapshot without the delete
- We will compare the errors, proxyReports, and hashes of the proxies to see if there is any change to the output
- If no change is observed in the snapshot status due to the deletion of the secret, we will conclude the secret is not being used by gloo and will delete the secret
- Due to the extra processing, we will allow this functionality to be turned off by an environment variable
Some general guidelines we synced on that I want to reiterate here
- Some things are Errors somethings are just warnings
- Our new functionality SHOULD only be more permissive
- Our new functionality MUST keep allow warnings to be purely more permissive than not allowing warnings
Remaining work:
- additional unit tests - 2d
- PR feedback - 2d
- kube e2e test - 3d
- backports - 2d
See ETA for how this fits on the calendar
@DuncanDoyle We have a test release ready: ~~https://github.com/solo-io/solo-projects/actions/runs/7823727984~~ https://github.com/solo-io/solo-projects/actions/runs/7936706071 is newer It would be great to see this run with some larger/real-world configurations.
Still doing backports/solo-projects updates
OSS releases:
- v1.17.0 - beta9 https://github.com/solo-io/gloo/releases/tag/v1.17.0-beta9
- v1.16.5 - https://github.com/solo-io/gloo/releases/tag/v1.16.5
Pulled in to GlooEE for release with
- v1.17.0-beta1
- v.1.16.4
We need to have DISABLE_VALIDATION_AGAINST_PREVIOUS_STATE set in older lts