gloo icon indicating copy to clipboard operation
gloo copied to clipboard

Gloo validating webhook blocks deletion of unrelated secrets

Open shahar-h opened this issue 1 year ago • 7 comments

Gloo Edge Product

Open Source

Gloo Edge Version

v1.15.16

Kubernetes Version

v1.27.3

Describe the bug

When an upstream with a bad state exists in cluster no secret can be deleted - even secrets that are not referenced by Gloo Edge resources.

Expected Behavior

We need to be able to delete secrets that are not referenced by Gloo Edge resources.

Steps to reproduce the bug

  1. Create a kind cluster:
kind create cluster --image kindest/node:v1.27.3
  1. Install Gloo Edge chart with below values:
cat <<EOF | helm upgrade --install gloo gloo/gloo --version 1.15.16 -n gloo-system --create-namespace --wait --values -
gatewayProxies:
  gatewayProxy:
    service:
      type: NodePort
      httpPort: 31500
      httpsPort: 32500
      httpNodePort: 31500
      httpsNodePort: 32500
gateway:
  validation:
    allowWarnings: true
    alwaysAcceptResources: false
    failurePolicy: Fail
EOF
  1. Create an upstream with non-exisitng service:
cat <<EOF | kubectl create -f -
apiVersion: gloo.solo.io/v1
kind: Upstream
metadata:
  name: my-us
  namespace: gloo-system
spec:
  kube:
    serviceName: my-svc
    serviceNamespace: gloo-system
    servicePort: 18081
    serviceSpec:
      grpc: {}
EOF
  1. Create a virtualService that refers the previously created upstream:
cat <<EOF | kubectl create -f -
apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: my-vs
spec:
  virtualHost:
    domains:
    - valid.local
    options:
    routes:
    - matchers:
        - prefix: /
      routeAction:
        single:
          upstream:
            name: my-us
            namespace: gloo-system
EOF

Upstream status should have a warning:

status:
  statuses:
    gloo-system:
      reason: "warning: \n  1 error occurred:\n\t* Upstream name:\"my-us\" namespace:\"gloo-system\"
        references the service \"my-svc\" which does not exist in namespace \"gloo-system\"\n\n"
      reportedBy: gloo
      state: Warning
  1. Edit the default settings resource(gloo-system/default) and change spec.gateway.validation.allowWarnings to false.
  2. Create some secret:
kubectl create secret generic db-user-pass \
  --from-literal=username=admin \
  --from-literal=password='S!B\*d$zDsb='
  1. Try to delete previously created secret, deletion will fail:
kubectl delete secret db-user-pass
Error from server: admission webhook "gloo.gloo-system.svc" denied the request: resource incompatible with current Gloo snapshot: [failed validating the deletion of resource namespace: default name: db-user-pass: validating *v1.Secret name:"db-user-pass" namespace:"default": Route Warning: InvalidDestinationWarning. Reason: *v1.Upstream { default.my-us } not found]
  1. Edit the default settings resource(gloo-system/default) and change spec.gateway.validation.allowWarnings back to true.
  2. Try to delete previously created secret, deletion will now succeed:
kubectl delete secret db-user-pass
secret "db-user-pass" deleted

Additional Environment Detail

No response

Additional Context

No response

shahar-h avatar Nov 26 '23 21:11 shahar-h

Per https://docs.solo.io/gloo-edge/latest/reference/helm_chart_values/open_source_helm_chart_values/, you can find an API that we expose which may be relevant: gateway.validation.webhook.skipDeleteValidationResources[]. From the docs, this field:

resource types in this list will not use webhook valdaition for DELETEs. Use ‘’ to skip validation for all resources. Valid values are ‘virtualservices’, ‘routetables’,‘upstreams’, ‘secrets’, ‘ratelimitconfigs’, and ‘’. Invalid values will be accepted but will not be used.

I admit that this is an imperfect solution, but I wanted to share it to see if it is a temporary solution.

I think this is issue is adjacent to https://github.com/solo-io/solo-projects/issues/5343. They are both centered around some of the challenges of cleaning up resources, especially when strict validation is enabled.

@shahar-h is the temporary solution that I linked above (to ignore secret deletion) and first step you can take?

sam-heilbron avatar Dec 18 '23 20:12 sam-heilbron

Thanks, @sam-heilbron we'll wait for the permanent solution as we're anyway using allowWarnings: true for other purposes. Did you start working on a solution? If yes, is there an ETA?

shahar-h avatar Dec 19 '23 08:12 shahar-h

I am un-assigning myself as I work on other priorities. I have created an internal Slab document describing the issue, possible solutions, and gathered thoughts from the team.

inFocus7 avatar Jan 11 '24 14:01 inFocus7

Approach we will take:

  • When 'allow_warnings' is false, there are warnings, and we deleting a secret, we will take the following steps:
    1. When validating, we will collect all warnings/errors instead of stopping at the first error
    2. If errors are found, we will re-run validation on the original snapshot without the delete
    3. We will compare the errors, proxyReports, and hashes of the proxies to see if there is any change to the output
    4. If no change is observed in the snapshot status due to the deletion of the secret, we will conclude the secret is not being used by gloo and will delete the secret
  • Due to the extra processing, we will allow this functionality to be turned off by an environment variable

sheidkamp avatar Jan 29 '24 17:01 sheidkamp

Some general guidelines we synced on that I want to reiterate here

  • Some things are Errors somethings are just warnings
  • Our new functionality SHOULD only be more permissive
  • Our new functionality MUST keep allow warnings to be purely more permissive than not allowing warnings

nfuden avatar Jan 31 '24 16:01 nfuden

Remaining work:

  • additional unit tests - 2d
  • PR feedback - 2d
  • kube e2e test - 3d
  • backports - 2d

See ETA for how this fits on the calendar

sheidkamp avatar Feb 07 '24 03:02 sheidkamp

@DuncanDoyle We have a test release ready: ~~https://github.com/solo-io/solo-projects/actions/runs/7823727984~~ https://github.com/solo-io/solo-projects/actions/runs/7936706071 is newer It would be great to see this run with some larger/real-world configurations.

sheidkamp avatar Feb 08 '24 03:02 sheidkamp

Still doing backports/solo-projects updates

sheidkamp avatar Mar 01 '24 15:03 sheidkamp

OSS releases:

  • v1.17.0 - beta9 https://github.com/solo-io/gloo/releases/tag/v1.17.0-beta9
  • v1.16.5 - https://github.com/solo-io/gloo/releases/tag/v1.16.5

Pulled in to GlooEE for release with

  • v1.17.0-beta1
  • v.1.16.4

sheidkamp avatar Mar 04 '24 20:03 sheidkamp

We need to have DISABLE_VALIDATION_AGAINST_PREVIOUS_STATE set in older lts

nfuden avatar May 01 '24 14:05 nfuden