Sourceless failure with operator applied ksvc

Open hernanDatgDev opened this issue 1 year ago • 11 comments

What happened?

Related Zulip chat.

📓 Please keep in mind the integrations tested with are "sourceless" i.e. we generate and provide our own integration image.

I recently extended the mount trait which allows for emptyDir volumes to be mounted in the integration container. On its own this works fine, but fails when used with knative:

error executing post actions - 1/1 failed: [error during apply resource: camel-k/template-health-check: admission webhook \"validation.webhook.serving.knative.dev\" denied the request: validation failed: expected exactly one, got neither: spec.template.spec.volumes[0].configMap, spec.template.spec.volumes[0].emptyDir, etc...

When the ksvc (knative-service) is applied by the operator, knative doesn't recognize the emptyDir volume that came from the mount trait. It's as if the volume type is undefined and you can see knative requesting one of the valid volume types: configMap, emptyDir, etc... We've verified that the ksvc generated by the operator includes the desired emptyDir. The error occurs when the ksvc resource is applied by the operator in pkg/trait/deployer.go/serverSideApply() in the following code snippet:

    err = env.Client.Patch(env.Ctx, target, ctrl.Apply, ctrl.ForceOwnership, ctrl.FieldOwner("camel-k-operator"))
    if err != nil {
        return fmt.Errorf("error during apply resource: %s/%s: %w", resource.GetNamespace(), resource.GetName(), err)
    }

We believe this is not a knative issue because, when we apply the integration without the knative trait and then manually apply a copy of the same ksvc the operator generated, everything works. The pod is deployed with the emptyDir volume and there is no error from knative.

Steps to reproduce

  1. Ensure knative is installed properly on your cluster
  2. Deploy a sourceless integration:
     • configure knative-service trait enabled=true & auto=false
     • configure mount trait with an emptyDir volume
  3. The integration fails.

Relevant log output

No response

Camel K version

2.4.0 (pre-release)

hernanDatgDev avatar Aug 07 '24 15:08 hernanDatgDev

I had a look at this and I am able to reproduce the error. However, just for the sake of understanding: the patching code you've posted is not really responsible for the problem. That part only applies the Kubernetes resource to the cluster. What's really happening is that the cluster is rejecting the resource during Knative API validation, for reasons I'm not yet able to understand. I'll keep you posted.

squakez avatar Aug 08 '24 14:08 squakez

Agreed that the operator is simply applying the ksvc. My suspicion is that the operator must be doing/applying something else that happens to conflict with knative. For more context, when we applied the ksvc without using the knative trait, all we had to ensure was that the integration resource was available on the cluster.

hernanDatgDev avatar Aug 08 '24 19:08 hernanDatgDev

@squakez Is there any way you feel I might be able to help in the meantime? Or do you happen to recall the other issues/links where you saw similar behavior?

hernanDatgDev avatar Aug 08 '24 19:08 hernanDatgDev

@hernanDatgDev thanks for offering to help. You can keep troubleshooting by manually applying the KSVC generated by Camel K and checking whether the same error happens.

squakez avatar Aug 09 '24 07:08 squakez

I think the problem is caused by this feature, which needs to be explicitly enabled: https://knative.dev/docs/serving/configuration/feature-flags/#kubernetes-emptydir-volume - however, it's strange that it works when you create the KSVC from the CLI.

squakez avatar Aug 09 '24 11:08 squakez

Okay, I think I understand what's going on. Camel K applies resources with a server-side apply patch, and this triggers a webhook on the Knative side. The webhook is not triggered when you run this via the CLI, and the failure can be avoided by setting the trait deployer.use-ssa=false.

If you try something like:

kamel run PlatformHttpServer.java --dev -t mount.empty-dirs=name:/container/path -t deployer.use-ssa=false

That should work. However, it would be good to understand why the webhook is triggered on patch and not on creation.

squakez avatar Aug 09 '24 13:08 squakez

@squakez Thank you for the help. I can confirm the expected behavior with the additional deployer trait on several of our dev clusters. I'm speaking with the knative community to try and figure out where the validation error originates on their side. For now our issues are resolved.

hernanDatgDev avatar Aug 13 '24 17:08 hernanDatgDev

For anyone curious here is the second thread I've started on this topic: https://cloud-native.slack.com/archives/C04LMU0AX60/p1723515184660689 You'll need to join the CNCF slack organization/workspace to get to the knative-serving channel where this thread is.

hernanDatgDev avatar Aug 13 '24 17:08 hernanDatgDev

Cool, thanks @hernanDatgDev . Please, keep us posted.

squakez avatar Aug 14 '24 06:08 squakez

Hey @squakez, in order to open a ticket with the knative team, I need a way to reproduce this error outside of the camel-k operator. I've had a hard time with this and I'm hoping you might be able to help. I've even tried taking all the resources applied by the operator through the deployer trait and applying them manually with kubectl apply --server-side=true, but no luck. Everything works as expected until I run SSA from the operator. Any insight would be helpful.

hernanDatgDev avatar Aug 20 '24 20:08 hernanDatgDev

I think what the server side apply patch does is something like --server-side --force-conflicts --field-manager=camel-k-operator.

I think I managed to reproduce the same error also from CLI even without server side apply. Take a KnativeService as it comes from Camel K, and try to include the ownerReferences:

  ownerReferences:
  - apiVersion: camel.apache.org/v1
    blockOwnerDeletion: true
    controller: true
    kind: Integration
    name: platform-http-server
    uid: 54329db5-d01c-436e-960d-fc52026899f9
  # - apiVersion: v1
  #   blockOwnerDeletion: true
  #   controller: true
  #   kind: ConfigMap
  #   name: my-profile
  #   uid: 2d3b37d8-afca-4fcf-b8d1-d716c4119bcb

Removing them, or using a ConfigMap as the owner instead, lets the KSVC be created normally. However, when an Integration resource is used as the owner reference, creation fails with the following webhook log:

knative-serving webhook-5fdbb849fc-5pj4f webhook {"severity":"ERROR","timestamp":"2024-08-21T13:17:21.99502267Z","logger":"webhook","caller":"validation/validation_admit.go:183","message":"Failed the resource specific validation","commit":"f1bd929","knative.dev/pod":"webhook-5fdbb849fc-5pj4f","knative.dev/kind":"serving.knative.dev/v1, Kind=Service","knative.dev/namespace":"default","knative.dev/name":"platform-http-server","knative.dev/operation":"CREATE","knative.dev/resource":"serving.knative.dev/v1, Resource=services","knative.dev/subresource":"","knative.dev/userinfo":"system:serviceaccount:camel-k:camel-k-operator","stacktrace":"knative.dev/pkg/webhook/resourcesemantics/validation.validate\n\tknative.dev/[email protected]/webhook/resourcesemantics/validation/validation_admit.go:183\nknative.dev/pkg/webhook/resourcesemantics/validation.(*reconciler).Admit\n\tknative.dev/[email protected]/webhook/resourcesemantics/validation/validation_admit.go:79\nknative.dev/pkg/webhook.New.admissionHandler.func4\n\tknative.dev/[email protected]/webhook/admission.go:123\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2136\nnet/http.(*ServeMux).ServeHTTP\n\tnet/http/server.go:2514\nknative.dev/pkg/webhook.(*Webhook).ServeHTTP\n\tknative.dev/[email protected]/webhook/webhook.go:328\nknative.dev/pkg/network/handlers.(*Drainer).ServeHTTP\n\tknative.dev/[email protected]/network/handlers/drain.go:113\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2938\nnet/http.(*conn).serve\n\tnet/http/server.go:2009"}
knative-serving webhook-5fdbb849fc-5pj4f webhook {"severity":"INFO","timestamp":"2024-08-21T13:17:21.995083472Z","logger":"webhook","caller":"webhook/admission.go:151","message":"remote admission controller audit annotations=map[string]string(nil)","commit":"f1bd929","knative.dev/pod":"webhook-5fdbb849fc-5pj4f","knative.dev/kind":"serving.knative.dev/v1, Kind=Service","knative.dev/namespace":"default","knative.dev/name":"platform-http-server","knative.dev/operation":"CREATE","knative.dev/resource":"serving.knative.dev/v1, Resource=services","knative.dev/subresource":"","knative.dev/userinfo":"system:serviceaccount:camel-k:camel-k-operator","admissionreview/uid":"6bccda2d-73a6-461d-9c15-ffefe2d1e798","admissionreview/allowed":false,"admissionreview/result":"&Status{ListMeta:ListMeta{SelfLink:,ResourceVersion:,Continue:,RemainingItemCount:nil,},Status:Failure,Message:validation failed: expected exactly one, got neither: spec.template.spec.volumes[2].configMap, spec.template.spec.volumes[2].emptyDir, spec.template.spec.volumes[2].projected, spec.template.spec.volumes[2].secret,Reason:BadRequest,Details:nil,Code:400,}"}

These are exactly the same traces we get when running an Integration from Camel K. I honestly have no clue why this happens when an Integration is the owner, and only on server-side apply from the operator. I have a vague feeling it could be related to the service account that runs the Integration because, as you can see, it is mentioned in the log trace. I'm not sure if or why the webhook uses the service account of the referenced object, but maybe it's a useful hint for the Knative team.

squakez avatar Aug 21 '24 13:08 squakez

I think I finally managed to find a quick workaround for the problem. I think the Knative webhook wrongly concludes that spec.template.spec.volumes[2].emptyDir is empty, and the check is only applied when there is some server-side validation. We're fixing this on our side by adding a sizeLimit: 1Gi, which seems like a good safety measure anyway to keep the volume from filling up. With that in place, the webhook no longer complains and the KnativeService is applied as expected.
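As a rough illustration of why a sizeLimit might make a difference, the sketch below serializes a volume with a bare emptyDir and one with a size limit. The `EmptyDir` and `Volume` types are hypothetical local structs that only mirror the field names used in this thread; they are not the real Kubernetes API types, and the volume name `scratch` is made up. Whether the empty object is actually dropped somewhere along the server-side apply path is a guess from this thread, not something confirmed here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EmptyDir is a hypothetical stand-in for an emptyDir volume source.
type EmptyDir struct {
	SizeLimit string `json:"sizeLimit,omitempty"`
}

// Volume is a hypothetical stand-in for a pod volume with one source.
type Volume struct {
	Name     string    `json:"name"`
	EmptyDir *EmptyDir `json:"emptyDir,omitempty"`
}

// render marshals a volume to compact JSON.
func render(v Volume) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// A bare emptyDir serializes to an object with no fields.
	fmt.Println(render(Volume{Name: "scratch", EmptyDir: &EmptyDir{}}))
	// With a size limit, the emptyDir object carries a concrete field.
	fmt.Println(render(Volume{Name: "scratch", EmptyDir: &EmptyDir{SizeLimit: "1Gi"}}))
}
```

The first volume renders as `{"name":"scratch","emptyDir":{}}` and the second as `{"name":"scratch","emptyDir":{"sizeLimit":"1Gi"}}`. An object with no fields is easy for intermediate processing to treat as "nothing set", while one with a concrete field is not, which would be consistent with the workaround described above.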

@hernanDatgDev please have a look and feel free to report this behavior back to the Knative team for fixing.

squakez avatar Sep 05 '24 14:09 squakez