gateway icon indicating copy to clipboard operation
gateway copied to clipboard

SecurityPolicy crd not working correctly with multiple gateway controllers

Open zetaab opened this issue 1 year ago • 8 comments

Description: I have currently two different gateways installed (called internal and external). I am trying to create oidc and jwt SecurityPolicy to internal one. However, some configurations gets applied to external one and the auth itself does not work at all. If I try to create these policies to external one, it will somehow work. After I will apply cognito configurations to internal gateway I can see cognito configurations to be applied to external as well.

Repro steps: install two different gateway controllers and then https://gist.github.com/zetaab/e70547adb70a8de61765387f36e8c23f

I have currently that configuration applied and I can see following in external envoyproxy (which should be only under internal gateway components): Screenshot 2024-01-28 at 23 07 08

it does not have any configuration that should be against cognito at all.

Environment: kube 1.29.1

Logs:

zetaab avatar Jan 28 '24 21:01 zetaab

the problem might be that both controllers are listening SecurityPolicy and its changes. However, it should filter according targetRef "is this my task" or not. If I shutdown that another controller, things starts to work better.

zetaab avatar Jan 28 '24 21:01 zetaab

thanks for raising this issue, as of today there is no parent for any of the Policies, so every controller can reconcile it, which is the issue you are facing Recommendation ( as well as a workaround ) is to limit each controller to reconcile a limited number of namespaces, https://gateway.envoyproxy.io/latest/user/deployment-mode/

The better solution is to add an optional parent for a Policy, thinking out loud, it should be the controller name (versus GatewayClass name since some policies can attach to a Gatewayclass) similar to what exists in GatewayClass https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io%2fv1.GatewayController cc @envoyproxy/gateway-maintainers

arkodg avatar Jan 29 '24 17:01 arkodg

@arkodg could you help me little bit. I was investigating this and change https://github.com/zetaab/gateway/commit/f83a3e9efdf2647f280ed52711a43a08d7fd0005 makes it already work like should. However, there might be race condition if the controllers will reconcile the object in "wrong order".

Example:

2 controllers: lets say called internal and external. We have securitypolicy that should be applied only to external controller.

If the reconciling order is 1) external 2) internal following will happen:

  1. external will apply configuration fine and update securitypolicy status Accepted
  2. internal tries to apply configuration. However, it does not have route object so it will not add the configuration but it will update the securitypolicy status to TargetNotFound.

Anyways it works already better than before. Is it worth of making PR of this, or should I add that new field to securitypolicy and make it possible to configure gatewayClass for securitypolicy object?

zetaab avatar Feb 07 '24 21:02 zetaab

@zetaab we can't skip appending the policy into the result, since we need to have to update the status for all policies

the solution here is some sort of partitioning, since we cannot control reconcile order (during restarts etc)

  1. ether partition by limiting namespaces being reconciled (supported today)
  2. or add a parent field to limit reconciliation to a controller or a gatewayclass (controller child)

Id recommend trying out 1 for now (which is also better for performance, lesser amount of reconciliation), until we have some consensus on 2

arkodg avatar Feb 07 '24 22:02 arkodg

looks like there is a better solution upstream https://gateway-api.sigs.k8s.io/geps/gep-713/#standard-status-struct we'll need to replace the current status object https://github.com/envoyproxy/gateway/blob/a33c50582f7ff2ce19a64e8774905fe66b5a02bd/api/v1alpha1/securitypolicy_types.go#L34 with the upstream type https://github.com/kubernetes-sigs/gateway-api/blob/2ac95e4577ab9a0cedf9cc302f279f36aeb4d6db/apis/v1alpha2/policy_types.go#L187

this status is unique per controller (per ancestor which has a controller field) so reconciliation by multiple controllers should be able to update their individual ancestor status without affecting each other

arkodg avatar Feb 08 '24 02:02 arkodg

but that also says:

// Note also that implementations MUST ONLY populate ancestor status for
// the Ancestor resources they are responsible for.

so basically the another controller is not responsible of that policy at all. So it should not be added as status

zetaab avatar Feb 08 '24 06:02 zetaab

so basically the another controller is not responsible of that policy at all. So it should not be added as status

we cannot determine why a controller reconciling a policy targeting a gateway, cannot find the gateway

  • the user maybe have introduced a typo
  • the gateway may not exist in the cluster
  • the gateway may exist in the cluster but may not belong to the controller (part of another gatewayclass linked to another controller)

arkodg avatar Feb 09 '24 00:02 arkodg

facing the same issue with BTP as in e2e test https://github.com/envoyproxy/gateway/actions/runs/8073622448/job/22057769156?pr=2665#step:6:2130

#2665 introduce merge gateways e2e test, that requires multiple-gatewayclass per controller feature. but this test will affects all the other test cases that have policies attached. because the new gatewayclass in controller will also update the policy status, even the policy is not belong to new GC's.

so by fixing #2631, I think this issue can be resolved. assigning myself to these two issues.

shawnh2 avatar Feb 28 '24 08:02 shawnh2

should be fixed by #2802

arkodg avatar Mar 08 '24 02:03 arkodg