gateway Support ExtensionContextPolicy resources to provide state to Extension Server hooks

Description: The Extension Server protocol supports four hooks today. Only one of these hooks supports receiving context as part of its invocation.

While it's possible to register a CRD and attach specific instances of it to an HTTPRoute during calls to the route modification hook, it's not possible to get similar context with the other hooks. In the example extension server (#2931) this is worked around by developing a full operator for context, and then when the HTTPListener modification hook is called then correlating the internal extension server state with the hook details.

I propose adding a policy document that would allow attaching context to calls to the extension hook server. For example:

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: ExtensionContextPolicy
metadata:
   name:  my-context
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: eg
    namespace: default
  # Support sending multiple documents to the extension server as contextual information
  extensionRef:
  - group: example.myextension.io
     kind: LuaScript
     name: my-lua-script
---
apiVersion: example.myextension.io/v1
kind: LuaScript
metadata:
   name: my-lua-script
spec:
   lua: # some lua script

The ExtensionContextPolicy can be attached to:

Gateway resources (entire resource or just a section), to be sent to the Listener modification hook
HTTPRoute resources, to be sent to the Route modification hook

By passing the contextual resource as a parameter, extension server development is made easier since the extension server doesn't need to implement an operator itself for the common use-cases.

Mar 18 '24 15:03 liorokman

this was discussed in the meeting today

there's currently nothing in the spec defined in the ExtensionManager config https://github.com/envoyproxy/gateway/blob/cc01bf5ea9e962f82314d85e84b7a00b8314a34c/api/v1alpha1/envoygateway_types.go#L398 that talks about the context sent per hook for a resource type, that is an implementation detail.
more work is required to build a better API contract for this feature and improve the API
most of the attendees on the call were a -1 on adding a new Poilicy, instead suggest updating the spec within the EnvoyGateway config that already exists cc @AliceProxy @guydc

Mar 28 '24 18:03 arkodg

more work is required to build a better API contract for this feature and improve the API

Agreed. My intent was to update the PostHTTPListenerExtensionContext so that it would be possible to pass extension resources - similar to the PostRouteExtensionContext resource.

most of the attendees on the call were a -1 on adding a new Poilicy, instead suggest updating the spec within the EnvoyGateway config that already exists

Unless I'm misunderstanding the intent behind the EnvoyGateway resource, there's 1 such resource per Envoy Gateway installation.

My intent was to make it possible to configure the extension server to receive a different instance of an extension type per configured listener.

How about attaching this information to the ClientTrafficPolicy resource, which is attachable to Gateway resources, and therefore applicable to Listeners?

Apr 01 '24 06:04 liorokman

-1 to adding any extension manager related logic to ClientTrafficPolicy which is tied to the platform admin / app admin persona where's the ExtensionManager is tied to a vendor / extension developer delivering a modified version of Envoy Gateway

Apr 01 '24 07:04 arkodg

If not the ClientTrafficPolicy, then I think a new policy type is needed here to support the vendor/extension developer persona.

The EnvoyGateway resource seems wrong because there might be a different extension resource required for each listener.

Example:

There are two Gateways defined, and the PostHTTPListenerHook called for listeners on each Gateway should be sent a different extension resource:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: first-gateway
  namepace: first
spec:
  gatewayClassName: eg
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: second-gateway
  namepace: second
spec:
  gatewayClassName: eg
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: ExtensionContextPolicy
metadata:
   name:  first-context
   namespace: first
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: first-gateway
    namespace: first
  extensionRef:
  - group: example.myextension.io
     kind: LuaScript
     name: my-first-lua-script
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: ExtensionContextPolicy
metadata:
   name:  second-context
   namespace: second
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: second-gateway
    namespace: second
  extensionRef:
  - group: example.myextension.io
     kind: LuaScript
     name: my-second-lua-script

Apr 01 '24 07:04 liorokman

@liorokman I suggest consider using EnvoyPatchPolicy . The Policy can be created programmatically. The only difference with ExtensionManager

Your custom controller will need to watch for Gateway API resources (controller-runtime has made it fairly easy to perform this task)
In extension manager, xds is patches and sent over the network, in EnvoyPatchPolicy its patched after persisting in K8s API server (which means the patch is visible to someone with admin/right role access)

Apr 01 '24 07:04 arkodg

The biggest difference between using the ExtensionManager and maintaining sets of EnvoyPatchPolicy documents is that JSON patches are hard to get right, and when they stop applying cleanly due to some change in the way that Envoy Gateway generates XDS, it's not always trivial to understand what happened.

In contrast, ExtensionManager uses strongly typed Golang structs and if something goes wrong, it's easier to recover in case something changed in Envoy Gateway's XDS translation.

If I'm going to develop some software anyways (either to generate EnvoyPatchPolicy instances or to implement the ExtensionServer interface), I prefer to implement the ExtensionServer interface and perform the required change in the XDS directly in my own code. Watching just the Gateway API resources would not be enough, software generating EnvoyPatchPolicy documents would also need to be intimately aware of how the various settings in policies like ClientTrafficPolicy affect the created XDS.

If I implement an operator that generates EnvoyPatchPolicy resources and it needs different contextual information per listener, then the same issue I'm trying to solve here still remains; if the generated JSON patch needs to be affected by some other resource in the API server and it's a different resource per listener, then it would make more sense for Envoy Gateway to do the work against the Envoy Gateway Provider. The mechanism I propose is future proof against the day that Envoy Gateway implements its next configuration Provider - e.g. a File System provider.

Apr 01 '24 08:04 liorokman

If I implement an operator that generates EnvoyPatchPolicy resources and it needs different contextual information per listener, then the same issue I'm trying to solve here still remains; if the generated JSON patch needs to be affected by some other resource in the API server and it's a different resource per listener, then it would make more sense for Envoy Gateway to do the work against the Envoy Gateway Provider. The mechanism I propose is future proof against the day that Envoy Gateway implements its next configuration Provider - e.g. a File System provider.

This is exactly the point, the reconciling and relationship b/w Gateway API resources and custom non EG resources are very unique per use case. You can still provide this context at the EnvoyGateway level and also adding any generic matching API there, and that can be passed directly to the ExtensionManager runner What I'm actually trying to ensure is to make sure the code maintenance/complexity and performance of the Gateway API resources is not affected by ExtensionManager. Even right now, we dont seem to have a code owner for ExtensionManager and the API contract is not well defined, the situation will only get worse as more code is added

Apr 01 '24 09:04 arkodg

What I would like to have is the ability to attach some resource(s) in Kubernetes that to a listener, such that when the extension server's PostHTTPListenerHook is called for that listener the attached resource(s) are sent to the extension server as context.

If the sticking point is to not use the Policy mechanism for this attachment, then how about something along these lines:

Extend the context definition to include an array of resources.
Configure a set of GVTs for context types in EnvoyGateway and give them a name.
When a Gateway resource is being translated to IR, then if that gateway has an annotation on it (e.g. gateway.envoyproxy.io/extension-context ), then based on the value of that annotation the correct instances of the context types would be picked up from the API server and sent to the extension server.
Changes to instances of the context type that are attached to a Gateway should trigger a reconciliation.

For example:

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyGateway
extensionManager:
  resources:
  - name: listenContext  # This is an optional attribute in this struct for backwards compatibility
    group: example.myextension.io
    version: v2
    kind: ListenerContext
  hooks:
    xdsTranslator:
      post:
      - HTTPListener
---
apiVersion: example.myextension.io/v2
kind: ListenerContext
metadata:
   name: some-name
   namespace: my-namespace
spec:
  all: sorts
  of: information
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg
  namespace: my-namespace
  annotations: 
         # All listeners defined in this gateway should be sent to the extension server
         # with the set of context resources listed here
         gateway.envoyproxy.io/extension-context: '[{"type":"listenContext", "name":"some-name", "namespace":"my-namespace"}]'
spec:
  gatewayClassName: eg
  listeners:
    - name: http
      protocol: HTTP
      port: 80

Apr 09 '24 11:04 liorokman

Sending specific context for specific gateway api resources - can 3. be solved using a selector / match structure that supports label selectors, which looks something like this ?

hooks:
   xdsTranslator:
      post:
      - HTTPListener
      context:
      - for:
           kind: Gateway
           group: gateway.networking.k8s.io
         send:
           kind: ListenerContext
           group: example.myextension.io
         when:
           selector:
             matchLabels:
               app: foo

this can be solved within the extension manager server

You can use existing envoyGateway.provider.kubernetes.watch to fine tune reconciliation

Apr 09 '24 11:04 arkodg

Sending specific context for specific gateway api resources - can 3. be solved using a selector / match structure that supports label selectors, which looks something like this ?
hooks:
   xdsTranslator:
      post:
      - HTTPListener
      context:
      - for:
           kind: Gateway
           group: gateway.networking.k8s.io
         send:
           kind: ListenerContext
           group: example.myextension.io
         when:
           selector:
             matchLabels:
               app: foo
this can be solved within the extension manager server

You can use existing envoyGateway.provider.kubernetes.watch to fine tune reconciliation

This seems a little too much on one hand, and a little to little on the other.

HTTPListener hook types are always for Gateway resources. The for section of the context structure would never change, because anything other than Gateway doesn't make any sense.
It's not clear here what the matchLabels section specifies be. Are they on the ListenerContext items, so that only a few of the tens of thousand instances of ListenerContext resources in the API server are sent to the extension server? Or are they on the specific Gateway instance to which I want to match a specific context.
How would I specify in this configuration, that for a new Gateway resource I am now adding and only for this new gateway, that I want a specific instance of my ListenerContext to be sent to the extension server?

I think that configuring the connection between a Gateway and its associated context items should be done either on the Gateway or on the context items. It's wrong to place this configuration in a third non-related location.

Apr 09 '24 11:04 liorokman

This is gonna be long-winded, so sorry in advance.

I'm a strong -1 on any api that requires information about how the extension operates (like the names of the hooks and whatnot) to be known/understood by the end users of an extension for them to be able to use any custom resources added by the extension. It's totally fine if that is an option, but it should definitely not be a requirement. I also have no issues with creating a new custom resource or calling it xxxPolicy. The main issue I have with policies are specifically "GatewayAPI policies". If something makes sense to implement as a "GatewayAPI policy" then that is totally fine, but I find that more often that not, it does not make any sense to try and force an API to conform to GatewayAPI's idea of what a policy should be rather than just designing the custom resource in whatever way makes most sense for its API.

To nit-pick on a couple examples:

hooks:
   xdsTranslator:
      post:
      - HTTPListener
      context:
      - for:
           kind: Gateway
           group: gateway.networking.k8s.io
         send:
           kind: MyExtensionCustomResource
           group: example.myextension.io
         when:
           selector:
             matchLabels:
               app: foo

and

  resources:
  - name: my-resource
    group: example.myextension.io
    version: v2
    kind: MyExtensionCustomResource
  hooks:
    xdsTranslator:
      post:
      - HTTPListener

The person creating the MyExtensionCustomResource resources needs to be aware of what resources the extension expects and at which hooks it expects to receive them with. If they don't label their resources correctly or add extra config under the above examples to set the name every single time they make a new custom resource, then it won't work. That model only really makes sense when the person/group running the extension is the same as the person/group that wrote it. I especially dislike the idea of using label selectors here as a solution.

As an extension developer, I should be able to specify the config/context for what resources my extension requires for each hook without the end users of my extension ever having to touch that config or worry about how it works. Basically, whatever API we use to specify context should be something that an extension developer can write once, deploy when users install their extension, and then never have to be touched again. I'm not saying that needs to be the default or only model for setting up context info, but it needs to be at least possible.

For example, if I am writing an extension to add a GlobalLuaFilter resource that should run on all requests, it doesn't make any sense to require my extension's users to setup this context every time they want to create a GlobalLuaFilter or require those users to have an understanding about all this context stuff. A slight tweak to the above models where the name field is either optional, or supports name: * would support that workflow while also supporting the case where the person running the extension knows exactly what context to supply and where so that the extension doesn't have to figure out from a list of all resources of that GVK.

If I want to write my extension, and provide config to say "hey Envoy Gateway, send over all of the MyCustomCRD resources to the route hook whenever it is invoked and then let the extension figure out if they are relevant or not so that the end users don't need to have any knowledge about how the extension works", then that should be possible.

On the other hand, we should absolutely allow for context configuration that doesn't require all the resources of a given type to always be sent over the wire every time a hook is invoked if the user does have in-depth knowledge of the expected context and they want to design their extension in a way that it only ever expects to receive context for resources it WILL use.

I think both of those use-cases are different and also equally valuable. We shouldn't sacrifice one to fulfill the other.

In the past, a lot of the discussions around sending context through these hooks hit stalemates during PR review and a consensus was never reached which is why we abandoned the idea of Envoy Gateway tracking state and instead added controllers to our extension. It was nice that our use-case worked out with that approach, but:

Maintaining controllers for your extension is a lot of extra work
That pattern doesn't support a ton of use-cases

As a side-note, it would also be nice to be able to get the context about the GatewayAPI resource relevant to the hook. So for example, when the route level hook is invoked, sending over the HTTPRoute for that xds route so that the extension could do things like look at the config and see if it references any filters/backendrefs that are names of custom resources introduced by the extension.

To provide my own modified version of @liorokman's suggestion

---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: ExtensionContextPolicy
metadata:
   name:  my-context-policy
   namespace: foo
spec:
  contextBindings:
  - targetRef:
      group: gateway.networking.k8s.io
      kind: Gateway
      name: first-gateway # optional, supports *
      namespace: first # optional, supports *
    resourcesToSend:
    - group: example.myextension.io
       kind: LuaScript
       name: my-first-lua-script # optional, supports *

Not to suggest this as the exact API we use or anything (I'm sure there are also problems with this that someone will point out), but this supports the idea of letting the extension's resources supply their own context and letting the extension get them all and figure it out, or limiting context to specific resource/target combinations.

I like something more like this because we are only associating resources together that the end-user should know about and not requiring them to know anything about the hooks or which hook gets which resources and when.

That way you could use a single ExtensionContextPolicy for all your bindings, or create multiple if you need, and it supports both use-cases of being specific about your context, or being generic. We could still support resources in the EnvoyGateway extension manager config as a way to tell Envoy Gateway, "Hey, my extension is actually going to watch it's own resources and keep track of their state, but if you see a resource of this GVK change, then you should reconfigure and run the hooks again"

I had also initially wanted to extend the extension system to cover the infrastructure manager as well so that an extension could do things like inject sidecars to Envoy Proxy and whatnot. I think that would still be cool as a future consideration, but not sure how context fits in with that idea at the moment.

As a final note, I do think that almost all of this friction would have been eliminated if we turned Envoy Gateway into a composable library of public packages so that extension developers had all the flexibility to go with whatever approach they needed and Envoy Gateway didn't need to worry about the details and semantics of extensions, but there was also a lot of pushback on that initial idea when we were first designing the extension system and it would be a very big lift to rewrite everything to fit that model.

Apr 11 '24 20:04 Alice-Lilith

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: ExtensionContextPolicy
metadata:
   name:  my-context-policy
   namespace: foo
spec:
  contextBindings:
  - targetRef:
      group: gateway.networking.k8s.io
      kind: Gateway
      name: first-gateway # optional, supports *
      namespace: first # optional, supports *
    resourcesToSend:
    - group: example.myextension.io
       kind: LuaScript
       name: my-first-lua-script # optional, supports *

I'd appreciate some clarification on this API as I may have misunderstood its functionality:

This API proposes to send different resources to different Gateway/Listeners. Is this level of flexibility needed? Does this imply the extension developer needs to know what Gateway/Listeners the system has in the runtime? It seems unlikely, or at least, uncommon.

On the other hand, should we send all the custom resources to each hook and allow the extension implementation decide what to use? Are there any specific use cases where this might not be suitable?

Apr 18 '24 21:04 zhaohuabing

This API proposes to send different resources to different Gateway/Listeners. Is this level of flexibility needed?

If one Gateway belongs to one application and another Gateway belongs to a different one, then each application might require different context in the extension server.

Does this imply the extension developer needs to know what Gateway/Listeners the system has in the runtime? It seems unlikely, or at least, uncommon.

If the extension developer provides some service (e.g. attach a LuaScript to a Listener), then the developer persona responsible for a specific gateway can decide to use that service. The extension developer doesn't know in advance which gateways exist, the developer persona can attach his specific context to his specific Gateway resource.

On the other hand, should we send all the custom resources to each hook and allow the extension implementation decide what to use? Are there any specific use cases where this might not be suitable?

I think this is exactly what we want to avoid.

If the extension developer needs to choose which custom resource is relevant for which listener, it means that the extension developer needs to be aware of all the listeners and to somehow decide what to apply.

Since EG allows only one extension server, if there are multiple applications exposed via the EG this becomes a problem.
There might be a large number of instances of the custom resources, and it's better to send exactly those resources that are relevant. Just like in the HTTPRoute mechanism we allow sending a specific custom resource and not all of the custom resources of that type. See https://gateway.envoyproxy.io/contributions/design/extending-envoy-gateway/#extending-gateway-api-and-the-data-plane

Apr 20 '24 06:04 liorokman

@kflynn mentioned this GEP: https://gateway-api.sigs.k8s.io/geps/gep-1867 in a different discussion. @liorokman - do you think that GW parameters can be used here as an alternative (acting like HTTPRoute filters) ?

Apr 22 '24 14:04 guydc

@guydc the same CRD should be able to attach to GatewayClass.spec.parametersRef and Gateway.spec.parametersRef . this is the EnvoyProxy resource today. Adding a context field for extensionManager within EnvoyProxy may be an alternate solution

another point I'm hoping we an discuss here, have we ruled out the ability to infer context from a extension CRD using Spec.TargetRef or Spec.TargetRefs field that maps to the upstream definition ?

Apr 22 '24 19:04 arkodg

I’m concerned about the complexity involved in implementing the ExtensionContextPolicy. While the ExtensionContextPolicy is linked to a Gateway API Custom Resource (CR), it is the extension server’s responsibility to apply it to an xDS resource. However, the mapping from CR to xDS is not always a straightforward one-to-one relationship.

In the simplest scenario, we have a one-to-one mapping (CR : xDS = 1 : 1), which is straightforward.
More complex is the one-to-many mapping (CR : xDS = 1 : N), where attaching a policy to an HTTPRoute could impact multiple xDS resources if the HTTPRoute has multiple parentRefs.
Conversely, a many-to-one mapping (CR : xDS = N : 1) occurs when a single xDS resource is affected by multiple policies applied at different levels, such as one policy on an HTTPRoute and another on a Gateway or Listener.

As an extension developer, managing these complexities is daunting. I’m uncertain if the Envoy Gateway is equipped to handle such scenarios effectively.

Envoy Gateway - xDS Hook Design - Resource Attachment excalidraw

Apr 29 '24 10:04 modatwork

To share our experiences for consideration, here are our use cases involving extension hook APIs:

Integrating a customized WAF filter into an HCM
Implementing an original source listener filter to enable IP transparent TCP proxy functionality
Adding a custom TLS-PSK handshaker extension to transport sockets

Sometimes, we apply extension configurations universally across all HCMs/listeners, while in other instances, we tailor them to specific xDS resources, including listeners, filter chains, or routes.

Apr 29 '24 10:04 modatwork

@kflynn mentioned this GEP: https://gateway-api.sigs.k8s.io/geps/gep-1867 in a different discussion. @liorokman - do you think that GW parameters can be used here as an alternative (acting like HTTPRoute filters) ?

Yes, I think that if/when the Gateway resource has a place for implementation specific attributes, then it could be used instead of a policy - in much the same way that context for the extension server is added to HTTPRoute resources.

@guydc the same CRD should be able to attach to GatewayClass.spec.parametersRef and Gateway.spec.parametersRef . this is the EnvoyProxy resource today. Adding a context field for extensionManager within EnvoyProxy may be an alternate solution

The EnvoyProxy resource serves the same purpose as the GatewayClassInfrastructure type proposed in GEP-1867, but is not the same as the GatewayInfrastructure type proposed in GEP-1867. The difference is this:

Provide the ability to configure arbitrary (implementation specific) attributes about a specific Gateway.

Adding context in the EnvoyProxy resource is not a good design for providing configuration for a specific gateway instance.

Apr 30 '24 09:04 liorokman

another point I'm hoping we an discuss here, have we ruled out the ability to infer context from a extension CRD using Spec.TargetRef or Spec.TargetRefs field that maps to the upstream definition ?

The closest thing I know of that is similar to this proposal is Cluster API's provider contracts where the contract mandates the usage of a specific struct with a specific name. For example, see points 5 and 6 here.

The example turns into this:

apiVersion: example.myextension.io/v1
kind: LuaScript
metadata:
   name: my-lua-script
spec:
   bindings:
  -  group: gateway.networking.k8s.io
     kind: Gateway
     name: first-gateway # optional, supports *
   # The rest of the information that is relevant for this context extension
   lua: # some lua script

Pros:

Only the context-relevant CRDs are needed when providing context to the extension server - resulting in fewer instances and fewer CRDs.

Cons:

Extension servers that want to provide an extension context type need to work harder to create CRDs that Envoy Gateway will support.
Since Envoy Gateway provides the types that need to be embedded, there's a tighter coupling here between the Envoy Gateway version and the extension server's version.
This mechanism provides yet another mechanism to extend gateway configuration. We already have:
1. Direct policy attachments GEP-2648
2. ExtensionRef in HTTPRouteFilter and GRPCRouteFilter resources
Not mentioned in Envoy Gateway's extension server design, as opposed to the policy attachment mechanism which is.

Apr 30 '24 09:04 liorokman

I’m concerned about the complexity involved in implementing the ExtensionContextPolicy. While the ExtensionContextPolicy is linked to a Gateway API Custom Resource (CR), it is the extension server’s responsibility to apply it to an xDS resource. However, the mapping from CR to xDS is not always a straightforward one-to-one relationship.

In the simplest scenario, we have a one-to-one mapping (CR : xDS = 1 : 1), which is straightforward.

More complex is the one-to-many mapping (CR : xDS = 1 : N), where attaching a policy to an HTTPRoute could impact multiple xDS resources if the HTTPRoute has multiple parentRefs.

Conversely, a many-to-one mapping (CR : xDS = N : 1) occurs when a single xDS resource is affected by multiple policies applied at different levels, such as one policy on an HTTPRoute and another on a Gateway or Listener.

As an extension developer, managing these complexities is daunting. I’m uncertain if the Envoy Gateway is equipped to handle such scenarios effectively.

I don't think that the ExtensionContextPolicy resource would be the way to link arbitrary resources to arbitrary xDS points. The mappings that I think make sense are only for Listeners. This is actually pretty straightforward to do:

If mergeGateways is not turned on then there's always a 1:1 mapping between a specific listener in a specific Gateway resource and the relevant xDS for that listener.

If mergeGateways is turned on, then there can be a 1:n mapping between an xDS listener and a set of Gateway resources, but it's easy for Envoy Gateway to keep track of this mapping.

Context for routes would still be added as an ExtensionRef on the relevant HTTPRoute or GRPCRoute - as before.

Apr 30 '24 09:04 liorokman

Proposal to add FilterChain level Hook.

WHY?

Giving three Gateway Listeners, and suppose Envoy has supported com.eg.filter_test HTTP filter and handshaker_test custom handshaker.

listeners:
  - name: http
    protocol: HTTP
    port: 80
  - name: https-foo
    protocol: HTTPS
    hostname: foo.eg.com
    port: 443
    tls: ...
  - name: https-bar
    protocol: HTTPS
    hostname: bar.eg.com
    port: 443
    tls: ...

If I have these goals:

Enable com.eg.filter_test HTTP filter on HCM translated by http Listener.
Enable handshaker_test custom handshaker on FilterChain translated by https-foo Listener.
Enable com.eg.filter_test HTTP filter on HCM translated by https-bar Listener.

After all of xDS translations including xDS Hook, xDS resources should archive my goals.

http Listener will be translated to xDS Listener similar like this:

default_filter_chain:
  filters:
    - http_connection_manager:
        http_filters:

          # Add this http filter via xDS Hook API
          - com.eg.filter_test

          - envoy.filters.http.router

https-foo and https-bar Listeners will be translated to xDS Listener similar like this:

filter_chains:
  - filters:
      - http_connection_manager:
          http_filters:
            - envoy.filters.http.router
    transport_socket:
      downstream_tls_context:
        common_tls_context:

          # Add this handshaker via xDS Hook API
          custom_handshaker: handshaker_test

    filter_chain_match:
      server_names:
        - foo.eg.com

  - filters:
      - http_connection_manager:
          http_filters:

            ## Add this http filter via xDS Hook API
            - com.eg.filter_test

            - envoy.filters.http.router
    transport_socket:
      downstream_tls_context:
        common_tls_context
    filter_chain_match:
      server_names:
        - bar.eg.com

In EG, HTTP Listener is translated to default_filter_chain of xDS Listener, HTTPS Listeners are translated to distinct FilterChain in filter_chains of xDS Listener. Each Gateway Listener corresponds to single FilterChain in xDS Listener.

So if I want to goals above, context indicating which FilterChain will be modified in the xDS Listener should be passed to extension server implements HTTPListener Hook.

Another way is to support FilterChain Hook, after FilterChain is translated from Gateway Listener, EG finds Policies attaching to this Gateway Listener and pass it to extension server.

Apr 30 '24 10:04 aoledk

Seems my last comment is not much relevant to this issue about how to pass context to extension server.

To prevent people involved in feeling confused. I will open new issue #3307 to describe my concern about how to locate specific Gateway Listener in xDS Listener.

Apr 30 '24 10:04 aoledk

Building on the discussion above and in the community meeting, the latest proposal becomes this:

Extension server developers that want to receive context on calls to the HTTPListener modification hook can create their own context CRD which must fill the following contract:

The spec MUST contain either a targetRef attribute or a targetRefs attribute of type PolicyTargetReferenceWithSectionName (or array therefor)
The status section of the CRD must be of type PolicyStatus

The CRD type to be used must be added to the EnvoyProxy configuration in the extensionManager section in a new policyResources section:

For example:

#...
   extensionManager:
      policyResources:
      - group:  example.myextension.io
        version: v1
        kind: LuaScript
      hooks:
        xdsTranslator:
          post:
          - HTTPListener
# ...

This is a different section from the existing resources section because Envoy Gateway expects resources listed here to implement the above contract, while resources in the existing resources section are not expected to implement the contract.

Once the policy resource types are registered, Envoy Gateway watches them for changes. If a policy resource targets an existing Gateway, then it is added to the context of the HTTPListener hook when the extension server is called.

For example, consider the following documents:


apiVersion: example.myextension.io/v1
kind: LuaScript
metadata:
   name: my-lua-script
spec:
   targetRefs:
  -  group: gateway.networking.k8s.io
      kind: Gateway
      name: first-gateway
      section: http
   lua: # some lua script

---

apiVersion: example.myextension.io/v1
kind: LuaScript
metadata:
   name: my-second-lua-script
spec:
   targetRefs:
  -  group: gateway.networking.k8s.io
      kind: Gateway
      name: first-gateway
   lua: # some other lua script

---

apiVersion: example.myextension.io/v1
kind: YetAnotherExtensionPolicyType
metadata:
   name: some-name
spec:
   targetRef:
     group: gateway.networking.k8s.io
     kind: Gateway
     name: first-gateway
    
   somePayload: information required for the extension-server

In this example:

The call to the extension-server hook for all of the listeners defined in first-gateway would receive the second LuaScript resource in the call's context object as well as the YetAnotherExtensionPolicyType resource.
The call to the extension-server hook for the listener created for http section in first-gateway would additionally receive the first LuaScript resource in the call's context object, for a total of three resources in the context object.

More constraints:

Extension server policies are directly attached, and can be attached to Gateway resources with or without a SectionName.
Either a single targetRef or an array of targetRefs can be specified in the CRD
Envoy Gateway assumes responsibility for the status section of the Policy object.
If there are multiple extension policy instances, then they are all sent to the extension server's hook. It's the responsibility of the extension server developer to implement merge/override behavior as needed.

May 06 '24 13:05 liorokman

hey @liorokman

+1 on the approach of using targetRef / targetRefs within the extension resource to be able to derive a mapping between Extension Resource and (Gateway API) Gateway resource
-1 on Envoy Gateway managing status of the Extension Resource, it has no way to knowing what good or bad or partial states looks like.

May 07 '24 22:05 arkodg

-1 on Envoy Gateway managing status of the Extension Resource, it has no way to knowing what good or bad or partial states looks like.

While EG can't know what good or bad or partial functional states look like, it can and does know if a specific policy instance was picked up at all and to which Gateway it was attached. At least this information should be updated in the status.

As far as EG is concerned, a policy of this sort is Accepted if the targetRef (or refs) point to something that exists.

May 08 '24 07:05 liorokman

apiVersion: example.myextension.io/v1
kind: LuaScript
metadata:
   name: my-second-lua-script
spec:
   targetRefs:
   -  group: gateway.networking.k8s.io
      kind: Gateway
      name: first-gateway
   lua: # some other lua script

I like the approach @liorokman suggested above, my only open question is what do the targets mean behind the scenes?

Does kind: Gateway mean that Envoy Gateway will send over the context for this resource to all hooks for any resource relevant to that Gateway (listener down to all the routes for that Gateway), or does it correspond to sending context for only a specific hook like kind: Gateway means send this resource for only the Listener hook?

For example, let's say that my resource MyCustomResource is targeting a Gateway but the kind of xDS config it produces is only relevant at the Route/VirtualHost level. For example, maybe I am introducing my own RouteDefaults resource or something with a similar purpose that sets some config on all route objects for a Gateway. Likewise, maybe the resource is only relevant at the Listener level and has nothing to do with Routes.

Right now, we have extension hooks at:

Listener (HTTP)
Virtual Host
Route
Post Modify (chance to modify/inject/remove clusters and secrets at the last moment)

What if we introduce at some point in the future a hook at the Filter Chains level (or really anywhere else)?

Do we need some bit of config somewhere to specify that relationship (like in the extensionManager field), or do we have enough config from the targetRef(s) to do it implicitly?

May 09 '24 17:05 Alice-Lilith

my only open question is what do the targets mean behind the scenes?

For now, I think we should only target the Listener hook, which means exclusively Gateway CRD instances. Since it's trivial, also support the SectionName in the targeting coordinates - these map to specific listeners.

There's already a different way to provide context resources to the extension server for the routes hook.

A future improvement might be to also allow targeting BackendRef resources, and send the resource instances as context to the virtual host hook - but I don't want to handle this in the current implementation.

What if we introduce at some point in the future a hook at the Filter Chains level (or really anywhere else)?

Do we need some bit of config somewhere to specify that relationship (like in the extensionManager field), or do we have enough config from the targetRef(s) to do it implicitly?

As was noted earlier in the thread, it's not trivial to create a policy that targets an Envoy Proxy configuration section. There's no Gateway-API resource that is synonymous with an xDS filter-chain. It would be very difficult to use the Gateway-API policy attachment mechanism to correctly associate CRD instances with an xDS concept.

If EG ever implements a hook for filter chains, then attaching context to it would need to be done with something along the lines of the way that envoy patches work - associating the context document with the xDS configuration name. That's definitely interesting, but I would not want to do this in the context of this issue.

May 09 '24 18:05 liorokman

-1 on Envoy Gateway managing status of the Extension Resource, it has no way to knowing what good or bad or partial states looks like.

While EG can't know what good or bad or partial functional states look like, it can and does know if a specific policy instance was picked up at all and to which Gateway it was attached. At least this information should be updated in the status.

As far as EG is concerned, a policy of this sort is Accepted if the targetRef (or refs) point to something that exists.

we've learnt the hard way in EG of not dealing with target not found cases, and simply ignoring them, more in https://github.com/envoyproxy/gateway/issues/2520

May 09 '24 20:05 arkodg

we've learnt the hard way in EG of not dealing with target not found cases, and simply ignoring them, more in #2520

In this case, then how about EG not touching the policy object's status and instead adding a condition to the object being targeted to specify that it was affected?

As described in GEP-2648.

May 10 '24 04:05 liorokman

we've learnt the hard way in EG of not dealing with target not found cases, and simply ignoring them, more in #2520

In this case, then how about EG not touching the policy object's status and instead adding a condition to the object being targeted to specify that it was affected?

As described in GEP-2648.

if we start supporting selector based targets, then the status write may explode very easily

May 10 '24 19:05 arkodg

gateway gateway copied to clipboard

Support ExtensionContextPolicy resources to provide state to Extension Server hooks

gateway
gateway copied to clipboard