support endpoint picker based on header/dynamic metadata
Description:
Describe the desired behavior, what scenario it enables and how it would be used.
AIBrix:
https://github.com/vllm-project/aibrix/blob/main/config/gateway/gateway.yaml suggests AIBrix needs to apply an EnvoyPatchPolicy in every new Gateway instance. Thoughts on replacing it with other approaches?
- Route to the extension server only when the path is /v1 and the `route-strategy` header is present
- Route to an endpoint based on the `target-pod` header

The Gateway API Inference Extension also needs to pick endpoints based on a header or metadata.
This is based on the Envoy Override Host load balancer policy (https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/load_balancing_policies/override_host/v3/override_host.proto), which supports a fallback IP that can be used during a retry.
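For context, here is a rough sketch of the Envoy cluster configuration this policy maps to, based on the override_host proto linked above. The cluster name, header name, and fallback policy are illustrative, not a confirmed design:

```yaml
# Sketch: Envoy cluster using the Override Host lb policy (illustrative values).
clusters:
  - name: inference-backend
    type: EDS
    load_balancing_policy:
      policies:
        - typed_extension_config:
            name: envoy.load_balancing_policies.override_host
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.load_balancing_policies.override_host.v3.OverrideHost
              # pick the upstream endpoint (ip:port) from this request header
              override_host_sources:
                - header: target-pod
              # used when the header is missing or the selected host is unavailable
              fallback_policy:
                policies:
                  - typed_extension_config:
                      name: envoy.load_balancing_policies.round_robin
                      typed_config:
                        "@type": type.googleapis.com/envoy.extensions.load_balancing_policies.round_robin.v3.RoundRobin
```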
+1 if this can be solved by adding a type in the Backend resource called Original Destination / StaticResolver. It has the same security implications as the DynamicResolver / Dynamic Forward Proxy type.
cc @Jeffwan @varungup90
Instead of using an Original Destination cluster, this could also be represented as a LoadBalancer field in BTP (https://gateway.envoyproxy.io/docs/api/extension_types/#loadbalancer) that uses the new Envoy Override Host load balancer policy (https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/load_balancing_policies/override_host/v3/override_host.proto), which supports a fallback IP that can be used during a retry.
cc @mathetake @yanavlasov @wbpcode
So where will that BTP be attached? An empty Backend, or something else?
The BTP can attach to the Gateway or Route.
Is the BackendRef / Service definition static/known beforehand, so the IP in the header must be an endpoint of the Service?
> Is the BackendRef / Service definition static/known beforehand, so the IP in the header must be an endpoint of the Service?
Not sure. The Inference Extension's reference implementation of the endpoint picker ties an InferencePool to pods selected by label, not to a Service. So I think it would be nice if EG allowed these opaque backends as a target to attach this lb policy, just like DynamicResolver does. In other words, even when this is attached to an HTTPRoute, there's nothing to "route to" in the first place.
Also not sure how it will work when this is attached to a Gateway: will that result in all backends having this override_host_sources configuration?
Pods having the same label can be represented by a Service using the label as a selector: https://kubernetes.io/docs/concepts/services-networking/service/#defining-a-service
So the InferencePool could be represented as a Service by the EG user?
If we can all align on this, we can focus only on this BTP enhancement; otherwise we'd also have to add an Original DST type for Backend, and the user would need to define the "get IP from this header" behavior twice.
Regarding BTP-Gateway attachment: the policy would say "get the IP from this header | metadata", so it's general and applies to all routes. The work of actually determining the value of the header or metadata (to make sure the IP is part of the final backendRef) is done by an external entity.
Yeah, I guess users can at least define whatever dummy Service they need if that doesn't work for them. So +1 to focusing on the BTP enhancement.
Refining my previous comment, outlining the 2 options in detail here (names and API fields are temporary):
Option 1 - Representing this as a LoadBalancer config in BackendTrafficPolicy where the HTTPRoute links to a Service
```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: static-resolver-policy
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: static-resolver-route
  loadBalancer:
    type: StaticResolver
    headers:
      - "x-endpoint-ip"
      - "x-fallback-endpoint-ip"
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: static-resolver-route
  namespace: default
spec:
  parentRefs:
    - name: eg
  hostnames:
    - "www.example.com"
  rules:
    - backendRefs:
        - kind: Service
          name: inference-pool-svc
          port: 3000
```
Option 2 - Representing this as a Backend type and HTTPRoute links to this Backend
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: static-resolver-route
  namespace: default
spec:
  parentRefs:
    - name: eg
  hostnames:
    - "www.example.com"
  rules:
    - backendRefs:
        - kind: Backend
          group: gateway.envoyproxy.io
          name: static-resolver
          port: 3000
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: static-resolver
spec:
  type: StaticResolver
  staticResolver:
    from:
      headers:
        - "x-endpoint-ip"
        - "x-endpoint-fallback-ip"
```
please vote
FWIW, in the latest EPP proposal, the multiple endpoints can (and should) now be placed in the same header/metadata entry.
i would prefer option 2.
thanks @wbpcode, that should make the API simpler too:
```yaml
staticResolver:
  from:
    header: "x-endpoint-ips"
```
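Under that single-header design, a request carrying multiple candidate endpoints in priority order might look like the following (the comma-separated ip:port format is an assumption about the picker's encoding, reusing the `x-endpoint-ips` name from the snippet above):

```yaml
# Hypothetical request header set by the endpoint picker:
# first entry is the primary endpoint, later entries are fallbacks.
#
#   x-endpoint-ips: 10.0.1.5:8000,10.0.1.6:8000
```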
+1 for option 2; the header and metadata should not be a slice.
For option 2:
```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: backend-routing-based-on-header
  namespace: default
spec:
  type: HostOverride
  hostOverrideSettings:
    overrideHostSources:
      - header: target-pod
  endpoints:
    - fqdn:
        hostname: fallback.default.svc.cluster.local
        port: 80
```
It should look like this: the HostOverride lb policy requires a fallback upstream, so the previous API design is not correct.
This also enables another feature: we do a best-match for the pod in the cluster, and if no best-match pod is selected, fall back to the endpoints in the Backend, which can even be out-of-cluster.
Not sure if I'm understanding this correctly, but it sounds like a "fallback" cluster isn't needed in the inference use cases - at least not for most of the use cases. If that's the case, we could simply ignore the endpoints in the Backend resource and use a dummy cluster in the generated xDS route.
Thoughts on it? cc @mathetake
I think the fallback can handle cases like this: when a request for the InferencePool selects no matched Pod, it routes to an LLM provider instead.
If we don't support fallback, we may as well just use Cluster_ORIGINAL_DST directly.
> the HostOverride lbPolicy requires the fallback upstream
Is this true at the Envoy level? If so, maybe we can just insert some random backend. If this is EG-level validation, then shall we relax it when the type equals HostOverride, as @zhaohuabing suggested?
Since the override host lb policy also allows the metadata to specify the fallback IP addresses, I don't think we need Backend.spec.endpoints in practice, like Huabing said.
@mathetake I had a discussion with @wbpcode today, and found that if we only create the lb policy with host override but without an actual EDS cluster, requests always return "no healthy upstream" (I hit this while adding the e2e tests).
The host override logic relies on an actual fallback upstream; it is not an EG-level restriction.
So my question is: if we don't rely on the fallback logic, why don't we use the original dst cluster directly instead?
Let me clarify that we have two "fallbacks" we are talking about.
One fallback is the envoy cluster configuration level endpoint that must exist (as you found that without which requests will always fail with no healthy upstream). The other is the fallback (ip:port pairs) endpoints set in a dynamic metadata or header.
I believe the "endpoint picker" implementation by the Gateway API Inference Extension sets these fallback ip:port pairs as well. As far as I understand, original_dst is not aware of these "fallback ip:port" pairs (maybe I am wrong).
On the other hand, I am not sure how useful that "fallback ip:port" will be in practice -- the EPP should be aware of the real-time metrics, so I think that fallback rarely happens. So, I am fine with just starting small with original_dst, then we can revisit the "override host" stuff.
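To make the "start small with original_dst" option concrete, a minimal sketch of the generated Envoy cluster could look like this (cluster name and header name are illustrative, not EG's actual output):

```yaml
# Sketch: ORIGINAL_DST cluster with header-based destination override.
clusters:
  - name: original-dst-backend        # illustrative name
    type: ORIGINAL_DST
    lb_policy: CLUSTER_PROVIDED       # required for ORIGINAL_DST clusters
    connect_timeout: 5s
    original_dst_lb_config:
      # allow a request header to carry the destination ip:port;
      # defaults to x-envoy-original-dst-host when http_header_name is empty
      use_http_header: true
      http_header_name: target-pod    # illustrative header name
```

Note that, as the original_dst docs warn, the header is not sanitized by default, so the gateway must ensure clients cannot set it to route to arbitrary hosts.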
Yes. The problem I met is that if we implement the override host this way:
```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: backend-routing-based-on-header
  namespace: default
spec:
  type: HostOverride
  hostOverrideSettings:
    overrideHostSources:
      - header: target-pod
```
then there is no way to express the Envoy cluster configuration level fallback, so when I added e2e tests around it the requests failed.
After realizing this, I have three thoughts:
- keep most of the current implementation approach, but add endpoints in the host override Backend for the Envoy cluster-level fallback:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: backend-routing-based-on-header
  namespace: default
spec:
  type: HostOverride
  hostOverrideSettings:
    overrideHostSources:
      - header: target-pod
  endpoints:
    - xxxx
```
- use BackendTrafficPolicy to target a specific route:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: static-resolver-policy
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: static-resolver-route
  loadBalancer:
    type: HostOverride
    hostOverrideSettings:
      overrideHostSources:
        - header: target-pod
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: static-resolver-route
  namespace: default
spec:
  parentRefs:
    - name: eg
  hostnames:
    - "www.example.com"
  rules:
    - backendRefs:
        - kind: Service
          name: inference-pool-svc
          port: 3000
```
This way, we use the Service inference-pool-svc as the default fallback endpoint for the Envoy cluster, and we can also add the host override lb policy to the cluster.
- use the original_dst cluster directly:
```go
// Specific configuration for the
// :ref:`Original Destination <arch_overview_load_balancing_types_original_destination>`
// load balancing policy.
// [#extension: envoy.clusters.original_dst]
type Cluster_OriginalDstLbConfig struct {
	state         protoimpl.MessageState
	sizeCache     protoimpl.SizeCache
	unknownFields protoimpl.UnknownFields

	// When true, a HTTP header can be used to override the original dst address. The default header is
	// :ref:`x-envoy-original-dst-host <config_http_conn_man_headers_x-envoy-original-dst-host>`.
	//
	// .. attention::
	//
	//   This header isn't sanitized by default, so enabling this feature allows HTTP clients to
	//   route traffic to arbitrary hosts and/or ports, which may have serious security
	//   consequences.
	//
	// .. note::
	//
	//   If the header appears multiple times only the first value is used.
	UseHttpHeader bool `protobuf:"varint,1,opt,name=use_http_header,json=useHttpHeader,proto3" json:"use_http_header,omitempty"`
	// The http header to override destination address if :ref:`use_http_header <envoy_v3_api_field_config.cluster.v3.Cluster.OriginalDstLbConfig.use_http_header>`
	// is set to true. If the value is empty, :ref:`x-envoy-original-dst-host <config_http_conn_man_headers_x-envoy-original-dst-host>` will be used.
	HttpHeaderName string `protobuf:"bytes,2,opt,name=http_header_name,json=httpHeaderName,proto3" json:"http_header_name,omitempty"`
	// The port to override for the original dst address. This port
	// will take precedence over filter state and header override ports.
	UpstreamPortOverride *wrapperspb.UInt32Value `protobuf:"bytes,3,opt,name=upstream_port_override,json=upstreamPortOverride,proto3" json:"upstream_port_override,omitempty"`
	// The dynamic metadata key to override destination address.
	// First the request metadata is considered, then the connection one.
	MetadataKey *v34.MetadataKey `protobuf:"bytes,4,opt,name=metadata_key,json=metadataKey,proto3" json:"metadata_key,omitempty"`
}
```
I checked, and it looks like Original Destination can fetch the address from metadata as well (not 100% sure about it). This way is quite easy to implement and can also handle the GIE integration for now.
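If the metadata-based override works as the `MetadataKey` field in the proto above suggests, the generated cluster might look like this sketch (the metadata namespace and path are assumptions for illustration):

```yaml
# Sketch: ORIGINAL_DST cluster taking the destination from dynamic metadata.
clusters:
  - name: original-dst-backend          # illustrative name
    type: ORIGINAL_DST
    lb_policy: CLUSTER_PROVIDED
    original_dst_lb_config:
      # destination ip:port read from dynamic metadata, e.g. set by the
      # ext_proc endpoint picker; namespace/path here are illustrative
      metadata_key:
        key: envoy.lb
        path:
          - key: target-pod
```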