support endpoint picker based on header/dynamic metadata
Description:
Describe the desired behavior, what scenario it enables and how it would be used.
AIBrix:
https://github.com/vllm-project/aibrix/blob/main/config/gateway/gateway.yaml suggests AIBrix needs to apply an EnvoyPatchPolicy in every new Gateway instance. Thoughts on replacing it with other approaches?
- Route to the extension server only when the path is /v1 and the `route-strategy` header is present
- Route to an endpoint based on the `target-pod` header

The Gateway API Inference Extension also needs to pick endpoints based on a header or metadata.
This is based on the Envoy Override Host load balancer policy (https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/load_balancing_policies/override_host/v3/override_host.proto), which supports a fallback IP that can be used during a retry.
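For context, here is a rough sketch of the Envoy cluster configuration this policy maps to, based on the override_host proto linked above. The cluster name, header name, and fallback policy are illustrative, not a confirmed design:

```yaml
# Sketch: Envoy cluster using the Override Host lb policy (illustrative values).
clusters:
  - name: inference-backend
    type: EDS
    load_balancing_policy:
      policies:
        - typed_extension_config:
            name: envoy.load_balancing_policies.override_host
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.load_balancing_policies.override_host.v3.OverrideHost
              # pick the upstream endpoint (ip:port) from this request header
              override_host_sources:
                - header: target-pod
              # used when the header is missing or the selected host is unavailable
              fallback_policy:
                policies:
                  - typed_extension_config:
                      name: envoy.load_balancing_policies.round_robin
                      typed_config:
                        "@type": type.googleapis.com/envoy.extensions.load_balancing_policies.round_robin.v3.RoundRobin
```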
+1 if this can be solved by adding a type in the Backend resource called Original Destination / StaticResolver. It has the same security implications as the DynamicResolver / Dynamic Forward Proxy type.
cc @Jeffwan @varungup90
Instead of using an Original Destination cluster, this could also be represented as a LoadBalancer field in BTP (https://gateway.envoyproxy.io/docs/api/extension_types/#loadbalancer) that uses the new Envoy Override Host load balancer policy (https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/load_balancing_policies/override_host/v3/override_host.proto), which supports a fallback IP that can be used during a retry.
cc @mathetake @yanavlasov @wbpcode
So where will that BTP be attached? An empty Backend, or something else?
The BTP can attach to the Gateway or Route.
Is the BackendRef / Service definition static/known beforehand, so the IP in the header must be an endpoint of the Service?
> Is the BackendRef / Service definition static/known beforehand, so the IP in the header must be an endpoint of the Service?
Not sure. The Inference Extension's reference implementation of the endpoint picker ties an InferencePool to pods selected by label, not to a Service. So I think it would be nice if EG allowed these opaque backends as a target to attach this lb policy, just like DynamicResolver does. In other words, even when this is attached to an HTTPRoute, there's nothing to "route to" in the first place.
Also not sure how it will work when this is attached to a Gateway: will that result in all backends having this override_host_sources configuration?
Pods having the same label can be represented by a Service using the label as a selector: https://kubernetes.io/docs/concepts/services-networking/service/#defining-a-service
So the InferencePool could be represented as a Service by the EG user?
If we can all align on this, we can focus only on this BTP enhancement; otherwise we'd also have to add an Original DST type for Backend, and the user would need to define the "get IP from this header" behavior twice.
Regarding BTP-Gateway attachment: the policy would say "get the IP from this header | metadata", so it's general and applies to all routes. The work of actually determining the value of the header or metadata (to make sure the IP is part of the final backendRef) is done by an external entity.
Yeah, I guess users can at least define whatever dummy Service they need if that doesn't work for them. So +1 to focusing on the BTP enhancement.
Refining my previous comment, outlining the 2 options in detail here (names and API fields are temporary):
Option 1 - Representing this as a LoadBalancer config in BackendTrafficPolicy where the HTTPRoute links to a Service
```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: static-resolver-policy
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: static-resolver-route
  loadBalancer:
    type: StaticResolver
    headers:
      - "x-endpoint-ip"
      - "x-fallback-endpoint-ip"
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: static-resolver-route
  namespace: default
spec:
  parentRefs:
    - name: eg
  hostnames:
    - "www.example.com"
  rules:
    - backendRefs:
        - kind: Service
          name: inference-pool-svc
          port: 3000
```
Option 2 - Representing this as a Backend type and HTTPRoute links to this Backend
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: static-resolver-route
  namespace: default
spec:
  parentRefs:
    - name: eg
  hostnames:
    - "www.example.com"
  rules:
    - backendRefs:
        - kind: Backend
          group: gateway.envoyproxy.io
          name: static-resolver
          port: 3000
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: static-resolver
spec:
  type: StaticResolver
  staticResolver:
    from:
      headers:
        - "x-endpoint-ip"
        - "x-endpoint-fallback-ip"
```
please vote
FWIW, in the latest EPP proposal, the multiple endpoints can (and should) now be placed in the same header/metadata entry.
i would prefer option 2.
thanks @wbpcode, that should make the API simpler too:
```yaml
staticResolver:
  from:
    header: "x-endpoint-ips"
```
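Under that single-header design, a request carrying multiple candidate endpoints in priority order might look like the following (the comma-separated ip:port format is an assumption about the picker's encoding, reusing the `x-endpoint-ips` name from the snippet above):

```yaml
# Hypothetical request header set by the endpoint picker:
# first entry is the primary endpoint, later entries are fallbacks.
#
#   x-endpoint-ips: 10.0.1.5:8000,10.0.1.6:8000
```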
+1 for option 2; the header and metadata should not be a slice.
For option 2:
```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: backend-routing-based-on-header
  namespace: default
spec:
  type: HostOverride
  hostOverrideSettings:
    overrideHostSources:
      - header: target-pod
  endpoints:
    - fqdn:
        hostname: fallback.default.svc.cluster.local
        port: 80
```
It should look like this: the HostOverride lb policy requires a fallback upstream, so the previous API design is not correct.
This also enables another feature: we do a best-match for the pod in the cluster, and if no best-match pod is selected, fall back to the endpoints in the Backend, which can even be out-of-cluster.
Not sure if I'm understanding this correctly, but it sounds like a "fallback" cluster isn't needed in the inference use cases - at least not for most of the use cases. If that's the case, we could simply ignore the endpoints in the Backend resource and use a dummy cluster in the generated xDS route.
Thoughts on it? cc @mathetake
I think the fallback can handle cases like this: when a request for the InferencePool selects no matched Pod, it routes to an LLM provider instead.
If we don't support fallback, we may as well just use Cluster_ORIGINAL_DST directly.
> the HostOverride lbPolicy requires the fallback upstream
Is this true at the Envoy level? If so, maybe we can just insert some random backend. If this is EG-level validation, then shall we relax it when the type equals HostOverride, as @zhaohuabing suggested?
Since the override host lb policy also allows the metadata to specify the fallback IP addresses, I don't think we need Backend.spec.endpoints in practice, like Huabing said.
@mathetake I had a discussion with @wbpcode today, and found that if we only create the lb policy with host override but without an actual EDS cluster, requests always return "no healthy upstream" (I hit this while adding the e2e tests).
The host override logic relies on an actual fallback upstream; it is not an EG-level restriction.
So my question is: if we don't rely on the fallback logic, why don't we use the original dst cluster directly instead?
Let me clarify that we have two "fallbacks" we are talking about.
One fallback is the envoy cluster configuration level endpoint that must exist (as you found that without which requests will always fail with no healthy upstream). The other is the fallback (ip:port pairs) endpoints set in a dynamic metadata or header.
I believe the "endpoint picker" implementation by the Gateway API Inference Extension sets these fallback ip:port pairs as well. As far as I understand, original_dst is not aware of these "fallback ip:port" pairs (maybe I am wrong).
On the other hand, I am not sure how useful that "fallback ip:port" will be in practice -- the EPP should be aware of the real-time metrics, so I think that fallback rarely happens. So, I am fine with just starting small with original_dst, then we can revisit the "override host" stuff.
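To make the "start small with original_dst" option concrete, a minimal sketch of the generated Envoy cluster could look like this (cluster name and header name are illustrative, not EG's actual output):

```yaml
# Sketch: ORIGINAL_DST cluster with header-based destination override.
clusters:
  - name: original-dst-backend        # illustrative name
    type: ORIGINAL_DST
    lb_policy: CLUSTER_PROVIDED       # required for ORIGINAL_DST clusters
    connect_timeout: 5s
    original_dst_lb_config:
      # allow a request header to carry the destination ip:port;
      # defaults to x-envoy-original-dst-host when http_header_name is empty
      use_http_header: true
      http_header_name: target-pod    # illustrative header name
```

Note that, as the original_dst docs warn, the header is not sanitized by default, so the gateway must ensure clients cannot set it to route to arbitrary hosts.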
Yes. The problem I met is that if we implement the override host this way:
```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: backend-routing-based-on-header
  namespace: default
spec:
  type: HostOverride
  hostOverrideSettings:
    overrideHostSources:
      - header: target-pod
```
then there is no way to express the Envoy cluster configuration level fallback, so when I added e2e tests around it the requests failed.
After realizing this, I have three thoughts:
- keep most of the current implementation approach, but add endpoints in the host override Backend for the Envoy cluster-level fallback:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: backend-routing-based-on-header
  namespace: default
spec:
  type: HostOverride
  hostOverrideSettings:
    overrideHostSources:
      - header: target-pod
  endpoints:
    - xxxx
```
- use BackendTrafficPolicy to target a specific route:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: static-resolver-policy
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: static-resolver-route
  loadBalancer:
    type: HostOverride
    hostOverrideSettings:
      overrideHostSources:
        - header: target-pod
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: static-resolver-route
  namespace: default
spec:
  parentRefs:
    - name: eg
  hostnames:
    - "www.example.com"
  rules:
    - backendRefs:
        - kind: Service
          name: inference-pool-svc
          port: 3000
```
This way, we use the Service inference-pool-svc as the default fallback endpoint for the Envoy cluster, and we can also add the host override lb policy to the cluster.
- use the original_dst cluster directly:
```go
// Specific configuration for the
// :ref:`Original Destination <arch_overview_load_balancing_types_original_destination>`
// load balancing policy.
// [#extension: envoy.clusters.original_dst]
type Cluster_OriginalDstLbConfig struct {
	state         protoimpl.MessageState
	sizeCache     protoimpl.SizeCache
	unknownFields protoimpl.UnknownFields

	// When true, a HTTP header can be used to override the original dst address. The default header is
	// :ref:`x-envoy-original-dst-host <config_http_conn_man_headers_x-envoy-original-dst-host>`.
	//
	// .. attention::
	//
	//   This header isn't sanitized by default, so enabling this feature allows HTTP clients to
	//   route traffic to arbitrary hosts and/or ports, which may have serious security
	//   consequences.
	//
	// .. note::
	//
	//   If the header appears multiple times only the first value is used.
	UseHttpHeader bool `protobuf:"varint,1,opt,name=use_http_header,json=useHttpHeader,proto3" json:"use_http_header,omitempty"`
	// The http header to override destination address if :ref:`use_http_header <envoy_v3_api_field_config.cluster.v3.Cluster.OriginalDstLbConfig.use_http_header>`
	// is set to true. If the value is empty, :ref:`x-envoy-original-dst-host <config_http_conn_man_headers_x-envoy-original-dst-host>` will be used.
	HttpHeaderName string `protobuf:"bytes,2,opt,name=http_header_name,json=httpHeaderName,proto3" json:"http_header_name,omitempty"`
	// The port to override for the original dst address. This port
	// will take precedence over filter state and header override ports.
	UpstreamPortOverride *wrapperspb.UInt32Value `protobuf:"bytes,3,opt,name=upstream_port_override,json=upstreamPortOverride,proto3" json:"upstream_port_override,omitempty"`
	// The dynamic metadata key to override destination address.
	// First the request metadata is considered, then the connection one.
	MetadataKey *v34.MetadataKey `protobuf:"bytes,4,opt,name=metadata_key,json=metadataKey,proto3" json:"metadata_key,omitempty"`
}
```
I checked, and it looks like Original Destination can fetch the address from metadata as well (not 100% sure about it). This way is quite easy to implement and can also handle the GIE integration for now.
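If the metadata-based override works as the `MetadataKey` field in the proto above suggests, the generated cluster might look like this sketch (the metadata namespace and path are assumptions for illustration):

```yaml
# Sketch: ORIGINAL_DST cluster taking the destination from dynamic metadata.
clusters:
  - name: original-dst-backend          # illustrative name
    type: ORIGINAL_DST
    lb_policy: CLUSTER_PROVIDED
    original_dst_lb_config:
      # destination ip:port read from dynamic metadata, e.g. set by the
      # ext_proc endpoint picker; namespace/path here are illustrative
      metadata_key:
        key: envoy.lb
        path:
          - key: target-pod
```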