api icon indicating copy to clipboard operation
api copied to clipboard

Add controls around injected headers

Open howardjohn opened this issue 3 years ago • 18 comments

This PR adds controls to which headers should be added to requests/responses. This solves (most of) https://github.com/istio/istio/issues/17635, a common feature request.

A prototype is implemented in https://github.com/istio/istio/pull/37215.

Open questions:

  • Naming bikeshed
  • Mesh config or proxy config or something else? Currently its in proxy config
  • Is a list of ENUMs the best way to represent this? it does allow a strange config like [REQUEST_ID, REQUEST_ID] but that can just rejected

howardjohn avatar Feb 08 '22 22:02 howardjohn

cc @jacob-delgado @nrjpoddar @costinm @hzxuzhonghu @ramaraochavali you all probably have opinions here

howardjohn avatar Feb 08 '22 22:02 howardjohn

cc @nrjpoddar any thoughts?

howardjohn avatar Mar 01 '22 23:03 howardjohn

This feels like it belongs to VirtualService for more fine grained control.

nrjpoddar avatar Mar 15 '22 10:03 nrjpoddar

How does it work if I have XFCC setting configured via GatewayTopology setting?

I suppose you mean https://istio.io/latest/docs/ops/configuration/traffic-management/network-topologies/#configuring-x-forwarded-client-cert-headers? There is some overlap for sure but I think it's useful to have config here. This is an on/off switch that applies to sidecar+gateway, then at gateway you can additionally control how it works if it's "on"?

This feels like it belongs to VirtualService for more fine grained control. About half of these are configurable only at levels higher than Route in Envoy, some I suspect we would need, at best, Proxy Config if we wanted to make it fine grained? Like gw topology

howardjohn avatar Mar 15 '22 12:03 howardjohn

How does it work if I have XFCC setting configured via GatewayTopology setting?

I suppose you mean https://istio.io/latest/docs/ops/configuration/traffic-management/network-topologies/#configuring-x-forwarded-client-cert-headers? There is some overlap for sure but I think it's useful to have config here. This is an on/off switch that applies to sidecar+gateway, then at gateway you can additionally control how it works if it's "on"?

This feels like it belongs to VirtualService for more fine grained control. About half of these are configurable only at levels higher than Route in Envoy, some I suspect we would need, at best, Proxy Config if we wanted to make it fine grained? Like gw topology

Currently, for gateway XFCC is default ON IIRC. Are we going to preserve that behvaior?

nrjpoddar avatar Mar 24 '22 15:03 nrjpoddar

Currently, for gateway XFCC is default ON IIRC. Are we going to preserve that behvaior?

Yes. From the proto If an empty list is provided, all are enabled.. Today, all of these are enabled, so there will be no change to default behavior. This just adds the ability for users to opt out of them

howardjohn avatar Mar 24 '22 15:03 howardjohn

I am more concerned about the default behavior - I don't think users should start having to write configs to get correct and expected behavior.

  • inside the mesh we keep all headers ( I don't think there is any use case to configure it )
  • for Egress Gateway use case - where this is actually important and not working as intended, we either rely on the gateway class ( new world ) or some env variable to identify the gateway as 'egress'.
  • for ServiceEntry MESH_EXTERNAL we strip
  • for Original dst - we also strip
  • for Ingress Gateway - we respect topology, i.e. strip or preserve headers like XFCC

I don't mind fine tuning this via API - but I don't think this is a MeshConfig/global option, obviously inside the mesh some of the headers are critical ( XFCC is the only way to know for the app to know the caller ), and the use case is about not forwarding critical headers to external sites - which is also required by RFC and common practice.

Before adding a 1st class API - I would check what other proxy servers do about this, and how it would fit with the k8s gateway API. But I doubt a lot of users will ask for this if default behaviour is correct.

costinm avatar Apr 26 '22 16:04 costinm

Not sure we can really change the defaults due to backwards compatibility is the main concern

howardjohn avatar Apr 26 '22 17:04 howardjohn

Yes, backward compat is a concern in a lot of areas - we do need some mechanism to migrate to the new behavior, but it doesn't have to be a first class, long-term-support API, it's just intended for safe transition.

My personal preference is to use the migration to the new K8S Gateway model to fix all bad behaviors in Istio - this one, various things you discussed in the doc on original dst, automatic mTLS verification, etc. This can be done by creating a Gateway class mesh in a namespace to 'opt-in' to the new behavior ( and replace user-driven namespace annotation as well ).

We can also use an env variable ( injection and annotations makes it easy to add this, and they are meant for this kind of migrations). MeshConfig is problematic because it's global ( per-revision actually ), and set on install/upgrade. Also once we add it there - will be hard to take out.

On Tue, Apr 26, 2022 at 10:27 AM John Howard @.***> wrote:

Not sure we can really change the defaults due to backwards compatibility is the main concern

— Reply to this email directly, view it on GitHub https://github.com/istio/api/pull/2240#issuecomment-1110063094, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAUR2RNEZDJ3LIPZFTOBGTVHARPVANCNFSM5N3ZZMHA . You are receiving this because you were mentioned.Message ID: @.***>

costinm avatar Apr 26 '22 21:04 costinm

@nrjpoddar for gateway, I think the correct default behavior is to filter out XFCC from incoming requests, unless gateway topology is used. And to generate XFCC if the gateway terminates mTLS.

Gateway topology will determine if XFCC is forwarded or not, like Forwarded and other internal headers.

I can't think any good reason to have a different behavior in a gateway. Working on a doc to formalize this (identity delegation), should have it for next week.

costinm avatar Apr 26 '22 21:04 costinm

My personal preference is to use the migration to the new K8S Gateway model to fix all bad behaviors in Istio - this one, various things you discussed in the doc on original dst, automatic mTLS verification, etc. This can be done by creating a Gateway class mesh in a namespace to 'opt-in' to the new behavior ( and replace user-driven namespace annotation as well ).

This is actually largely my goal of this PR :-) I want to get to the point where someone turns on some option ("use new stuff", for a lack of a better name), we also turn all of this on. But I don't think we otherwise can just automatically do it because:

  • Simply using the gateway-api is not a signal, since you can use no APIs at all and get mesh, or use a mix
  • Even with gateway-api I think it is still reasonable to want to enable these, they just shouldn't be the default

howardjohn avatar Jun 29 '22 15:06 howardjohn

Is a list of ENUMs the best way to represent this? it does allow a strange config like [REQUEST_ID, REQUEST_ID] but that can just rejected

I think it is better to have to individual fields for the most common headers in the HeaderSpecifier that allows to enable/disable or set value for example server header can have a string and requestid can have an extension like UUID etc....

ramaraochavali avatar Aug 28 '22 06:08 ramaraochavali

Is a list of ENUMs the best way to represent this? it does allow a strange config like [REQUEST_ID, REQUEST_ID] but that can just rejected

I think it is better to have to individual fields for the most common headers in the HeaderSpecifier that allows to enable/disable or set value for example server header can have a string and requestid can have an extension like UUID etc....

I do like making it future-proof but I also can't really see us making them configurable personally... those seem like pretty niche cases?

howardjohn avatar Aug 29 '22 14:08 howardjohn

message ProxyHeaders { ForwardedClientCert forwarded_client_certs (APPEND|SANITIZE|APPEND_FORWARD....) bool enable_envoy_headers bool include_attempt_count bool skip_server_header }

I think a message like above would be more readable and help us expand later if needed instead of an enum.

ramaraochavali avatar Aug 30 '22 10:08 ramaraochavali

message ProxyHeaders { ForwardedClientCert forwarded_client_certs (APPEND|SANITIZE|APPEND_FORWARD....) bool enable_envoy_headers bool include_attempt_count bool skip_server_header }

I think a message like above would be more readable and help us expand later if needed instead of an enum.

I mostly agree - but no "envoy" in the API please since it should be moved to the vendor-neutral K8S Gateway policy.

Since ambient is L4 and doesn't add headers - would it make sense to just drop this PR, which is on ProxyConfig, and instead just focus on a Gateway policy for the PEPs ( which would apply to ingress/egress as well ) ?

costinm avatar Sep 10 '22 15:09 costinm

I mostly agree - but no "envoy" in the API please since it should be moved to the vendor-neutral K8S Gateway policy.

These are for literally controlling envoy headers - x-envoy-* ones, so having "envoy" specifically seems to make sense here? Or do you have alternative ideas?

howardjohn avatar Sep 10 '22 17:09 howardjohn

This also impacts sidecar so it seems tricky to have a "policy attachment" mechanism, especially when one hasn't been done so far in Istio yet

howardjohn avatar Sep 10 '22 17:09 howardjohn

The general problem here is to control 'proxy headers' - including vendor-specific headers like x-envoy but probably more important are the standard ones ( XFCC, XFF, etc ). Arguably users should not rely on envoy headers - maybe proxyless gRPC is used or some other gateways or proxy ( in particular with Ambient, where the PEP doesn't really have to be an envoy ).

It is possible in the list to include envoy - and other vendor-specific headers, of course - but I don't think we should keep adding envoy-specific APIs, in particular around gateway behavior ( and of course avoid header-specific APIs for sidecar if we expect ambient at some point).

I agree we haven't done a proper 'policy attachment' - technically most Istio APIs like authz are a form of policy attachment, just not using the official pattern and style. But if the criteria is "we can't do it because we never did it before" - we'll probably never do it :-)

On Sat, Sep 10, 2022 at 10:13 AM John Howard @.***> wrote:

I mostly agree - but no "envoy" in the API please since it should be moved to the vendor-neutral K8S Gateway policy.

These are for literally controlling envoy headers - x-envoy-* ones, so having "envoy" specifically seems to make sense here? Or do you have alternative ideas?

— Reply to this email directly, view it on GitHub https://github.com/istio/api/pull/2240#issuecomment-1242769891, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAUR2VKV4IG7JIY34HNXULV5S6SBANCNFSM5N3ZZMHA . You are receiving this because you were mentioned.Message ID: @.***>

costinm avatar Sep 10 '22 17:09 costinm

@howardjohn: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

istio-testing avatar Oct 31 '22 22:10 istio-testing

How about using one knob to turn off Envoy headers we send today.

And as a separate PR/effort, make sure we can expose the info we want using HttpRoute header manipulation, and formalize the {subst} ( test, document, etc), maybe with some envoy-independence as well, i.e. translating to envoy syntax. Many of the headers are related to telemetry and could be exposed by our telemetry module. At least for the long term - I don't mind if in the short term and for sidecar-only we add an API like this, but it needs to be very clearly documented as 'short term' and sidecar-style API only.

On Tue, Feb 7, 2023 at 12:46 PM John Howard @.***> wrote:

@.**** commented on this pull request.

In mesh/v1alpha1/proxy.proto https://github.com/istio/api/pull/2240#discussion_r1099212278:

@@ -547,6 +547,24 @@ message ProxyConfig {

// Specifies the details of the proxy image. istio.networking.v1beta1.ProxyImage image = 35; +

  • // Define the set of headers to add to HTTP request/responses. If an empty list is provided,
  • // all are enabled.
  • repeated HeaderSpecifier proxy_headers = 37;
  • enum HeaderSpecifier {
  • NO_HEADERS = 0;
  • // Specify to set the X-Forwarded-Client-Cert header for mTLS connections
  • FORWARDED_CLIENT_CERT = 1;
  • // Specify to generate a X-Request-Id header for a request, if one is not already set
  • REQUEST_ID = 2;
  • // Specify to include the X-Envoy-Attempt-Count header for requests.
  • ATTEMPT_COUNT = 3;
  • // Specify to include various X-Envoy-* headers, such as X-Envoy-Overloaded and `X-Envoy-Upstream-Service-Time.
  • ENVOY_HEADERS = 4;

I am not sure I would say these are not specific to envoy - all of these are X-Envoy-* which seems explicitly Envoy related. Note envoy doesn't actually expose each header through an api, we have one knob to control a variety of X-envoy headers

— Reply to this email directly, view it on GitHub https://github.com/istio/api/pull/2240#discussion_r1099212278, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAUR2Q3YJ2QQMZ35V2637DWWKYEFANCNFSM5N3ZZMHA . You are receiving this because you were mentioned.Message ID: @.***>

costinm avatar Feb 08 '23 00:02 costinm

Will this include an option to strip x-envoy-peer-metadata too potentially ? more concerned about the MESH_EXTERNAL usecase

asayah avatar Mar 02 '23 17:03 asayah

Will this include an option to strip x-envoy-peer-metadata too potentially ? more concerned about the MESH_EXTERNAL usecase

I believe so.

Interested in how to get more movement on this PR, it has been out for over a year and a lot of users ask for this feature.

linsun avatar Mar 21 '23 02:03 linsun

Another scenario to be considered - can I control injected header only for egress or ingress traffic but not for intra-mesh traffic? With egress gateway- this could be separately by namespace when applying ProxyConfig but what if users don't have egress gateway?

linsun avatar Mar 21 '23 02:03 linsun

Generally understanding what is "in mesh" vs "external" is hard to do accurately, or at least in ways that align with users expectations

howardjohn avatar Mar 21 '23 02:03 howardjohn

I think for ambient we will need to do a better job - my proposal was to treat everything with a private IP as internal by default, and everything with a public IP as external. With a way to override this for the cases where it's not true. That works well for ztunnel - and simplifies handling of egress ( we can even do the same auto-detection, if a per-namespace egress gateway is present auto-set it, otherwise the one in istio-system ).

That is assuming ambient will be on by default in all clusters and VMs/etc in the mesh - which is at least possible.

The other criteria is 'has a ztunnel with mesh certificates' - which can be probed.

Another is 'has a PTR-DS ( or some DNS TXT records for internet) entry with mesh certificates and ztunnel address' - about the same thing.

On Mon, Mar 20, 2023 at 7:43 PM John Howard @.***> wrote:

Generally understanding what is "in mesh" vs "external" is hard to do accurately, or at least in ways that align with users expectations

— Reply to this email directly, view it on GitHub https://github.com/istio/api/pull/2240#issuecomment-1477204512, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAUR2UQGCH4YMZLO2A6AD3W5EIUHANCNFSM5N3ZZMHA . You are receiving this because you were mentioned.Message ID: @.***>

costinm avatar Mar 21 '23 04:03 costinm

I think for ambient we will need to do a better job - my proposal was to treat everything with a private IP as internal by default, and everything with a public IP as external. With a way to override this for the cases where it's not true. That works well for ztunnel - and simplifies handling of egress ( we can even do the same auto-detection, if a per-namespace egress gateway is present auto-set it, otherwise the one in istio-system ). That is assuming ambient will be on by default in all clusters and VMs/etc in the mesh - which is at least possible. The other criteria is 'has a ztunnel with mesh certificates' - which can be probed. Another is 'has a PTR-DS ( or some DNS TXT records for internet) entry with mesh certificates and ztunnel address' - about the same thing. On Mon, Mar 20, 2023 at 7:43 PM John Howard @.> wrote: Generally understanding what is "in mesh" vs "external" is hard to do accurately, or at least in ways that align with users expectations — Reply to this email directly, view it on GitHub <#2240 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAUR2UQGCH4YMZLO2A6AD3W5EIUHANCNFSM5N3ZZMHA . You are receiving this because you were mentioned.Message ID: @.>

With multi primary multi-network clusters, there are scenarios we would still want the headers for cross cluster traffic but not for external traffic. I agree this could be tricky to get right. For egress traffic, perhaps users can let us know through their service entry configuration if they need control around the injected headers?

linsun avatar Mar 22 '23 02:03 linsun

Discussed this in WG but I want to leave this here as well.

I think this API is at the wrong level of abstraction. The control for the header injection should reside with the feature that uses it rather than have a separate header removal override.

  1. x-forwarded-client-cert needs an actual API about the propagation of client identity. This is probably a good idea anyways given that ambient might use a different mechanism.
  2. x-attempt-count is coupled with retries. It's fair to have this header for retried request by default, and Istio simply enables retries by default.
  3. x-request-id is critical to tracing and needs to be part of tracing API. The trace ID part of it is important for the trace provider.
  4. server needs a separate API - it is also a good idea not to override it at all.
  5. There's no such thing as "Envoy headers". We should use RFC names, not invent new classification for headers.

Another important bit missing is the control should distinguish the edge traffic from the in-cluster traffic. There should be more sanitization applied for ingress or egress traffic - we shouldn't blindly trust client-cert header at ingress gateway.

kyessenov avatar May 17 '23 17:05 kyessenov

  1. x-forwarded-client-cert needs an actual API about the propagation of client identity. This is probably a good idea anyways given that ambient might use a different mechanism.
  1. x-attempt-count is coupled with retries. It's fair to have this header for retried request by default, and Istio simply enables retries by default.

I don't think its coupled. its perfectly valid to want retries without adding this debug header. we could tie configuration of this to the retry config, but I don't think this low level config belongs in high level APIs like VS and users probably do not want to have to set it for each route.

  1. x-request-id is critical to tracing and needs to be part of tracing API. The trace ID part of it is important for the trace provider.

I don't mind moving this in to Tracing API

  1. server needs a separate API - it is also a good idea not to override it at all.

Not sure why it needs a separate API. This PR is adding an API, so saying it needs an API is not clear to me.

  1. There's no such thing as "Envoy headers". We should use RFC names, not invent new classification for headers.

X-Envoy-* to me seem to be "Envoy headers". Envoy itself calls them these - suppress_envoy_headers is the XDS API.

Another important bit missing is the control should distinguish the edge traffic from the in-cluster traffic. There should be more sanitization applied for ingress or egress traffic - we shouldn't blindly trust client-cert header at ingress gateway.

This is about sending headers, not using them on inbound. so while I agree we should not trust XFCC at ingress, we already don't and this is unrelated to the goals of this PR

howardjohn avatar May 17 '23 18:05 howardjohn

  1. x-attempt-count is coupled with retries. It's fair to have this header for retried request by default, and Istio simply enables retries by default.

I don't think its coupled. its perfectly valid to want retries without adding this debug header. we could tie configuration of this to the retry config, but I don't think this low level config belongs in high level APIs like VS and users probably do not want to have to set it for each route.

Having a disjoint API to override VS API feels like a more restricted EnvoyFilter to me. It's useful, but not principled enough to justify a future stable API IMHO.

  1. server needs a separate API - it is also a good idea not to override it at all.

Not sure why it needs a separate API. This PR is adding an API, so saying it needs an API is not clear to me.

The difference is having an API called ServerHeaderInjection or a nested field in an existing API, vs an API called HeaderOperations that singles out "server".

  1. There's no such thing as "Envoy headers". We should use RFC names, not invent new classification for headers.

X-Envoy-* to me seem to be "Envoy headers". Envoy itself calls them these - suppress_envoy_headers is the XDS API.

All other APIs take a lot of effort to decouple from the concrete data plane implementation (Envoy), I don't see why we should do it differently here.

This is about sending headers, not using them on inbound. so while I agree we should not trust XFCC at ingress, we already don't and this is unrelated to the goals of this PR

For sending headers, we need to distinguish between in-mesh and out-of-mesh recipients. That seems like a very basic request to only strip headers for egress traffic.

kyessenov avatar May 17 '23 22:05 kyessenov

Having a disjoint API to override VS API feels like a more restricted EnvoyFilter to me. It's useful, but not principled enough to justify a future stable API IMHO.

Isn't one of our goals to take common EnvoyFilter use cases and expose them as proper APIs?

The difference is having an API called ServerHeaderInjection or a nested field in an existing API, vs an API called HeaderOperations that singles out "server".

I am not tied, at all, to the syntax of the API. So that works for me.

All other APIs take a lot of effort to decouple from the concrete data plane implementation (Envoy), I don't see why we should do it differently here.

How about we just call it "DebugHeaders" or something?

For sending headers, we need to distinguish between in-mesh and out-of-mesh recipients. That seems like a very basic request to only strip headers for egress traffic.

I don't disagree but I don't see a reasonable way to implement this, nor have I seen users ask for this level of granularity (aside from MX, which is a different problem IMO)

howardjohn avatar May 18 '23 14:05 howardjohn

Long but good discussion, sorry I haven't read all the comments, but from a telco/3GPP spec point of view, "-Envoy-" HTTP headers are at least not allowed across k8s clusters, that's why our stakeholders have always asked to remove these envoy specific headers for cross-k8s communication, so it would be great if we could support that.

hobbytp avatar May 19 '23 08:05 hobbytp