envoy icon indicating copy to clipboard operation
envoy copied to clipboard

Golang filter: provide a way to get all headers

Open spacewander opened this issue 2 years ago • 26 comments

Title: Golang filter: provide a way to get all headers

Description:

Describe the desired behavior, what scenario it enables and how it would be used.

We can add a new method func (h *Header) GetAll(copy bool) map…. The method takes a boolean to indicate whether we should return a copy of internal headers. If the headers are not copied, it is the user's duty to ensure the headers won't be written later or accessed concurrently.

Here are the advantages:

  1. it is more effective than Range as we don't need to lock for long
  2. it is easier to write than Range if people want to dump all the headers
  3. Wasm provides a similar method so we can align to that implementation

If this method is added, we can retire the RangeWithCopy because if people are afraid of the deadlock, they can use GetAll with copy instead.

spacewander avatar Sep 13 '23 05:09 spacewander

cc @doujiang24 @wangfakang @StarryVae golang's filter code-owners.

adisuissa avatar Sep 13 '23 17:09 adisuissa

I'm fine with GetAll + copy, but I'm not sure with nocopy, since nocopy will expose the internal type map[string][]string.

I was thinking of an optimization: maybe we could use a higher performance type instead of map[string][]string. In the current map[string][]string, one unique header name, requires at least two GC pointers:

  1. One for the slice []string
  2. Another one for the string data of the value.

Compared to map[string]string, it requires one more GC pointer.

But, I haven't figured out a better way yet. So just a little worried about exposing the internal type to users.

doujiang24 avatar Sep 14 '23 04:09 doujiang24

map[string][]string is used as header because we follow the same type used by Go: https://github.com/golang/go/blob/495830acd6976c8a2b39dd4aa4fdc105ad72de52/src/net/http/header.go#L24.

If the value is string instead of []string, it will cause some trouble when handling repeated headers.

If nocopy is not needed, we can use Clone, which is the same as the http.Header type in the Go.

spacewander avatar Sep 14 '23 04:09 spacewander

yep, I agree that we should handle the repeated header correctly. just a little worried about exposing the internal type to users, that will make it harder to optimize it in the feature.

doujiang24 avatar Sep 14 '23 06:09 doujiang24

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] avatar Oct 14 '23 08:10 github-actions[bot]

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

github-actions[bot] avatar Oct 21 '23 12:10 github-actions[bot]

I also think a method like GetAllHeaders would be useful.

I don't know much about C++ programming so apologies if the following paragraph is uninformed.

Presumably it would be faster to pass a data structure containing all the headers to the Golang plugin, rather than having the Golang plugin retrieve them from the HeaderMap using Range

Also, the issue with Range is that you iterate over all the headers, which is a waste of time because the special ones like scheme etc are already set in the header map as values - so it would be good to have a capability to get all the other headers

willemveerman avatar Apr 08 '24 15:04 willemveerman

I also think a method like GetAllHeaders would be useful.

I don't know much about C++ programming so apologies if the following paragraph is uninformed.

Presumably it would be faster to pass a data structure containing all the headers to the Golang plugin, rather than having the Golang plugin retrieve them from the HeaderMap using Range

We cache the entire header map on the Golang side, in a single C++ Golang interchange. Range doesn't mean getting values from C++ one by one, there won't be much performance difference.

Also, the issue with Range is that you iterate over all the headers, which is a waste of time because the special ones like scheme etc are already set in the header map as values - so it would be good to have a capability to get all the other headers

okay, you means all headers exclude the special ones like scheme, path and etc, right?

I think this might be a proper choice, it always do a copy for safety, thoughts?

func (h *Header) GetAll(skip_internal bool) map[string][]string

doujiang24 avatar Apr 09 '24 01:04 doujiang24

We cache the entire header map on the Golang side, in a single C++ Golang interchange. Range doesn't mean getting values from C++ one by one, there won't be much performance difference.

OK, great - I wasn't aware of this.

okay, you means all headers exclude the special ones like scheme, path and etc, right?

Yes exactly, because the special ones are already available as attributes - https://github.com/envoyproxy/envoy/blob/5aadcb0e68ae168562ab2842b875a75b1bbee6ce/contrib/golang/common/go/api/type.go#L138-L141

I think this might be a proper choice, it always do a copy for safety, thoughts?

That seems perfect to me. Will the looping occur on the golang side or the C++ side? Just thinking the optimal thing would be to do any sort of loop on the C++ side so the whole operation is as fast as possible.

willemveerman avatar Apr 11 '24 18:04 willemveerman

Given the header map object is already cached, I wonder if the additional headers, i.e. the non-special ones, can't me made available in some sort of simple data structure in Golang - e.g. all_other_headers map[string][]string

This prevents the user of the filter needing to create and populate such a data structure themselves for every request

willemveerman avatar Apr 11 '24 18:04 willemveerman

Will the looping occur on the golang side or the C++ side? Just thinking the optimal thing would be to do any sort of loop on the C++ side so the whole operation is as fast as possible.

Golang side should be good enough, it's fast enough, no cgo call overhead.

This prevents the user of the filter needing to create and populate such a data structure themselves for every request

It's a trade-off, since we already have a map[string][]string cache for the whole headers. This depends on how often the API will be used.

doujiang24 avatar Apr 14 '24 11:04 doujiang24

It's a trade-off, since we already have a map[string][]string cache for the whole headers. This depends on how often the API will be used.

Fair enough, is it possible to access the existing cache without creating another map[string][]string?

willemveerman avatar Apr 14 '24 17:04 willemveerman

Fair enough, is it possible to access the existing cache without creating another map[string][]string?

it could be done in the current Range api, but it contains the special headers.

doujiang24 avatar Apr 15 '24 11:04 doujiang24

it could be done in the current Range api, but it contains the special headers.

Oh I see, I'm beginning to understand how it all works now.

So there's an existing map[string][]string under the hood, so to speak, and to access each element you can use Range

But the OP was proposing to write a func which would return a copy of the entire existing map[string][]string, is that right?

I think that would be a good idea because it would be much faster.

In some cases users will want to build other data structures using each header value as an attribute. With the existing Range approach we need to loop over every key, and also every value - so, in the the circumstance of repeated headers, a key will contain several values in the []string array and it will take time. However, with a func which returns map[string][]string we can pass the entire []string to the attribute without having to loop over every item in the array.

willemveerman avatar Apr 16 '24 00:04 willemveerman

Also, could the same method be added to the cluster_specifier_plugin so we can get all the headers, then route to a particular cluster based on the headers of each request.

willemveerman avatar Apr 21 '24 01:04 willemveerman

But the OP was proposing to write a func which would return a copy of the entire existing map[string][]string, is that right?

yep.

Also, could the same method be added to the cluster_specifier_plugin so we can get all the headers, then route to a particular cluster based on the headers of each request.

Do you really want all of the headers? maybe HeaderMap.Values(key string) []string works for your? to get the values of a specified key.

doujiang24 avatar Apr 22 '24 01:04 doujiang24

Do you really want all of the headers? maybe HeaderMap.Values(key string) []string works for your? to get the values of a specified key.

~~Well if it works like that I don't see the point of the plugin.~~

~~Let's say I use the golang http plugin to dynamically set a header, e.g. X-SPECIAL-HEADER: cluster-z, and then use the golang cluster_specifier plugin to get the value of that header and route to cluster-z~~

~~I can already use envoy's config to route to a cluster based on the value of X-SPECIAL-HEADER, so there's no point in using the golang cluster_specifier plugin because it will be slower than using native envoy config.~~

~~On the other hand, if I can use one plugin to read all the headers and then route to a particular cluster it's much faster; I can read the existing headers, decide on a cluster and then route to it, without needing to add X-SPECIAL-HEADER: cluster-z~~


Edit, I made a mistake. With Envoy's native config you can set a route based on a header value, but you can't dynamically route to any cluster based on a header value - which is something you can do with the golang cluster_specifier plugin.

So I guess the general idea is you can set a header in the golang http plugin, and then route to a cluster using that header with the golang cluster_specifier plugin.

Personally, I think it would be great if you could retrieve all headers in the golang cluster_specifier plugin because then the flow will be much faster. You can just read the headers, decide on a cluster, and route to it - without having to add a header or invoke two separate plugins in the request flow.

willemveerman avatar Apr 23 '24 17:04 willemveerman

but you can't dynamically route to any cluster based on a header value

cluster_header might be it: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto#envoy-v3-api-msg-config-route-v3-routeaction

I think it would be great if you could retrieve all headers in the golang

yep, GetAll could be better when you need all headers. If you only need some specified headers with pre-known header names, Get/Values should be works too.

Anyway, there is no disagreements about adding the new API:

func (h *Header) GetAll(skip_internal bool) map[string][]string

Would you like to have a try? Seems it could be pure Golang implementation, no C++ code changes need.

doujiang24 avatar Apr 26 '24 12:04 doujiang24

cluster_header might be it: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto#envoy-v3-api-msg-config-route-v3-routeaction

Ah yes! Thank you.

So you can set a header, then route based on that header's value using Envoy's native config.

So, if the golang cluster_specifier can also access the underlying map[string][]string header cache then this would be the fastest mechanism for routing using a golang plugin, without needing to call another plugin:

  1. Read all headers by accessing HeaderMap map[string][]string
  2. Decide on a cluster
  3. Return chosen cluster name

Therefore, could 1. be added to golang cluster_specifier?

In the mean time I will try the golang implementation you mentioned in for the golang http plugin.

willemveerman avatar Apr 26 '24 13:04 willemveerman

Therefore, could 1. be added to golang cluster_specifier?

yep. it should be a common API for HeaderMap.

In the mean time I will try the golang implementation you mentioned in for the golang http plugin.

Cool!

doujiang24 avatar Apr 26 '24 13:04 doujiang24

I thought about it and I realised that it probably doesn't make sense to exclude the internal/special headers. This is because the user will most likely want all the headers - the only reason to exclude them is that in the case of Range it would be faster to do so. But if we're passing an entire copy of the underlying map[string][]string there's no time penalty incurred if we pass all the headers.

The draft PR is here - do you have any thoughts? It's very simple. https://github.com/envoyproxy/envoy/pull/33821

willemveerman avatar Apr 29 '24 11:04 willemveerman

cluster_header might be it: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto#envoy-v3-api-msg-config-route-v3-routeaction

Noticed that cluster_header actually doesn't work for this use-case, as detailed here: https://github.com/envoyproxy/envoy/issues/33878

willemveerman avatar Apr 30 '24 22:04 willemveerman

the only reason to exclude them is that in the case of Range it would be faster to do so.

okay, it's really not a good idea to design API by the performance that you thought. As said previously, headers already cached in Golang side, there shouldn't be much performance diff.

The draft PR is here - do you have any thoughts? It's very simple. #33821

added some comments there.

doujiang24 avatar May 06 '24 02:05 doujiang24

Therefore, could 1. be added to golang cluster_specifier?

yep. it should be a common API for HeaderMap.

I'm happy to make an attempt on the Golang side of this, but I'll need the c++ parts to be done by someone else because I don't have experience with it

willemveerman avatar May 14 '24 10:05 willemveerman

I'm happy to make an attempt on the Golang side of this, but I'll need the c++ parts to be done by someone else because I don't have experience with it

maybe you can copy some code from golang http plugin, I'm happy to help if you need it.

doujiang24 avatar May 15 '24 01:05 doujiang24

I'm happy to make an attempt on the Golang side of this, but I'll need the c++ parts to be done by someone else because I don't have experience with it

maybe you can copy some code from golang http plugin, I'm happy to help if you need it.

Yes I thought I might be able to do that.. OK, I'll give that a go in another PR and then tag you

willemveerman avatar May 15 '24 11:05 willemveerman

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] avatar Jun 14 '24 12:06 github-actions[bot]

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

github-actions[bot] avatar Jun 21 '24 16:06 github-actions[bot]