api icon indicating copy to clipboard operation
api copied to clipboard

add stateful session load balancer support

Open ramaraochavali opened this issue 2 years ago • 88 comments

API support needed for https://github.com/istio/istio/issues/39740 and https://github.com/istio/istio/issues/37910

ramaraochavali avatar Aug 30 '22 10:08 ramaraochavali

Specifically document the difference between consistentHash would be nice as well

On Wed, Aug 31, 2022 at 12:02 PM Costin Manolache @.***> wrote:

@.**** commented on this pull request.

In networking/v1alpha3/destination_rule.proto https://github.com/istio/api/pull/2469#discussion_r959930570:

@@ -444,6 +444,51 @@ message Subset { TrafficPolicy traffic_policy = 3; }

+// Describes a HTTP cookie that will be used for session stickiness. +// If the cookie is not present, it will be generated. +message HTTPCookie {

  • // Name of the cookie.
  • string name = 1 [(google.api.field_behavior) = REQUIRED];
  • // Path to set for the cookie.
  • string path = 2;
  • // Lifetime of the cookie.
  • google.protobuf.Duration ttl = 3 [(google.api.field_behavior) = REQUIRED]; +};

+// Standard load balancing algorithms that require no tuning. +enum SimpleLB {

Probably not - but why is it required for this PR ?

— Reply to this email directly, view it on GitHub https://github.com/istio/api/pull/2469#discussion_r959930570, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEYGXOUXFCDMVWAMXPTVKDV36T3PANCNFSM6AAAAAAQAIAG54 . You are receiving this because your review was requested.Message ID: @.***>

howardjohn avatar Aug 31 '22 19:08 howardjohn

On Wed, Aug 31, 2022 at 2:24 AM Zhonghu Xu @.***> wrote:

@.**** commented on this pull request.

In networking/v1alpha3/destination_rule.proto https://github.com/istio/api/pull/2469#discussion_r959364231:

@@ -623,6 +624,19 @@ message LoadBalancerSettings { uint64 minimum_ring_size = 4 [deprecated=true]; };

  • // StatefulSession load balancer load balances across backend hosts using 'strong' stickiness.
  • // This load balancer enforces stable session stickiness by routing requests to the same backend

And a further question, what if the instance is down and actually not in the eds?

I think another important question: what if the instance is up - but not in EDS ?

Does envoy do any checks to make sure this is not an open proxy - I hope so, and it needs to be clearly explained ( and tested !).

— Reply to this email directly, view it on GitHub https://github.com/istio/api/pull/2469#discussion_r959364231, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAUR2XXS3FLDIGTTYBB5ELV34QDZANCNFSM6AAAAAAQAIAG54 . You are receiving this because you were assigned.Message ID: @.***>

costinm avatar Aug 31 '22 22:08 costinm

Specifically document the difference between consistentHash would be nice as well

Described in https://github.com/grpc/proposal/blob/e49869d41d91ca5e0ea538c833a1480f90b348a0/A55-stateful-session-affinity.md#rationale

sanjaypujare avatar Sep 01 '22 21:09 sanjaypujare

IMO we should not mix LB - which selects an endpoint based on different rules, including affinity - with session stickiness, i.e. regardless of LB go to this specific IP:port.

I did more research on how other implementations work - in particular when stickiness is exposed trough gateways or other infrastructure is involved - I am leaning more towards just "enable sticky sessions" in the API - and maybe the duration, and leaving the name of the cookie, path, etc out of the API. If we integrate with other infra - they may have already defined cookies and mechanisms to attach them to paths. I don't think use defining a different cookie name for each service is that valuable or needed, and if we get requests we can add it later. And for the path - I would add it after we have a good understanding on how it'll work with gateways and routing.

A smaller PR with just 'enable sticky' and time would move faster and allow us to get feedback - we can add more later, it's harder to remove.

costinm avatar Sep 02 '22 00:09 costinm

One more question - I've looked at proxyless gRPC and gateway use cases, the client or browser handles the cookie jar.

For 'mesh' use case - is the client-side sidecar holding a cookie jar when making the requests ? Or does the user have to change their code to track cookies ?

And if the client is gRPC ( with envoy sidecar for security/telemetry/etc - not proxyless ) - does gRPC use a cookie jar already ?

If you find out - please add it to the comments, in particular if user code needs to track the cookie, I haven't figured it out from envoy docs.

costinm avatar Sep 02 '22 04:09 costinm

If you find out - please add it to the comments, in particular if user code needs to track the cookie, I haven't figured it out from envoy docs.

Yes. User code has to track the cookie. And we are supporting this https://github.com/envoyproxy/envoy/issues/22475 in Envoy so that non browser based apps can use this header to track instead of cookie.

ramaraochavali avatar Sep 02 '22 04:09 ramaraochavali

and maybe the duration, and leaving the name of the cookie

I think leaving path seems OK. But name should be given. We can't assume/standarize the name from some annotations. If name is not given we can derive based on other infrastructure data as you mentioned. But for typical Istio/envoy use case, name is important

ramaraochavali avatar Sep 02 '22 04:09 ramaraochavali

@costinm I agree with most of the comments except the name of cookie. If we agree on that, I will delink it from LB policy and introduce statefulSesstion config, if provided, sticky ness would be enabled.

ramaraochavali avatar Sep 02 '22 05:09 ramaraochavali

Not sure using header instead of cookie helps much - most http clients have support for cookie tracking, just has to be enabled - and I assume gRPC will add some easy option if they support sticky sessions for proxyless gRPC. Cookie is a header too - so if an app can set a header, they can set a cookie too.

I remember in Java it was a request parameter - jsessionid - and the cookie name was also standardized. I'm not saying to hard code - just keep it in meshConfig or some env variable. I would guess java applications will have to use the name that is defined in the servlet spec ( or was 20 years ago, I have a strong deja vue...). Concern is that each service defines a different cookie and we end up with a mess. It may be a way around path problems - I'm not completely opposed to customizing the cookie name, just want to make sure we evaluate pros and cons.

On Thu, Sep 1, 2022 at 10:00 PM Rama Chavali @.***> wrote:

@costinm https://github.com/costinm I agree with most of the comments except the name of cookie. If we agree on that, I will delink it from LB policy and introduce statefulSesstion config, if provided, sticky ness would be enabled.

— Reply to this email directly, view it on GitHub https://github.com/istio/api/pull/2469#issuecomment-1235067921, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAUR2SDVQENFHWQH3SV2UTV4GCX7ANCNFSM6AAAAAAQAIAG54 . You are receiving this because you were mentioned.Message ID: @.***>

costinm avatar Sep 02 '22 05:09 costinm

@costinm changed. PTAL.

ramaraochavali avatar Sep 02 '22 06:09 ramaraochavali

What is the difference between a repeated oneof with 10 fields and a message with 10 fields ? In yaml it's far cleaner with a message:

persistentSession:

  • cookie:
    • name: JSESSIONID duration: 1h
  • header:
    • name: foo

persistentSession: cookie: JESSIONID header: foo

Besides extra indentation and complexity ( and marshalling overhead ) - you don't gain much, spraying fields is far cleaner than spraying messages in hige oneofs ( which end up looking just like the message ).

On Tue, Sep 6, 2022 at 8:18 AM Rama Chavali @.***> wrote:

@.**** commented on this pull request.

In networking/v1alpha3/destination_rule.proto https://github.com/istio/api/pull/2469#discussion_r963840111:

@@ -623,6 +623,23 @@ message LoadBalancerSettings { uint64 minimum_ring_size = 4 [deprecated=true]; };

  • // Stateful Session configuration.
  • message StatefulSession {
  • // SessionCookie is used to maintain the session state across client and server.
  • message SessionCookie {
  •   // Name of the cookie to be used. If not specified, environment specific
    
  •   // configuration is used.
    
  •   string name = 1;
    
  •   // Session duration. If not specified, a duration of 1 hour is used.
    
  •   google.protobuf.Duration duration = 2;
    
  • }
  • // The session state to use.
  • oneof session_state {

I am not saying Envoy may not support others. All I am saying the state would be exclusive i.e. either cookie or header or some thing else. But not combination of all. I can drop oneof here but I think nested message would help us to add oneof later if needed rather than spraying fields across.

— Reply to this email directly, view it on GitHub https://github.com/istio/api/pull/2469#discussion_r963840111, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAUR2VKTV3NEXQUQQHKQATV45ODHANCNFSM6AAAAAAQAIAG54 . You are receiving this because you were mentioned.Message ID: @.***>

costinm avatar Sep 06 '22 15:09 costinm

BTW - it may be best to start with the UX ( including how many levels of indentation and how much pain the user must deal with), not with internal proto. The API we expose is yaml/json - we just happen to use proto internally, but that's not exposed to users.

On Tue, Sep 6, 2022 at 8:38 AM Costin Manolache @.***> wrote:

What is the difference between a repeated oneof with 10 fields and a message with 10 fields ? In yaml it's far cleaner with a message:

persistentSession:

  • cookie:
    • name: JSESSIONID duration: 1h
  • header:
    • name: foo

persistentSession: cookie: JESSIONID header: foo

Besides extra indentation and complexity ( and marshalling overhead ) - you don't gain much, spraying fields is far cleaner than spraying messages in hige oneofs ( which end up looking just like the message ).

On Tue, Sep 6, 2022 at 8:18 AM Rama Chavali @.***> wrote:

@.**** commented on this pull request.

In networking/v1alpha3/destination_rule.proto https://github.com/istio/api/pull/2469#discussion_r963840111:

@@ -623,6 +623,23 @@ message LoadBalancerSettings { uint64 minimum_ring_size = 4 [deprecated=true]; };

  • // Stateful Session configuration.
  • message StatefulSession {
  • // SessionCookie is used to maintain the session state across client and server.
  • message SessionCookie {
  •   // Name of the cookie to be used. If not specified, environment specific
    
  •   // configuration is used.
    
  •   string name = 1;
    
  •   // Session duration. If not specified, a duration of 1 hour is used.
    
  •   google.protobuf.Duration duration = 2;
    
  • }
  • // The session state to use.
  • oneof session_state {

I am not saying Envoy may not support others. All I am saying the state would be exclusive i.e. either cookie or header or some thing else. But not combination of all. I can drop oneof here but I think nested message would help us to add oneof later if needed rather than spraying fields across.

— Reply to this email directly, view it on GitHub https://github.com/istio/api/pull/2469#discussion_r963840111, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAUR2VKTV3NEXQUQQHKQATV45ODHANCNFSM6AAAAAAQAIAG54 . You are receiving this because you were mentioned.Message ID: @.***>

costinm avatar Sep 06 '22 15:09 costinm

how do you specify either cookie or header can be set but not both?

persistentSession:
  cookie: JESSIONID
  header: foo

Are you saying, we will document that but not enforce via protobuf?

ramaraochavali avatar Sep 06 '22 15:09 ramaraochavali

I mean: we should not have a convoluted API to enforce something that is more of an implementation bug than intended behavior.

On Tue, Sep 6, 2022 at 8:45 AM Costin Manolache @.***> wrote:

From API perspective ( and how persistent sessions work with other serves ) - both header and cookie ( and other forms like TLS session ID, request parameter, etc) can be used.

The doc should mention that envoy currently supports only one. There is no reason to believe it is impossible for envoy to support multiple, and we should not design the API based on an envoy limitation.

On Tue, Sep 6, 2022 at 8:43 AM Rama Chavali @.***> wrote:

how do you specify either cookie or header can be set but not both?

persistentSession: cookie: JESSIONID header: foo

Are you saying, we will document that but not enforce via protobuf?

— Reply to this email directly, view it on GitHub https://github.com/istio/api/pull/2469#issuecomment-1238327886, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAUR2T5R3KKLN4O2PGBCTTV45RCJANCNFSM6AAAAAAQAIAG54 . You are receiving this because you were mentioned.Message ID: @.***>

costinm avatar Sep 06 '22 15:09 costinm

@ramaraochavali You can provide a demo usage to be more clear

hzxuzhonghu avatar Sep 07 '22 01:09 hzxuzhonghu

@costinm One question on this

persistentSession:
  cookie: JESSIONID
  header: foo

Will there be ever a case of supporting stateful session persistence with more than one? If not this may be better. Otherwise what you suggested my be better.

persistentSession:

  • cookie:
    • name: JSESSIONID duration: 1h
  • header:
    • name: foo

ramaraochavali avatar Sep 07 '22 03:09 ramaraochavali

Based on my tomcat experience - I can almost guarantee cookie-only is not sufficient, and certainly not single-value plain-text/base64 IP address. We keep thinking of this as a direct service-to-single-service API - but gateways are a far bigger user of persistent sessions. It is a good start - but if you look at other load balancers supporting persistent sessions it's more refined.

I think we should keep the API as flexible as possible - including default values, i.e. most users should just say:

persistentSession: {}

Since persistent sessions may need to span multiple components ( in particular when exposed on ingress gateway ) it is far better for most users ( persona == namespace admin/app owner) to not mess with how persistentSession is handled. The actual infrastructure may also include external (non-Envoy) LBs and backends implemented in Java that may have their own mechanisms (HttpSession) - and it may be better for cluster admin who is aware of all infra to fine tune this.

We can keep the cookie name/header - users may want to override, but real persistent session across multiple sticky backends ( i.e. the common case when a browser is accessing via ingress) may not interact well with overrides.

How about keeping it as simple as possible - we can add header support on a separate PR, just keep

persistentSession:

  • cookie: name: foo maxAge: 1h # using the cookie terminology to be more clear it's not session duration

If you think header is important - we can also add it, but with docs that in current envoy impl only one of header and cookie can be used ( ingress gateways other than envoy may support both, proxyless gRPC too).

And I would add a MeshConfig "defaultPersistentCookie" section for mesh admins to set the defaults, and include a default value in the code, so most user can just declare intent to use persistentCookie: {} without worry about the details. I would not use JSESSIONID as default because it is typically an encrypted or opaque blob.

costinm avatar Sep 07 '22 15:09 costinm

BTW - https://docs.nginx.com/nginx/admin-guide/load-balancer/http-load-balancer/ has nginx config - https://kubernetes.github.io/ingress-nginx/examples/affinity/cookie/ the equivalent API when nginx is used as ingress, https://www.haproxy.com/blog/load-balancing-affinity-persistence-sticky-sessions-what-you-need-to-know/ and https://doc.traefik.io/traefik/routing/services/#sticky-sessions for others.

And my favorite: https://tomcat.apache.org/tomcat-8.5-doc/config/manager.html :-)

costinm avatar Sep 07 '22 16:09 costinm

@va7ish you might be interested in this API and provide feedback.

sanjaypujare avatar Sep 07 '22 16:09 sanjaypujare

How about keeping it as simple as possible - we can add header support on a separate PR, just keep

persistentSession:

cookie: name: foo maxAge: 1h # using the cookie terminology to be more clear it's not session duration

I agree with this. Changed. PTAL

ramaraochavali avatar Sep 08 '22 10:09 ramaraochavali

/test release-notes

ramaraochavali avatar Sep 08 '22 10:09 ramaraochavali

@howardjohn IMO it may be a good idea to not add this feature in DestinationRule, but to a new 'policy' CR - unlike many other things in DR, persistent session is not really 'host based', a host may route to multiple clusters.

I agree with @costinm. It seems that if this is done at the DR level it will break the canary functionality. We need to maintain affinity across clusters. Not sure if Envoy supports that in the currently.

va7ish avatar Sep 08 '22 23:09 va7ish

@howardjohn IMO it may be a good idea to not add this feature in DestinationRule, but to a new 'policy' CR - unlike many other things in DR, persistent session is not really 'host based', a host may route to multiple clusters.

I agree with @costinm. It seems that if this is done at the DR level it will break the canary functionality. We need to maintain affinity across clusters. Not sure if Envoy supports that in the currently.

I don't think it breaks canary - anything that has persistent session should keep going to the sticky backend, canary will apply to new sessions.

And we can define a new 'policy' - possibly for the Gateway/GAMMA API - in addition to adding it in DR, we should not block on that.

costinm avatar Sep 09 '22 04:09 costinm

@howardjohn Any concerns with this?

ramaraochavali avatar Sep 12 '22 10:09 ramaraochavali

Isn't it kind of broken without path...?

For example, if we have 50/50% weighting to v1 and v2

  1. Randomly hit v1 cluster, I get cookie for 1.2.3.4 (part of v1)
  2. Randomly hit v2 cluster. There is no 1.2.3.4, so I pick a random host and cookie is set to 2.3.4.5 (part of v2)
  3. Randomly hit v1. There is no cookie for 2.3.4.5, so I pick a random host...
  4. Repeat

Yes, it is pretty broken in other ways as well - as implementation in envoy. The API however seems fine, and the impl can be fixed in time.

For example the cookie can encode a list (cluster1=IP1,cluster2=IP2), or it can be combined with external authz to support encrypted cookies or cookies as IDs and lookup appropriate backend dynamically, for example in cases where we authenticate and send the session to a server where user data is stored.

I think ext authz is great for 'composing' with the persistent session, I believe it can inject or modify headers including cookie and support most of the advanced use cases.

costinm avatar Sep 12 '22 20:09 costinm

IMO we should have an idea of a non-broken Envoy implementation before moving forward with an API

howardjohn avatar Sep 12 '22 20:09 howardjohn

Isn't it kind of broken without path...?

For example, if we have 50/50% weighting to v1 and v2

  1. Randomly hit v1 cluster, I get cookie for 1.2.3.4 (part of v1)
  2. Randomly hit v2 cluster. There is no 1.2.3.4, so I pick a random host and cookie is set to 2.3.4.5 (part of v2)
  3. Randomly hit v1. There is no cookie for 2.3.4.5, so I pick a random host...
  4. Repeat

Not sure I understand "hit the v2 cluster" part. If a cookie is present then the RPC is sent to that IP address (skipping cluster selection). This is how both Envoy and proxyless gRPC will behave

sanjaypujare avatar Sep 12 '22 21:09 sanjaypujare

Isn't it kind of broken without path...? For example, if we have 50/50% weighting to v1 and v2

  1. Randomly hit v1 cluster, I get cookie for 1.2.3.4 (part of v1)
  2. Randomly hit v2 cluster. There is no 1.2.3.4, so I pick a random host and cookie is set to 2.3.4.5 (part of v2)
  3. Randomly hit v1. There is no cookie for 2.3.4.5, so I pick a random host...
  4. Repeat

Not sure I understand "hit the v2 cluster" part. If a cookie is present then the RPC is sent to that IP address (skipping cluster selection). This is how both Envoy and proxyless gRPC will behave

How can it skip cluster selection? That has more than IP - has TLS settings, etc

howardjohn avatar Sep 12 '22 22:09 howardjohn

...

Not sure I understand "hit the v2 cluster" part. If a cookie is present then the RPC is sent to that IP address (skipping cluster selection). This is how both Envoy and proxyless gRPC will behave

How can it skip cluster selection? That has more than IP - has TLS settings, etc

This is the design in Envoy https://docs.google.com/document/d/1IU4b76AgOXijNa4sew1gfBfSiOMbZNiEt5Dhis8QpYg/edit#heading=h.x3i1k73a3v8z

"When a new request arrives, the router filter tries to resolve the upstream host address from the cookie. The Load Balancer gets this address through the LoadBalancerContext (requires extending the LoadBalancerContext API) and prefers this address."

That means it skips cluster selection? That is the whole point of cookie based stateful session affinity.

sanjaypujare avatar Sep 12 '22 22:09 sanjaypujare

AFAIK that is to pick the host within the cluster after picking a cluster

howardjohn avatar Sep 12 '22 22:09 howardjohn