Session affinity / sticky sessions / cookie based traffic splitting
Ask your question here:
Hello, wonderful Knative community! Thanks for all your effort, I'm a new user who's fascinated by the project you've created. However, I could not find an answer to this question: is it possible to use cookies when splitting traffic, to be sure that the same user will always be directed to the same revision? I know this feature exists in bare Istio, Gloo, ..., but I could not find any info on how to make use of it in Knative, or whether it is even possible. Thanks!
(I wasn't sure which area to choose, so I left it blank, sorry for the inconvenience)
/cc @tcnghia @ZhiminXiang @nak3
@des-esseintes currently Knative does not support session affinity / sticky sessions / cookie based routing. To work around this, you could try adding your own resources (e.g. a VirtualService for Istio) to route traffic to revision services based on your requirements.
i see, thanks, @ZhiminXiang! any plans to make this 'native' to Knative? :) just wondering.
I have the same question, because I just got the same question :)
I think it is a very important use case to allow revision stickiness: stick to a revision once it has been selected by a probability-based traffic split.
Tag header-based routing could be half of the solution. If an HTTP response from a revision also included its revision tag, a client could do the correlation on its own:
- Send first requests to the service URL without any extra info
- Traffic split selects a revision by distribution
- HTTP response from this revision contains the tag as specified in the traffic split
- Client picks up that response header field and sends it back on all future requests.
That way as soon as a revision has been selected, the client is able to stick to this revision.
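The round trip described above can be sketched in a few lines. This is only an illustration under stated assumptions: `Knative-Serving-Tag` is the request header used by Knative's tag header-based routing feature as I understand it, while the response header `X-Revision-Tag` is hypothetical (nothing sets it today, so the app or the runtime would have to add it). `make_fake_route` and `StickyClient` are made-up names, and the route is a local stand-in rather than a real Knative endpoint.

```python
import random
from typing import Callable, Dict, Optional

# Request header of Knative's tag header-based routing feature (assumed here).
TAG_REQUEST_HEADER = "Knative-Serving-Tag"
# Hypothetical response header; Knative does not emit this today.
TAG_RESPONSE_HEADER = "X-Revision-Tag"

Headers = Dict[str, str]
Transport = Callable[[Headers], Headers]


def make_fake_route(split: Dict[str, int], rng: random.Random) -> Transport:
    """Local stand-in for a route: honours a known tag, else splits by weight."""
    tags = list(split)
    weights = [split[t] for t in tags]

    def route(request_headers: Headers) -> Headers:
        tag = request_headers.get(TAG_REQUEST_HEADER)
        if tag not in split:
            # Fresh request (or unknown tag): apply the traffic split.
            tag = rng.choices(tags, weights=weights)[0]
        return {TAG_RESPONSE_HEADER: tag}

    return route


class StickyClient:
    """Remembers the tag of the first response and replays it on later requests."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport
        self.pinned_tag: Optional[str] = None

    def request(self) -> str:
        headers: Headers = {}
        if self.pinned_tag is not None:
            headers[TAG_REQUEST_HEADER] = self.pinned_tag
        response = self._transport(headers)
        served_by = response[TAG_RESPONSE_HEADER]
        if self.pinned_tag is None:
            self.pinned_tag = served_by  # pin on the first response
        return served_by

    def end_session(self) -> None:
        """The client decides when the session is over, e.g. on logout."""
        self.pinned_tag = None
```

A client built as `StickyClient(make_fake_route({"stable": 95, "canary": 5}, random.Random()))` would get a revision from the split on its first call to `request()` and stay on that revision until `end_session()` is called.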
That is an interesting proposal @rhuss. Adding a response header is probably easy, but it does require expanding our runtime contract.
cc @mattmoor @evankanderson @dprotaso
I was under the impression tag header based routing only supports 'tags' that are explicitly marked in the Route.
This is similar to: https://github.com/knative/serving/issues/9039
Cross post my comment from #9039
Given the failure modes I don't understand why you wouldn't want to move your session state persistence to some external service - ie. memcache, redis, apache gemfire
ie. if you're using spring there's tooling that abstracts this https://spring.io/projects/spring-session-data-redis https://spring.io/projects/spring-session-data-geode
@des-esseintes can you clarify why you want to hit a specific revision? I'm assuming it has some in-memory session data.
hey @dprotaso well for me it seemed quite obvious, but I might be wrong here. Imagine we have a web app. I, as a user, open it in a browser and hit version 2 (the new one), then I click a link and get version 1 because of the randomness. Then I press 'back' but do not see the same page, because version 1 is showing up again. etc etc
If you made small changes to your v1 app, would it be pushed as a v3?
I'm trying to determine the lifecycle of the web apps.
@dprotaso hmmmm I don't think I understand what you mean... my example was made up, kind of, but I do believe this feature would be handy in some cases. For example (a different one): https://cloud.google.com/appengine/docs/flexible/python/using-websockets-and-session-affinity#session_affinity
I've seen this request before and it's not unreasonable, but some of the outcomes are surprising no matter which design decision is chosen at this infrastructure level.
The general idea is that (for example) you might be serving JavaScript and HTML resource bundles that are matched, and you don't want to serve v237 javascript to a v238 client, or vice-versa.
- One way to handle this would be to provide the application with a hint as to any tags that are mapped to that revision, and then allow the app to return information to the client to allow subsequent fetches to map to the same tag. This has the disadvantage that it's possible for actual traffic splits to move substantially from the requested amount, if clients hold onto old tags for an extended period of time. (The % splits will only apply to new requests for an application which is doing this.)
- Another option would be to smuggle a "version selection hash" to the server (as a header which could be sent back to the client) which could be re-presented to ensure a consistent % hash allocation as the traffic percentage assignments change. The disadvantage here is that it's still possible to get the "broken bundle" problem when the traffic assignments change.
- A third option is to use a consistent hash on request properties (client IP, possibly a known cookie set by the routing layer, or some other header) which is used by the routing layer to determine the % hash allocation (as in 2). The difference with option 2 is that this method is automatic, rather than requiring the server to opt in by repeating some value back to the client.
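To make the hash-based options more concrete, here is a minimal sketch of mapping a stable request property to a percentage bucket. It is not how any Knative networking backend works; `bucket_for`, `pick_revision`, and the choice of SHA-256 with 100 buckets are assumptions made for illustration only.

```python
import hashlib
from typing import List, Tuple

# Ordered (revision name, percent) pairs; percents are expected to sum to 100.
Split = List[Tuple[str, int]]


def bucket_for(key: str) -> int:
    """Map a stable request property (e.g. a routing cookie) to a bucket in [0, 100)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def pick_revision(key: str, split: Split) -> str:
    """Pick the revision whose cumulative percent range contains the key's bucket."""
    bucket = bucket_for(key)
    upper = 0
    for revision, percent in split:
        upper += percent
        if bucket < upper:
            return revision
    raise ValueError("split percentages must sum to 100")
```

With `[("v1", 90), ("v2", 10)]`, a given key always lands on the same revision. When the split changes to `[("v1", 50), ("v2", 50)]`, keys already on `v2` stay there, but keys in buckets 50 to 89 move from `v1` to `v2` in the middle of their session, which is the "broken bundle" case mentioned above.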
Unfortunately, there's a tension between stickiness and matching the requested traffic assignment percentages, particularly during a rollout. It would be worth comparing the options for real-world applications and making a recommendation using the Feature Tracks process (it can be a short doc, but I'd focus on the tradeoffs and why choosing a particular one is best).
One additional consideration is that doing more careful assignment of requests to particular buckets may be limited on some of the network routing backends, so it's probably worth talking to the networking WG about what functionality can be enabled.
/assign @nak3 @ZhiminXiang
@dprotaso it's not really about state, but that you always hit the same version of your app once it has been selected with the first request. Think about canary releases, where you want a user that hits the canary to stay with this version. Regardless of state, hitting two different versions of your (web) app during a user interaction is definitely not what you want.
Of course the client needs to decide what the first request is, and then send back a revision tag that is honoured by the router to hit the same revision (not same pod of course) as long as the revision tag is present in the header. The revision tag is picked up from the response header ideally (imo). Also, the client decides when that "session" is over (e.g. when the user logs out).
Of course this requires active support by the app itself (i.e. picking up the revision tag from the HTTP response), but it is an easy way to achieve "revision stickiness" (where the revision may be distributed over multiple pods).
I understand that there is a conflict between matching the traffic split rules and client-selected revision stickiness, but maybe the routing algorithm could take user-pinned revisions into account by counting those requests and distributing only 'fresh' requests according to the rules? (saying that while not really knowing how the distribution works)
It's tricky indeed.
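A toy model can show both the idea of distributing only 'fresh' requests and the resulting drift from the configured percentages. This is a sketch only: `PinAwareRouter` is a hypothetical name, and nothing here reflects how Knative's routing layer actually distributes traffic.

```python
import random
from collections import Counter
from typing import Dict, Optional


class PinAwareRouter:
    """Toy router: pinned requests bypass the split; only fresh ones are distributed."""

    def __init__(self, split: Dict[str, int], seed: int = 0) -> None:
        self.split = dict(split)
        self.fresh: Counter = Counter()   # requests placed by the traffic split
        self.pinned: Counter = Counter()  # requests that carried a known tag
        self._rng = random.Random(seed)

    def route(self, pinned_tag: Optional[str] = None) -> str:
        if pinned_tag in self.split:
            # The client already holds a tag: honour it and count it separately.
            self.pinned[pinned_tag] += 1
            return pinned_tag
        tags = list(self.split)
        weights = [self.split[t] for t in tags]
        chosen = self._rng.choices(tags, weights=weights)[0]
        self.fresh[chosen] += 1
        return chosen

    def observed_share(self, tag: str) -> float:
        """Fraction of all requests (fresh and pinned) served by the given tag."""
        total = sum(self.fresh.values()) + sum(self.pinned.values())
        return (self.fresh[tag] + self.pinned[tag]) / total if total else 0.0
```

For example, with a split of `{"v1": 0, "v2": 100}`, 30 requests still pinned to `v1` plus 10 fresh requests leave `v1` with 75% of the observed traffic even though it is configured for 0%. Counting the two kinds of requests separately at least makes that gap visible.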
This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.
I think this issue is still important /remove-lifecycle stale
It sounded in https://github.com/knative/serving/issues/8160#issuecomment-664954239 like we might have a solution based on header-based tag routing:
Tag header-based routing could be half of the solution. If an HTTP response from a revision also included its revision tag, a client could do the correlation on its own:
- Send first requests to the service URL without any extra info
- Traffic split selects a revision by distribution
- HTTP response from this revision contains the tag as specified in the traffic split
- Client picks up that response header field and sends it back on all future requests.
That way as soon as a revision has been selected, the client is able to stick to this revision.
Is this a matter of documenting this pattern at this point?
/kind documentation /kind enhancement /triage accepted
/good-first-issue
Yep, I think having this documented and maybe having a sample that shows how a simple javascript based web-client could leverage this technique would be very helpful.
/remove-lifecycle stale
@evankanderson @rhuss is documentation still required for this? Would it be worth opening an issue in the docs repo and closing / linking to this one? Any idea who could provide information for this to base docs on, or can we get a volunteer SME to work on doc drafts?
I think a blog post and maybe some documentation updates would be appropriate here; this is a matter of using some of the existing features creatively and probably isn't obvious to most users.
I don't know if we have a category / queue for "technical blog post topics".
+1 for a blog post at least, but it should include a simple example of such a client, like an expressjs app, that does this round trip. So it's more than just writing; it also involves some coding. Maybe @csantanapr could help us here?
@rhuss @evankanderson @dprotaso I find this issue quite interesting. Is there any documentation I can look at where people have documented some of the use cases that we want to target? I do see this as very relevant for functions as well.
We should dedup this with #9039 to consolidate.
@salaboy I'm afraid we haven't really documented the use case, but one is straightforward: if you are doing a canary deployment where you route 95% of your users to the existing version and 5% to a new (canary) version to see how users react, then you want all HTTP requests in a user's session (which spans multiple subsequent HTTP requests) to hit the version that was initially selected, especially if the two versions are not compatible with each other. Actually, "revision stickiness" is needed for all applications where multiple requests are used for a user interaction.
The question would be where we want to document how the tag header-based routing can be leveraged to achieve this revision stickiness.
/unassign @tcnghia @ZhiminXiang @nak3 /triage needs-user-input
Following up, it would be good to get user input on:
- Can Tag Header Based Routing address the 'hit the same version of my app' problem?
- How do people usually handle client (SPA)/server drift? i.e. most things break, but I could imagine a redirect to trigger a page reload
- Do you need to signal stickiness via some different mechanism (i.e. a cookie)?
Given the concerns mentioned in https://github.com/knative/serving/issues/9039#issuecomment-677848875 I think I would want to keep things sane and just support stickiness to Revisions (and not instances/Pods).