memcached-session-manager

non-sticky sessions: issue with super fast requests

Open ghost opened this issue 10 years ago • 24 comments

From [email protected] on January 23, 2015 16:40:43

What steps will reproduce the problem?

  1. sessionBackupAsync="false"
  2. sessionBackupTimeout=100
  3. Set a value in the session
  4. Issue a 302 redirect to the user to send them to a different page
  5. Browser returns to the site and the state (using Spring Webflow) is invalid.

This issue only happens when the user returns to the site very quickly, usually only with a 302 redirect. If we put a 1s sleep/delay before redirecting the user then this issue goes away. My assumption is that the session would be written to memcached BEFORE the response is returned to the user. It appears that the session data is being written to memcached sometime AFTER the response is returned to the user. Any insights on this?

Just to be silly, I'll do a $20 bounty via PayPal if someone can solve this. We are running about 60 machines in Amazon EC2 and this is a really nice sized environment. This project is really giving us great things like zero downtime deploy and auto-scaling without sticky sessions. This is the only issue we've had thus far.

Original issue: http://code.google.com/p/memcached-session-manager/issues/detail?id=224

ghost avatar Aug 24 '15 08:08 ghost

From martin.grotzke on January 23, 2015 14:07:27

With your configuration (async false) the session should be stored in memcached before the response is returned. Can you reproduce this issue with debug logging enabled (see https://code.google.com/p/memcached-session-manager/wiki/SetupAndConfiguration#Configure_logging) and share logs from the involved tomcats?

ghost avatar Aug 24 '15 08:08 ghost

From [email protected] on February 04, 2015 14:00:37

Disclosure: I know the original poster and am a bit closer to the issue

I have created a project on github that is able to demonstrate the issue: https://github.com/tyrinslys/spring-webflow-memcache

Running the example, if all things were in working order, I would expect to get a heap space error. Instead I get an error from the flow controller that the specific snapshot cannot be found.

Caused by: org.springframework.webflow.execution.repository.snapshot.SnapshotNotFoundException: No flow execution snapshot could be found with id '2583'; perhaps the snapshot has been removed?

It seems to happen at random. As a summary, here is what this bug demo webapp is doing:

  1. creating a map and placing it in session (sessionMap)
  2. grab sessionMap and add one Integer pair to it (not a replace but mutating the map in session)
  3. starting the flow
  4. setting a counter in flow scope
  5. issuing a redirect using the redirect directive (this actually sends a 302)
  6. browser then requests the new url
  7. same as step 2
  8. add 1 to the counter in scope (then log it)
  9. return a view that has a meta redirect
  10. browser calls server again
  11. loop back to step 2, only we are not starting another flow... but continuing the existing flow.

I'll update if I find a certain version mix to fix the issue.

ghost avatar Aug 24 '15 08:08 ghost

From [email protected] on February 05, 2015 06:01:34

So... ignore me for a while. It appears I had settings incorrect and the example project doesn't illustrate the issue. I'll get back

ghost avatar Aug 24 '15 08:08 ghost

From martin.grotzke on February 05, 2015 06:41:05

Ok.

ghost avatar Aug 24 '15 08:08 ghost

From [email protected] on February 09, 2015 12:02:04

First I want to make sure my assumption is correct, please confirm the following.

If sessionBackupAsync="false" is in the config then the system behavior should be the same as using the default tomcat session manager (when looking from the point of view of the browser).

If that is not the case then please provide more information.

So an issue I was able to bring out with my example project is that when I run more than one flow at a time the conversation is lost.

Summary: The problem is that no more than 1 flow is able to run at a time, without error. Caused by: org.springframework.webflow.execution.repository.snapshot.SnapshotNotFoundException: No flow execution snapshot could be found with id '58'; perhaps the snapshot has been removed?

Running Tomcat without memcache allows the max configured webflows to run without issue. Once past that we get a NoSuchConversationException which is expected as the first created webflow is deleted from session. Caused by: org.springframework.webflow.conversation.NoSuchConversationException: No conversation could be found with id '1' -- perhaps this conversation has ended?

To recreate this run the following link in multiple tabs with the example project. http://localhost:8080/spring-mvc-showcase/flow/testFlow?debug=grow

Oh, and here is the version information: apache-tomcat-7.0.57, memcached-session-manager-1.8.2.jar, memcached-session-manager-tc7-1.8.2.jar, spymemcached-2.11.1.jar

ghost avatar Aug 24 '15 08:08 ghost

From [email protected] on February 10, 2015 08:19:12

I have even tried lockingMode="all" which I think would make sure simultaneous threads are not causing issues, but the issue is still found.

ghost avatar Aug 24 '15 08:08 ghost

From martin.grotzke on February 10, 2015 13:46:33

Re sessionBackupAsync="false": yes, the system behavior should be the same as using the default tomcat session manager (when looking from the point of view of the browser).

I checked out your project and ran http://localhost:8080/spring-mvc-showcase/flow/testFlow?debug=grow which is then counting and counting (I stopped at "Count now is 712").

What can I do to reproduce your issue?

Btw, great that you've set up the sample project!

Cheers, Martin

ghost avatar Aug 24 '15 08:08 ghost

From [email protected] on February 10, 2015 14:30:26

Looks like you got the project set up. To test, run that url in 2 tabs so the same session is being edited by both tabs.

I took the liberty of adding the lockingMode="all" to the context file in the project as I thought that would solve the issue.

I believe the problem shows itself when 2 requests for memcache-session happen at the same (similar) time.

Again with memcache configured in tomcat with the provided context the example does not work... and removing it, which uses tomcat session management, the issue magically is gone.

Please remember the context file is not automatically deployed but must be copied to the tomcat home config dir and a tomcat restart issued.

Let me know if you still have issues, and I will be happy to assist.

ghost avatar Aug 24 '15 08:08 ghost

From martin.grotzke on February 11, 2015 01:03:54

Ok, this way I can reproduce the issue. One important thing is that you're using non-sticky sessions, which means that a session is only stored in the tomcat internal session map for the duration of a request. I'd say that using lockingMode="all" should solve concurrency issues; I need to think about what the issue might be. Perhaps there's also an issue with web flow serialization; I'd need to understand the internals of webflow and how things are serialized/restored. All in all, this doesn't seem easy to solve and needs some deeper investigation.

Do you need non-sticky sessions, or could you just use sticky sessions?

ghost avatar Aug 24 '15 08:08 ghost

Thank you. I was having very similar problems with access to session attributes randomly returning null values. It is an AJAX-heavy application and the problem usually surfaced whenever a read was performed too soon after a write. I use sessionBackupAsync = false and sticky = false too.

Setting lockingMode = 'all' is the only thing that seems to have resolved the issue.

radarsh avatar Mar 04 '16 09:03 radarsh

@radarsh Do you think there's an issue we should analyze? Then we should create a sample app that allows to reproduce it. Otherwise I'd close this issue.

magro avatar Mar 06 '16 13:03 magro

@magro unfortunately, setting the locking mode doesn't appear to have fixed the problem completely. I still see intermittent errors.

Upon further analysis, I found that MSM is timing out trying to lock the session.

Reached timeout when trying to acquire [sic] lock for session XXXXXX. Will use this session without this lock.

It looks like the default timeout is the same as the operation timeout for Memcached (1 second). I'm not sure what's taking so long to acquire the lock here. I am going to try increasing the operation timeout for now but still feel that's not the best thing to do.

What do you suggest?

Also I'm curious as to why you are using a memcached operation to act as a lock rather than a Java based lock?
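
For readers following along, the behaviour behind the log message above (try to get the lock, give up after a timeout, continue without it) can be sketched roughly as follows. This is a minimal illustration and not MSM's actual code: the class `LockTimeoutSketch`, its methods and the `lock:` key prefix are hypothetical, and a `ConcurrentHashMap` stands in for memcached, with `putIfAbsent` playing the role of an atomic "store only if absent" operation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.TimeUnit;

public class LockTimeoutSketch {
    // Hypothetical stand-in for lock entries held in a shared store.
    static final ConcurrentMap<String, Boolean> LOCKS = new ConcurrentHashMap<>();

    // Retries until the lock entry can be created or the timeout elapses.
    // Returns false on timeout; the caller then proceeds without the lock.
    static boolean acquire(String sessionId, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        long backoffMillis = 1;
        while (true) {
            if (LOCKS.putIfAbsent("lock:" + sessionId, Boolean.TRUE) == null) {
                return true; // entry did not exist, the lock is ours
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out while another request held the lock
            }
            try {
                Thread.sleep(backoffMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            backoffMillis = Math.min(backoffMillis * 2, 20);
        }
    }

    static void release(String sessionId) {
        LOCKS.remove("lock:" + sessionId);
    }

    public static void main(String[] args) {
        System.out.println("first acquire:  " + acquire("abc", 50));
        System.out.println("second acquire: " + acquire("abc", 50)); // lock still held
        release("abc");
        System.out.println("after release:  " + acquire("abc", 50));
    }
}
```

In this sketch a timeout simply means an earlier request still held the lock entry (or never released it) for longer than the waiting request was willing to wait.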

radarsh avatar Mar 07 '16 10:03 radarsh

@radarsh We (I) need a reproducible sample to be able to analyze this. Can you create such a sample app that's as minimal as possible but still allows us to reproduce the issue? Check out https://github.com/magro/memcached-session-manager/tree/master/samples as a starting point.

Re memcached lock vs. java lock: because for non-sticky sessions (parallel) requests may hit different tomcats. Therefore a java lock would not help, because both requests would run in parallel in different tomcats and the one that writes the session last would win and would override the data written from the other request.
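
The argument above can be illustrated with a small, deterministic simulation. It is only a sketch under stated assumptions: `SharedLockSketch` and all of its members are hypothetical names, plain maps stand in for memcached, and the interleaving of the two "nodes" is written out sequentially instead of using real Tomcats or threads.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SharedLockSketch {
    // Stand-in for memcached: a single store visible to every Tomcat node.
    static final Map<String, Map<String, Integer>> STORE = new ConcurrentHashMap<>();
    // Stand-in for lock entries kept in the same shared store.
    static final Map<String, String> LOCKS = new ConcurrentHashMap<>();

    // A non-sticky node loads its own copy of the session for each request.
    static Map<String, Integer> load(String id) {
        Map<String, Integer> stored = STORE.get(id);
        return stored == null ? new HashMap<>() : new HashMap<>(stored);
    }

    static void save(String id, Map<String, Integer> session) {
        STORE.put(id, new HashMap<>(session));
    }

    // Succeeds only if no other node currently holds the lock entry.
    static boolean tryLock(String id, String owner) {
        return LOCKS.putIfAbsent(id, owner) == null;
    }

    static void unlock(String id) {
        LOCKS.remove(id);
    }

    // Two nodes handle parallel requests; a JVM-local lock cannot order them.
    static Map<String, Integer> withoutSharedLock() {
        STORE.clear();
        Map<String, Integer> onNodeA = load("s1");
        Map<String, Integer> onNodeB = load("s1");
        onNodeA.put("x", 1);
        save("s1", onNodeA);
        onNodeB.put("y", 2);
        save("s1", onNodeB); // last write wins, "x" is lost
        return load("s1");
    }

    // The lock lives in the shared store, so node B loads only after A saved.
    static Map<String, Integer> withSharedLock() {
        STORE.clear();
        LOCKS.clear();
        if (!tryLock("s1", "A")) throw new IllegalStateException("A should get the free lock");
        if (tryLock("s1", "B")) throw new IllegalStateException("B must not get a held lock");
        Map<String, Integer> onNodeA = load("s1");
        onNodeA.put("x", 1);
        save("s1", onNodeA);
        unlock("s1");
        if (!tryLock("s1", "B")) throw new IllegalStateException("B should get the released lock");
        Map<String, Integer> onNodeB = load("s1");
        onNodeB.put("y", 2);
        save("s1", onNodeB);
        unlock("s1");
        return load("s1");
    }

    public static void main(String[] args) {
        System.out.println("without shared lock: " + withoutSharedLock());
        System.out.println("with shared lock:    " + withSharedLock());
    }
}
```

Without the shared lock only the attribute written last survives; with it, both attributes end up in the stored session because the second node starts from the first node's saved state.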

magro avatar Mar 07 '16 21:03 magro

@radarsh I just added a sample (in commit ec068d9f) that automatically counts a number up to a given max. Starting from 0, for each request it adds 1 and saves this value to the session. It then redirects to a page, passing the saved number as a request parameter, and on each request it compares the value from the request parameter with the value read from the session. If they differ it aborts counting.

This is my simulation of fast requests, and it works for me (tested with to=1000). I started two tomcats on ports 9090/9091, in the samples directory, with

mvn tomcat7:run -pl :simpleservlet -am -Pnon-sticky -Dmsm.failoverNodes=" " -Djava.util.logging.config.file=src/main/resources/logging.properties -Dtomcat.port=9090/1

and ran nginx with round robin load balancing (I used the docker image as described in the samples README). Then I pointed my browser to http://localhost/autocounter?to=1000 and watched it counting...
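
The check performed by the counter sample described above can be modeled in plain Java, without the servlet API. This is a rough sketch, not the code from the samples directory: `AutoCounterSketch`, `run` and the `loseWriteAt` parameter are hypothetical, a `HashMap` stands in for the stored session, and a lost session write is injected artificially to show how a mismatch would stop the counting.

```java
import java.util.HashMap;
import java.util.Map;

public class AutoCounterSketch {
    // Simulates the chain of redirected requests and returns the count reached.
    // loseWriteAt: counter value whose session write is dropped (use -1 for none).
    static int run(int to, int loseWriteAt) {
        Map<String, Integer> session = new HashMap<>(); // stand-in for the stored session
        Integer param = null;                           // value carried by the redirect
        int count = 0;
        while (count < to) {
            Integer stored = session.get("count");
            // Compare the request parameter with what the session holds.
            if (param != null && !param.equals(stored)) {
                return count; // mismatch: abort counting
            }
            int next = (stored == null ? 0 : stored) + 1;
            if (next != loseWriteAt) {
                session.put("count", next); // normal case: the write is visible
            }
            param = next; // redirect passes the new value as a parameter
            count = next;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("no lost write:     " + run(1000, -1));
        System.out.println("write lost at 5:   " + run(1000, 5));
    }
}
```

If every write is visible to the next request the loop runs to the maximum; a single write that is not yet visible is detected on the very next request.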

magro avatar Mar 07 '16 22:03 magro

Pushed the wrong button ;-)

magro avatar Mar 07 '16 22:03 magro

Ok I will try and come up with something that reproduces this issue. Meanwhile, if it helps, I am using embedded Tomcat 8 in a Grails / Spring Boot set up with Spring's session scoped beans.

radarsh avatar Mar 07 '16 23:03 radarsh

Any news on this?

magro avatar Jun 06 '16 21:06 magro

I'm afraid not. Project deadlines forced me to write a custom implementation using Spring Session which seems to be doing the job.

radarsh avatar Jun 07 '16 05:06 radarsh

Ok

magro avatar Jun 07 '16 06:06 magro

This happens because Tomcat only starts to store the session after the request has finished. In the meantime, the response may already have been received by the browser, and the follow-up requests may already have been processed by Tomcat, while the session store has not yet finished.

guojjanjun avatar Dec 26 '18 09:12 guojjanjun

@guojjanjun as written above with sessionBackupAsync="false" the response should be sent to the browser after the session was stored in memcached. Do you have evidence that this is not the case?

magro avatar Dec 26 '18 10:12 magro

@magro I think so. But maybe most of the response content has already been sent to the browser (for example if flush is invoked in servlet.service). Maybe the data affects the cookies in the browser. I once used LoadRunner to test my application, whose sessions were stored in Redis: when generating a session id, I immediately stored the session (id: id, data: 'NULL') in Redis to check for id conflicts, and then replaced it with the current session when the request finished. However, I found that some requests got the session whose data was 'NULL'.

guojjanjun avatar Dec 28 '18 05:12 guojjanjun

@guojjanjun I think we need a reproducer for this. What I've tried so far (as written above) did not achieve this. Maybe you could try to provide a reproducible sample? You could use the sample in https://github.com/magro/memcached-session-manager/tree/master/samples as a basis.

magro avatar Jan 01 '19 21:01 magro

A way to reproduce this might be to change this AutoCounterServlet to use a chunked encoding, flush the response buffer after writing the html head (triggering the "redirect"), and sleeping for a while before completing the request.
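
The hypothesis behind this reproduction idea can be modeled as an ordering problem. The sketch below is an assumption-laden illustration, not verified MSM or Tomcat behaviour: `FlushOrderSketch` and `valueSeenByNextRequest` are hypothetical names, a `HashMap` stands in for memcached, and the follow-up request is assumed to read the session from that store the moment the flushed bytes reach the browser.

```java
import java.util.HashMap;
import java.util.Map;

public class FlushOrderSketch {
    // Stand-in for the shared session store (memcached in the thread).
    static final Map<String, Integer> STORE = new HashMap<>();

    // Returns the counter value the follow-up request would read.
    // flushBeforeBackup = true models a response that is flushed to the
    // browser before the session backup at the end of the request.
    static Integer valueSeenByNextRequest(boolean flushBeforeBackup) {
        STORE.clear();
        STORE.put("count", 1);              // state left by the previous request
        int updated = STORE.get("count") + 1; // current request changes its local copy

        Integer seen;
        if (flushBeforeBackup) {
            seen = STORE.get("count");      // browser already follows the redirect
            STORE.put("count", updated);    // backup only happens afterwards
        } else {
            STORE.put("count", updated);    // backup completes first
            seen = STORE.get("count");      // then the response reaches the browser
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("flush before backup: " + valueSeenByNextRequest(true));
        System.out.println("backup before flush: " + valueSeenByNextRequest(false));
    }
}
```

In this model the follow-up request only reads a stale value when the response leaves the server before the backup, which is the situation the chunked-encoding, flush and sleep combination is meant to provoke.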

magro avatar Jan 02 '19 00:01 magro