Harbour.RedisSessionStateStore

Site hangs

Open wliao008 opened this issue 11 years ago • 29 comments

Hi, I'm having a strange problem after putting the RedisSessionStateStoreProvider in place: occasionally my site hangs, and if I go on the server and issue a "monitor" command to my Redis server, here's a portion of the output:

"watch" "bc/4i0sekkgl3nhis33ddooqyds" "hgetall" "bc/4i0sekkgl3nhis33ddooqyds" "unwatch" "watch" "bc/4i0sekkgl3nhis33ddooqyds" "hgetall" "bc/4i0sekkgl3nhis33ddooqyds" "unwatch" "watch" "bc/4i0sekkgl3nhis33ddooqyds" "hgetall" "bc/4i0sekkgl3nhis33ddooqyds" "unwatch" "watch" "bc/4i0sekkgl3nhis33ddooqyds" "hgetall" "bc/4i0sekkgl3nhis33ddooqyds" "unwatch" "watch" "bc/4i0sekkgl3nhis33ddooqyds" "hgetall" "bc/4i0sekkgl3nhis33ddooqyds" "unwatch" ... ...

and it just keeps going on and on...

Doing an hget bc/4i0sekkgl3nhis33ddooqyds locked shows that this key is locked, which I assume is what's causing it to hang?

I run into the same problem when doing load testing with JMeter.

I used the code as-is without much modification. I have a master-slave setup (Redis v2.8.3) with ServiceStack.Redis v3.9.60.

Has anyone seen this?

Thanks! -wl

wliao008 avatar Jan 28 '14 20:01 wliao008

Hey,

The key prefix in the WATCH output doesn't match the default. You can see in the Initialize() method that AspNetSession is the default prefix for Redis keys.

Did you change the key setting to bc somewhere in your configuration?

One thing to note is that ASP.NET will block concurrent requests to the same session. Can you try using JMeter with multiple different sessions?
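As an aside, requests that only read session data can opt out of that exclusive per-session lock so they don't queue behind each other. A minimal sketch, assuming ASP.NET MVC (the controller name and session key are illustrative, not from this project):

// A controller marked read-only acquires the session non-exclusively, so
// concurrent requests for the same session are not serialized by ASP.NET.
using System.Web.Mvc;
using System.Web.SessionState;

[SessionState(SessionStateBehavior.ReadOnly)]
public class ReportsController : Controller
{
    public ActionResult Index()
    {
        var userName = Session["UserName"] as string; // reading is fine here
        return View();
    }
}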

I can help more if you post a simple ASP.NET sample somewhere that can be hit with JMeter, and I'll try to reproduce it. Thanks!

TheCloudlessSky avatar Jan 28 '14 21:01 TheCloudlessSky

Hi, yes, I did change the prefix in my web.config:

<sessionState mode="Custom" cookieless="false" timeout="30" customProvider="bc">
    <providers>
        <clear />
        <add name="bc" type="..." />
    </providers>
</sessionState>

Anyway, yes, we're using different sessions in JMeter (we have an HTTP Cookie Manager set up), and I can see the different sessions listed in Redis as well.

What concerns me is that even when testing locally and manually, with only me accessing the site, I still occasionally run into the same problem.

I'll see if I can get something set up.

Thanks! -wl

wliao008 avatar Jan 28 '14 22:01 wliao008

Has this issue been resolved? We are running into threads blocking because a response is not coming back from the session state provider. Due to a bug, this was locking up the whole site. That bug has been fixed, but I still need to look into the hung requests :| I'm curious whether the issue is with the newest Redis client; it's quite a few versions ahead of where it was two years ago. Is there a particular, tested version of ServiceStack.Redis to try?

ProTip avatar Feb 10 '14 21:02 ProTip

I was pulled off to work on another area so I haven't been able to resolve this. As you said, I should probably try again with ServiceStack 3.8.3.

wliao008 avatar Feb 10 '14 21:02 wliao008

Hmm, the oldest NuGet package available through the package explorer is 3.9.29. You wouldn't happen to know how to get hold of 3.8.3 without building from source?

ProTip avatar Feb 10 '14 21:02 ProTip

All the ServiceStack DLLs are in the lib folder of this repo. Good luck, and please let me know how it goes :)

wliao008 avatar Feb 10 '14 21:02 wliao008

If either of you can create a reproducible case, I'd like to help fix this issue. We don't use the ASP.NET session anymore, but when we were using it we had a significant load without any lockups (or none that we noticed). Right now, we still use SS.Redis v3.9.29 in other areas of our app without any issues.

TheCloudlessSky avatar Feb 10 '14 21:02 TheCloudlessSky

Still getting the hangs. Once the threads lock up, all the traffic turns into this:

1392072383.569702 "HGETALL" "InnerProvider/vaq43jyhiru0sef0i2z3kost"
1392072383.571567 "UNWATCH"
1392072383.573037 "WATCH" "InnerProvider/juns3nwigyegjpyhbqd4g4ch"
1392072383.574505 "HGETALL" "InnerProvider/juns3nwigyegjpyhbqd4g4ch"
1392072383.598808 "UNWATCH"
1392072383.600329 "WATCH" "InnerProvider/lab5g32aoye1iprk5mjzpx5k"
1392072383.601793 "HGETALL" "InnerProvider/lab5g32aoye1iprk5mjzpx5k"
1392072383.603676 "UNWATCH"
1392072383.605130 "WATCH" "InnerProvider/lahiw1xdvhxqino5iwbslbad"
1392072383.606576 "HGETALL" "InnerProvider/lahiw1xdvhxqino5iwbslbad"
1392072383.608380 "UNWATCH"

I may need to build the project in debug mode to get a good stack trace.

ProTip avatar Feb 10 '14 22:02 ProTip

@ProTip Can you reproduce locally with JMeter or other load testing software? Please provide a stack trace if you can!

TheCloudlessSky avatar Feb 10 '14 23:02 TheCloudlessSky

The analysis is a lot different with the debug build. I have a lot of threads waiting for information from the server. This particular thread is also holding a lock that is blocking other threads (this has supposedly been fixed):

Thread 118 - System ID 2228

Entry point   clr!Thread::intermediateThreadProc 
Create time   2/10/2014 11:12:37 PM 
Time spent in user mode   0 Days 00:00:00.780 
Time spent in kernel mode   0 Days 00:00:00.046 

This thread is waiting on data to be returned from another server via WinSock.

 The call to WinSock originated from System_ni!DomainNeutralILStubClass.IL_STUB_PInvoke(IntPtr, Byte*, Int32, System.Net.Sockets.SocketFlags)+7f

.NET Call Stack

Function

System_ni!DomainNeutralILStubClass.IL_STUB_PInvoke(IntPtr, Byte*, Int32, System.Net.Sockets.SocketFlags)+7f 
[[InlinedCallFrame] (System.Net.UnsafeNclNativeMethods+OSSOCK.recv)] System.Net.UnsafeNclNativeMethods+OSSOCK.recv(IntPtr, Byte*, Int32, System.Net.Sockets.SocketFlags) 
System_ni!System.Net.Sockets.Socket.Receive(Byte[], Int32, Int32, System.Net.Sockets.SocketFlags, System.Net.Sockets.SocketError ByRef)+fa 
System_ni!System.Net.Sockets.NetworkStream.Read(Byte[], Int32, Int32)+c8 
mscorlib_ni!System.IO.BufferedStream.ReadByte()+67 
ServiceStack.Redis.RedisNativeClient.ReadLine()+58 
ServiceStack.Redis.RedisNativeClient.ReadData()+1a 
ServiceStack.Redis.RedisNativeClient.get_Info()+6c 
ServiceStack.Redis.RedisNativeClient.Watch(System.String[])+26 
Harbour.RedisSessionStateStore.RedisSessionStateStoreProvider.GetClientAndWatch(System.String)+dc 
Harbour.RedisSessionStateStore.RedisSessionStateStoreProvider.GetItem(Boolean, System.Web.HttpContext, System.String, Boolean ByRef, System.TimeSpan ByRef, System.Object ByRef, System.Web.SessionState.SessionStateActions ByRef)+146 
Harbour.RedisSessionStateStore.RedisSessionStateStoreProvider.GetItemExclusive(System.Web.HttpContext, System.String, Boolean ByRef, System.TimeSpan ByRef, System.Object ByRef, System.Web.SessionState.SessionStateActions ByRef)+7e 

This one has been active much longer:

Thread 45 - System ID 4540

Entry point   clr!Thread::intermediateThreadProc 
Create time   2/10/2014 11:08:00 PM 
Time spent in user mode   0 Days 00:00:18.189 
Time spent in kernel mode   0 Days 00:00:00.717 

This thread is waiting on data to be returned from another server via WinSock.

 The call to WinSock originated from System_ni!DomainNeutralILStubClass.IL_STUB_PInvoke(IntPtr, Byte*, Int32, System.Net.Sockets.SocketFlags)+7f

.NET Call Stack

Function

System_ni!DomainNeutralILStubClass.IL_STUB_PInvoke(IntPtr, Byte*, Int32, System.Net.Sockets.SocketFlags)+7f 
[[InlinedCallFrame] (System.Net.UnsafeNclNativeMethods+OSSOCK.recv)] System.Net.UnsafeNclNativeMethods+OSSOCK.recv(IntPtr, Byte*, Int32, System.Net.Sockets.SocketFlags) 
System_ni!System.Net.Sockets.Socket.Receive(Byte[], Int32, Int32, System.Net.Sockets.SocketFlags, System.Net.Sockets.SocketError ByRef)+fa 
System_ni!System.Net.Sockets.NetworkStream.Read(Byte[], Int32, Int32)+c8 
mscorlib_ni!System.IO.BufferedStream.ReadByte()+67 
ServiceStack.Redis.RedisNativeClient.ReadLine()+58 
ServiceStack.Redis.RedisNativeClient.ReadData()+1a 
ServiceStack.Redis.RedisNativeClient.get_Info()+6c 
ServiceStack.Redis.RedisNativeClient.Watch(System.String[])+26 
Harbour.RedisSessionStateStore.RedisSessionStateStoreProvider.GetClientAndWatch(System.String)+dc 

ProTip avatar Feb 10 '14 23:02 ProTip

Are you saying the first issue was fixed in SS.Redis?

TheCloudlessSky avatar Feb 11 '14 00:02 TheCloudlessSky

Well, my application is hanging and I'm seeing output similar to the OP's. Every time the application hangs, most of my threads are waiting on data from Redis.

I was getting this issue with the latest compatible SS.Redis; now I'm using a debug build of the project that uses the bundled SS.Redis version.

ProTip avatar Feb 11 '14 02:02 ProTip

@TheCloudlessSky, what version of Redis are you using in your setup? Thanks.

wliao008 avatar Feb 11 '14 14:02 wliao008

Hi guys. I believe I have resolved my issue of the threads waiting for data from Redis. Apparently this application/client combination is creating about 5+ connections to the Redis server per thread. I was simulating 100 users, so this was creating 500+ (minimum) connections, blowing right through the open file limit. I have raised the limit and no longer see the threads waiting for data from Redis.
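For anyone hitting the same wall, a minimal sketch of bounding connections with ServiceStack.Redis's pooled manager; the host names and pool sizes below are illustrative, not this deployment's actual values:

using ServiceStack.Redis;

// Cap how many client connections can exist at once, so N simulated users
// can't fan out into N * 5 open sockets against the Redis server.
var config = new RedisClientManagerConfig
{
    MaxWritePoolSize = 50, // clients for the master (read/write)
    MaxReadPoolSize = 50   // clients for the read-only slave(s)
};

var clientManager = new PooledRedisClientManager(
    new[] { "redis-master:6379" }, // read/write hosts (example)
    new[] { "redis-slave:6379" },  // read-only hosts (example)
    config);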

ProTip avatar Feb 11 '14 21:02 ProTip

@wliao008 We're using 3.9.57. See the details below for our setup.

@ProTip That's quite a lot of connections per thread :boom:! What type of Redis client manager are you using (pooled, basic)? Is it configured through the web.config or via RedisSessionStateStoreProvider.SetClientManager()? Are you using an inversion of control container (Ninject, StructureMap, Windsor, etc.)? If so, how's it configured?

Here's our setup:

  • Ninject as our IoC container
  • Use the PooledRedisClientManager and have the IRedisClient as a singleton per-request (because we use Redis in other areas of our app):
var pooledClientManager = new PooledRedisClientManager(redisConnection);

// IMPORTANT: Ninject's default Pipeline/ActivationCache only deactivates (disposes) 
// references once. And because the pool returns the same instances, Ninject does not 
// dispose of the IRedisClient the 2nd time its requested. This causes the client to never
// become deactivated. Therefore, we wrap the client with the DisposablePooledClient
// which always use a new reference for Ninject to manage and internally disposes of the
// IRedisClient.
// http://stackoverflow.com/questions/17047732/how-can-ninject-be-configured-to-always-deactivate-pooled-references

Bind<PooledRedisClientManager.DisposablePooledClient<RedisClient>>()
    .ToMethod(ctx => pooledClientManager.GetDisposableClient<RedisClient>())
    .InRequestScope();

Bind<IRedisClient, ICacheClient>()
    .ToMethod(ctx => ctx.Kernel.Get<PooledRedisClientManager.DisposablePooledClient<RedisClient>>().Client)
    .InTransientScope();

This setup ensures that any part of our application (other than the RedisSessionStateStore) only uses a single instance per-request.

We then configure the RedisSessionStateStore and use the pooled client manager reference directly:

RedisSessionStateStoreProvider.SetClientManager(pooledClientManager);

So, you shouldn't be creating a lot of connections to the Redis server per-thread if you're using the PooledRedisClientManager since it only instantiates a new IRedisClient if another one is in use (it'll put it back into the pool once it's done being used!).

If you haven't already, switch to the pooled client manager. If you're using Redis somewhere else in your application, make sure that you're properly cleaning up the client so that it'll be put back into the pool.
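For example, any ad-hoc use of the pooled manager should dispose the client as soon as it's done with it (a rough sketch; the key name is just an example):

using (var redis = pooledClientManager.GetClient())
{
    redis.Set("example:key", "value"); // illustrative only
} // Dispose() here hands the connection back to the pool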

TheCloudlessSky avatar Feb 11 '14 22:02 TheCloudlessSky

Thanks @TheCloudlessSky, that's good info! Regarding @ProTip's suggestion of increasing the file limit: on my default Redis instance we have 3984 max clients, but we still experience the lock-up issue when load testing with just 2 users. I did go back to the initial version of the session provider, without the Watch/Unwatch changes, and it seems fine so far; I load tested 100 users without a single problem.

wliao008 avatar Feb 12 '14 15:02 wliao008

@wliao008 Hmm, it's weird that the watch/unwatch would be causing it. There's got to be a deeper bug within ServiceStack.Redis then. I don't use key watching in any of our other Redis-based code. For any atomic operation, we use a generational approach (like with NHibernate.Caches.Redis), Redis transactions with MULTI/EXEC (only useful when you don't need to read data and then mutate based on it), or a Lua script with more logic when necessary (since scripts are atomic!). I'd like to use the Lua approach, but it'd depend on whether people are running Redis >= 2.6.0.

However, like I said, doing watch/unwatch shouldn't be causing any issues. If it is, it'd mean there's a bug in ServiceStack.Redis.
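For illustration, a rough sketch of the Lua idea mentioned above (not the provider's code; it assumes Redis >= 2.6.0, a ServiceStack.Redis build that exposes the ExecLua* helpers, and clientManager/sessionId standing in for your own pooled manager and session id):

using System;
using ServiceStack.Redis;

// Atomically take a per-session lock only if nobody else already holds it.
const string acquireLock = @"
    if redis.call('SETNX', KEYS[1], ARGV[1]) == 1 then
        redis.call('PEXPIRE', KEYS[1], ARGV[2])
        return 1
    end
    return 0";

using (var client = clientManager.GetClient())
{
    bool acquired = client.ExecLuaAsInt(acquireLock,
        new[] { "AspNetSession/lock:" + sessionId },  // KEYS[1]
        new[] { Guid.NewGuid().ToString(), "30000" }  // ARGV: lock id, TTL in ms
    ) == 1;
    // if acquired, read/modify/write the session hash, then DEL the lock key
}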

TheCloudlessSky avatar Feb 12 '14 20:02 TheCloudlessSky

Unfortunately I don't have direct control over the instantiation at the moment. The software I'm deploying wraps the provider and passes the config through. I am specifying "pooled" for the clientType, though. But seeing 500 open files for the Redis process under /proc makes me suspicious that the pool isn't being used correctly with a 100 concurrent user simulation. I believe it may be creating 2 session keys per ASP.NET session, though, so that could account for up to 200? Not sure how they get cleaned up after that.

I may try the pre-watch version as well, just to confirm whether or not that improves my situation.

ProTip avatar Feb 13 '14 11:02 ProTip

I was looking through the MSDN documentation on session state providers yesterday, trying to work out the order of execution, and came across this for the GetItemExclusive method:

If no session item data is found at the data store, the GetItemExclusive method sets the locked output parameter to false and returns null.

In the provider, there's:

private SessionStateStoreData GetItem(...)
{
    //code removed for brevity
    RedisSessionState state;
    if (!RedisSessionState.TryParse(stateRaw, out state))
    {
         client.UnWatch();
         return null;
    }
    //code removed for brevity
}

So we changed it to:

private SessionStateStoreData GetItem(...)
{
    //code removed for brevity
    RedisSessionState state;
    if (!RedisSessionState.TryParse(stateRaw, out state))
    {
         client.UnWatch();
         locked = false; // <- add this line
         return null;
    }
    //code removed for brevity
}

And believe it or not, this seems to fix the problem for us. I've been load testing it on and off with a variable number of users and so far have not had any locking issues.

wliao008 avatar Feb 13 '14 14:02 wliao008

@ProTip Yes, please try the pre-watch version and see if it fixes anything. The problem could lie in ServiceStack.Redis.

@wliao008 That shouldn't have done anything different because this line sets locked = false. Are you running a release build?

TheCloudlessSky avatar Feb 13 '14 16:02 TheCloudlessSky

I just released v1.2.0, which hopefully fixes this problem. I noticed that there were places where UNWATCH was not being called (because the transaction was not committed, or even started). Update and give it a spin, and please let me know if it fixes this issue.
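The general shape of that fix, as a rough sketch rather than the actual v1.2.0 code: make sure UNWATCH always runs, even when the method returns before starting or committing a transaction.

using System;
using ServiceStack.Redis;

// Rough sketch only: guarantee the WATCH is released no matter how we exit.
static void WithWatchedKey(IRedisClientsManager clientManager, string key,
                           Action<IRedisClient> work)
{
    using (var client = clientManager.GetClient())
    {
        client.Watch(key);
        try
        {
            work(client); // may bail out without ever calling EXEC
        }
        finally
        {
            client.UnWatch(); // always released, even on early return/exception
        }
    }
}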

TheCloudlessSky avatar Feb 16 '14 04:02 TheCloudlessSky

v1.3.0 is released; it uses a distributed lock rather than watch/unwatch. Please update and let me know if this fixes your issues. Also update ServiceStack.Common, ServiceStack.Redis, and ServiceStack.Text to 3.9.71 (the dependencies are now >= 3.9 && < 4.0).
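For the curious, a minimal sketch of the distributed-lock idea using ServiceStack.Redis's built-in lock helper (the key name, timeout, and surrounding variables are illustrative, not the provider's actual implementation):

using System;
using ServiceStack.Redis;

using (var client = clientManager.GetClient())
using (client.AcquireLock("AspNetSession/lock:" + sessionId, TimeSpan.FromSeconds(30)))
{
    // read/modify/write the session hash while holding the lock;
    // the lock is released when the inner using block is disposed
}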

TheCloudlessSky avatar Feb 18 '14 01:02 TheCloudlessSky

@TheCloudlessSky, you were right about that locked = false! I was so hung up I didn't even notice it was there already. I'll try out the new version tomorrow, thanks!

wliao008 avatar Feb 18 '14 03:02 wliao008

I need to test further (much further), but this appears to be much more stable for me.

ProTip avatar Feb 27 '14 01:02 ProTip

Awesome! If you can, try testing with JMeter, Apache Bench or another similar load tool.

TheCloudlessSky avatar Feb 27 '14 02:02 TheCloudlessSky

@TheCloudlessSky Hi, I'm testing with Gatling and a simulation of 100 concurrent users. I'm very happy with the results so far. Max connections are around 60 now and I have not experienced hangs :)

ProTip avatar Feb 27 '14 02:02 ProTip

@wliao008 @ProTip Have you guys had any luck with this? Is it stable now?

TheCloudlessSky avatar Apr 22 '14 17:04 TheCloudlessSky

@TheCloudlessSky, sorry for the late update. I did try to test out the newer version, but we ended up using the older version in production. I've been assigned to work on other things since, so I haven't been able to follow up. As far as I'm concerned, this issue is resolved for me. Thank you!

wliao008 avatar Apr 22 '14 18:04 wliao008

Great, thanks!

TheCloudlessSky avatar Apr 22 '14 18:04 TheCloudlessSky