StackExchange.Redis
Performance issue with 2.x
We had been using 1.2.6 for quite some time. Two months ago, we decided to upgrade to 2.x. We did some testing, then ran into a performance issue. We have a service running in a cluster of 20 machines. For testing purposes, we upgraded 5 machines to 2.x and deployed them. After running for a while, the CPU of these 5 machines was higher than the other 15. Rough numbers are 70% vs 60%, which is pretty bad. Yesterday we tried again using the latest version, 2.2.79, but the result was the same.
We did some performance testing using the performance profiler in VS but didn't find anything useful. After reading the release notes, we noticed that one of the biggest changes in 2.x is that it uses Pipelines.Sockets.Unofficial, so that is what we suspect caused the performance issue. However, we cannot find any proof. What is strange is that we replicated this issue in different services, and yet haven't found anyone else reporting it on the internet. So we need some help here.
Some specs:
- IIS 10.x; tried on both Windows Server 2016 and 2019
- 500 QPS per machine
- 50+ redis calls per search
- All requests are StringGet
Some code:
// Constructor
public RedisCacheClient(CacheClientConfig config)
{
    _config = config as RedisClientConfig;
    _redisConnection = new RedisConnection(_config);
}
...
// Connection init. Only one database will be created for a redis instance.
private void InitConnection(RedisConnectionString redisConnectionString, out ConnectionMultiplexer connection, out IDatabase database, out IServer server)
{
    connection = ConnectionMultiplexer.Connect(redisConnectionString.ConnectionString);
    server = connection.GetServer(redisConnectionString.ConnectionString.Split(',')[0], DefaultPort);
    connection.ErrorMessage += Connection_ErrorMessage;
    connection.InternalError += Connection_InternalError;
    database = connection.GetDatabase(redisConnectionString.OperateDb);
}
...
// Get string values
public async Task<List<TEntity>> GetListAsync<TEntity>(List<string> keys, CacheGetOptions<TEntity> options = null)
{
    var redisValues = await _redisConnection.ReadDatabase.StringGetAsync(keys.Select(k => (RedisKey)FormatKey<TEntity>(k, _redisConnection.ReadKeyType)).ToArray()).ConfigureAwait(false);
    return redisValues.Select(redisValue => TryAutoConvertToValue<TEntity>(redisValue, _config.RedisValueType).Item2).Where(v => v != null).ToList();
}
Hi, I am also facing performance issues (not necessarily CPU), but more that it takes longer to get/add objects to Redis, plus timeouts, since I started working with it...
Some updates. After reading some documents, I suspect it has something to do with the threads, so I have been tweaking the SocketManager settings. It turns out this does affect the CPU usage. The way we create instances is: for every redis instance, we create a connection. Looking at the source code, I noticed that it will create a dedicated thread pool with 10 workers by default for every connection. In our case, we have 10 redis instances, so it will create 100 threads by default. So I tried a couple of ways of creating our own SocketManager: setting the worker count to 5, and using the system thread pool instead of the dedicated one. They both reduce the CPU workload, but no matter what I do, it is still higher than 1.2.6. I really don't know what to do next.
@kimglory To clarify here: it will not create 10 per connection - it creates 10 total in a shared pool (all multiplexers default to the 1 manager in an app instance). However, this may not be ideal for either very large or very small workloads (or very small machines). A change we're considering for .NET 6, given the improved sync-over-async thread explosion handling, is to default to the general thread pool overall.
You can try it out like this:
var config = ConfigurationOptions.Parse(redisConnectionString.ConnectionString);
config.SocketManager = SocketManager.ThreadPool;
var conn = ConnectionMultiplexer.Connect(config);
or inline:
var conn = ConnectionMultiplexer.Connect(redisConnectionString.ConnectionString, options => options.SocketManager = SocketManager.ThreadPool);
> A change we're considering for .NET 6, given the improved sync-over-async thread explosion handling, is to default to the general thread pool overall.
Could you please share a link to the improvement of sync-over-async thread explosion handling? Is it perhaps the following? Thank you so much.
@SpiritBob absolutely - that's the change, there are more details in Stephen's post here: https://devblogs.microsoft.com/dotnet/performance-improvements-in-net-6/#threading
Facing the same performance problems: VERY high CPU usage
I used the latest version from the beginning. I built a web API server, but the performance was not as good as I expected. I tried to improve the performance, but failed. Looking at this thread, it seems to be a similar situation to mine. The CPU usage is abnormal. I don't know what modification is needed, but I hope it will be fixed.
Maybe a little digression - but I saw that in your example you wrote -
var redisValues = await _redisConnection.ReadDatabase.StringGetAsync(keys.Select(k => (RedisKey)FormatKey<TEntity>(k, _redisConnection.ReadKeyType)).ToArray()).ConfigureAwait(false);
Isn't there an issue with calling the MGET command when we're in cluster mode, or does the library know to split the call across the multiple nodes in the cluster?
I peeked at the code and from my understanding an exception would be thrown if the keys map to more than one "slot" - @mgravell your input / confirmation would be welcome :)
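In case that is confirmed, a minimal sketch of a cluster-safe alternative - issue one GET per key through a batch so the client can route each command to the node that owns its slot (GetManyAsync is just an illustrative name, not a library API):
// Hypothetical helper: fan out single GETs instead of one MGET.
private static async Task<RedisValue[]> GetManyAsync(IDatabase db, RedisKey[] keys)
{
    var batch = db.CreateBatch();
    // Commands are queued locally; nothing is sent until Execute().
    var pending = keys.Select(k => batch.StringGetAsync(k)).ToArray();
    batch.Execute(); // flush all queued GETs at once
    return await Task.WhenAll(pending).ConfigureAwait(false);
}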
I was originally using my own pooling system, but the performance was not good. I searched and changed it to the approach others use: every time I issue a command, I call GetDatabase(), and among the pooled connections I send the command to the one with the lowest GetCounters().TotalOutstanding.
There was no change in performance. We're working hard to improve the performance, but I can't get the result I want.
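Roughly, the selection logic looks like this (a simplified sketch of the approach I described, not the exact code):
// Assumed shape of the pooling approach: before each command, pick the
// multiplexer with the fewest operations written but not yet answered.
private static ConnectionMultiplexer[] _pool; // filled once at startup

private static IDatabase GetLeastLoadedDatabase()
{
    var least = _pool[0];
    foreach (var muxer in _pool)
    {
        // GetCounters() aggregates per-endpoint stats; TotalOutstanding
        // is the count of commands still awaiting a reply.
        if (muxer.GetCounters().TotalOutstanding < least.GetCounters().TotalOutstanding)
            least = muxer;
    }
    return least.GetDatabase();
}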
> Facing the same performance problems: VERY high CPU usage
For context, a profiler is not showing you that this is high CPU usage. It's showing that, of the captured stack frames, these were present often. That's expected since it's the reader loop, but those threads are waiting, not eating CPU.
@godwish Can you provide any detail at all? Highly subjective terms like "not that good" don't tell us much of anything - absolute circumstances and numbers would help us determine whether what you're seeing is expected. There are a lot of variables in play with client, server, latency, bandwidth, CPU power at both ends, etc.
@NickCraver The API server I'm building now doesn't do heavy work. I am using redis and mariadb. I think the TPS should exceed 10,000; however, it was about 1,500. When I checked the performance with the tool, the redis calls did not come out properly. Now we are approaching it another way. I am using dotnet5. Changing Task.Delay -> Task.Yield made an interesting difference (not redis): TPS increased to 1,800, but the CPU usage remains the same. When I stress test, the CPU usage is always close to 100%. I'm thinking it might not be 100% a redis problem. I think I'll have to look at the async/await or web client parts one by one. I think light work is being handled heavily. The reason I was vague above is that there was no change in TPS. Anyway, I'm trying to look at the whole thing. This is a problem I am experiencing while building a new dotnet5 service, but since I use redis a lot, I wrote it here.
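The change was essentially this (an illustrative sketch of a polling loop, not the actual service code):
// Assumed shape of the loop that was changed; the real code does more work.
private static async Task PumpAsync(ConcurrentQueue<Action> work)
{
    while (true)
    {
        while (work.TryDequeue(out var item)) item();
        // Before: await Task.Delay(1); - parks the loop on a timer whose
        // granularity on Windows is roughly 15 ms, capping iterations/sec.
        // After: hand the thread back and resume on the next available one.
        await Task.Yield();
    }
}
This also matches the CPU staying flat: the loop runs more often (higher TPS) instead of sleeping.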
I don't think my experience will help much, but I'm leaving it here because even a small clue could help improve the performance.
The SocketManager supports 3 modes: 1. Dedicated, 2. Shared, 3. HighPriority. What we observe is that for mode 1, it will create 10 threads per instance; only for mode 2 will it share the threads in the thread pool. Anyway, we tried your code, but it didn't improve the performance. I'm starting to think maybe it is because we are still using .NET Framework. Maybe something is fundamentally different in .NET Core, and that is the reason why we are seeing this performance issue.
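For reference, this is roughly how we constructed each variant while testing (a sketch of what we tried; "redis-io" is just a label, and the exact option names may differ by version):
// Dedicated pool with an explicit worker count and normal-priority threads:
var dedicated = new SocketManager("redis-io", workerCount: 5, options: SocketManager.SocketManagerOptions.None);
// The library-wide shared pool (the default):
var shared = SocketManager.Shared;
// Socket I/O on the regular .NET thread pool, as suggested above:
var threadPool = SocketManager.ThreadPool;
config.SocketManager = dedicated; // the variant that behaved best for us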
@kimglory Is there any chance you can try the latest pre-release on MyGet? I've made some decent performance jumps and CPU reductions and I'm hoping they help you out here.
Just tried StackExchange.Redis.2.5.39-prerelease. Not much difference.
var config = ConfigurationOptions.Parse(connectionString);
var sm = new SocketManager(workerCount: 5, options: SocketManager.SocketManagerOptions.None);
config.SocketManager = sm;
var connection = ConnectionMultiplexer.Connect(config);
I need to set the worker count to 5 to make it work properly; otherwise, the CPU goes crazy.
Please see the image below. This is the CPU graph of one of our services. I picked 2 servers for comparison: the red one is the server I used for testing; the blue one is using the old redis version.
After configuring the SocketManager, the performance is actually not that bad, but compared to the 1.2.x version you can still easily see the performance drop.
@kimglory I'm trying to understand the graph - I see a ~5% difference in CPU, but in both directions between the overlapping instances before/after, so I'm unsure what the comparison is overall - I see a similar differential between the 2 instances before the change, equal before the move, and a diff after (and again after the revert).
Overall, I'm unsure if this is in the area of jitter between the two or not - is there any chance we could see a longer duration there? Or am I misunderstanding the graph?
You need to look at the CPU difference between the 2 instances, not the CPU trend of a single instance.
OK, some more background here. These two instances are running in the same service cluster with the same amount of traffic. But even in the same service cluster, with the same hardware spec, instances can still behave slightly differently sometimes. The reason I picked these two instances is that the CPU of the red one was always lower than the blue one, so it is easier to monitor.
When I upgraded only the red one to 2.x, its CPU started to surpass the blue one and stayed there. When I rolled back, the CPU dropped and became lower than the blue one again.
This is not just some random test. I did it on at least 3 different service clusters, and the results were the same: 2.x always consumes slightly more CPU than 1.x.
@kimglory I see - in the graph I'm confused about the "always lower" because of the 11:05 to 12:00 section where they almost match; it seems like there's some jitter between them.
Overall: what are these doing? Are they processing the same number of requests/commands? Similar or different latencies? To be clear: we expect there to be some CPU difference when changing to 2.x, because 1.2.6 did some things incorrectly and there's a bit of increased cost in doing them correctly and consistently. The goal is to minimize that, though, and any information you can provide there is very helpful (including traces or profiling).
At the moment (latest prerelease) when the only workload in a process is Redis we expect to see a ~5% difference in CPU (of the process, not e.g. 5% on any box - that's relative). I'm not sure how much of your cluster work is Redis vs. other processes here, so that information would help give a picture as well.
Thanks for all the info so far!
This graph might not be the best example, but please trust me that we did lots of testing and got similar results.
It is very hard to quantify how much of the work is redis. What I can tell you is that this is not a stress test; it is a live service. These instances are processing live requests. They are behind the same load balancer with the same workload, and redis is heavily used when processing a request. We used the performance profiler in VS before but couldn't find anything useful, though we did see Pipelines.Sockets.Unofficial pop up sometimes.
I don't know what you did incorrectly in 1.x and fixed in 2.x, but you also say that it is expected to see higher CPU in 2.x compared to 1.x. Any hope of fixing/improving this in the future?
@kimglory No current plans for performance increases past the 2.5.43 release - we'll of course optimize when the chance presents itself, but nothing I'm aware of is currently on the table. I'm curious what 2.5.43 looks like for you, but don't have more to offer than that at the moment - you may want to tweak some things for your scenario, though...it depends where you're exhausting first.
@kimglory If you have any update here (e.g. how 2.5.43+ behaves), I'd be very curious and appreciate any news! If that's not on the table, no worries and I'll close this out to tidy up for now :)
Closing this out to tidy up.