
Add maxmemory-reserved-scale parameter for evicting keys earlier to avoid OOM

Open hwware opened this issue 1 year ago • 5 comments

Reference: https://github.com/valkey-io/valkey/issues/742 and https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/cache-best-practices-memory-management (Azure)

Generally, when clients set maxmemory-policy to allkeys-lru or another memory eviction policy, and set maxmemory as well, a server running write-heavy workloads can see the data stored in memory reach the maxmemory limit very quickly, after which OOM errors are reported.

With a maxmemory-reserved-scale parameter (and, for example, the lazyfree-lazy features enabled by the client), we can set maxmemory-reserved-scale to 0.8, meaning key eviction begins when used memory reaches 8GB if maxmemory is 10GB. The server can then continue processing client data, and OOM does not happen at that point.

So the benefit is that we can delay OOM, but we pay for it with less memory available for data.

One example for this parameter. Assume:

maxmemory 4GB
maxmemory-reserved-scale 20

Then we can check the details with the INFO memory command:

maxmemory:4294967296
maxmemory_human:4.00G
maxmemory_policy:allkeys-lru
maxmemory_reserved_scale:20
maxmemory_available:3435973836
maxmemory_available_human:3.20G
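
As a quick sanity check on the maxmemory_available value above, here is a minimal C sketch of the arithmetic (the helper name is illustrative, not actual Valkey code): the reserved percentage is simply taken off the top of maxmemory.

```c
#include <stdio.h>

/* Illustrative helper, not the actual Valkey implementation: derive the
 * memory available for data from maxmemory and a reserved scale given as
 * a whole-number percentage. */
static unsigned long long available_maxmemory(unsigned long long maxmemory,
                                              unsigned int reserved_scale) {
    return maxmemory * (100ULL - reserved_scale) / 100ULL;
}

int main(void) {
    /* maxmemory 4GB with maxmemory-reserved-scale 20 -> 3435973836 (~3.20G),
     * matching the INFO output quoted above. */
    printf("%llu\n", available_maxmemory(4294967296ULL, 20));
    return 0;
}
```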

We can also set and get the maxmemory-reserved-scale value at runtime as follows:

config set maxmemory-reserved-scale value
config get maxmemory-reserved-scale

hwware avatar Jul 26 '24 15:07 hwware

Codecov Report

Attention: Patch coverage is 83.01887% with 9 lines in your changes missing coverage. Please review.

Project coverage is 71.03%. Comparing base (aced268) to head (4be3b3c). Report is 9 commits behind head on unstable.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| src/config.c | 69.23% | 4 Missing :warning: |
| src/module.c | 0.00% | 3 Missing :warning: |
| src/evict.c | 87.50% | 2 Missing :warning: |
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #831      +/-   ##
============================================
+ Coverage     70.87%   71.03%   +0.16%     
============================================
  Files           121      121              
  Lines         65203    65282      +79     
============================================
+ Hits          46212    46375     +163     
+ Misses        18991    18907      -84     
| Files with missing lines | Coverage Δ |
|--------------------------|------------|
| src/server.c | 87.72% <100.00%> (+0.09%) :arrow_up: |
| src/server.h | 100.00% <ø> (ø) |
| src/evict.c | 98.10% <87.50%> (-0.76%) :arrow_down: |
| src/module.c | 9.59% <0.00%> (ø) |
| src/config.c | 78.24% <69.23%> (-0.16%) :arrow_down: |

... and 23 files with indirect coverage changes

codecov[bot] avatar Jul 26 '24 15:07 codecov[bot]

This seems like an interesting feature. I did not review it, just dropping a comment to approve the concept.

enjoy-binbin avatar Sep 11 '24 15:09 enjoy-binbin

I also like the idea. I just haven't really spent enough time thinking about it. Memory management is a big area in Valkey we need to improve.

madolson avatar Sep 11 '24 15:09 madolson

+1 on introducing "back pressure" earlier. I feel that this could be used with `maxmemory-eviction-tenacity` to give an even smoother eviction experience.

PingXie avatar Sep 25 '24 04:09 PingXie

Thanks @hwware! LGTM overall. Can you add some tests too?

Sure, I am ready to work on it. Thanks

hwware avatar Oct 02 '24 13:10 hwware

@PingXie Following your last comment's suggestions, I made some code changes as below:

  1. In getMaxmemoryState, I updated the level calculation to *level = (float)mem_used / (float)server.maxmemory; instead of passing maxmemory in as a function parameter

  2. performEvictions returns EVICT_FAIL only when memory usage goes beyond the real maxmemory (server.maxmemory)

  3. if used_memory is above the soft max but below the hard max, trigger key eviction and continue with normal processing (see the sketch below)

Please take a look, Thanks
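
As a rough illustration of items 2 and 3 above, a minimal sketch of the resulting write-path decision (the names, the enum, and the 80% threshold are illustrative, not the actual patch):

```c
#include <stdio.h>

/* Illustrative sketch only: with a soft and a hard limit, eviction starts at
 * the soft threshold while writes are rejected only past the real maxmemory. */
typedef enum { WRITE_OK, WRITE_OK_AND_EVICT, WRITE_REJECT_OOM } write_decision;

static write_decision decide_write(unsigned long long mem_used,
                                   unsigned long long soft_max,
                                   unsigned long long hard_max) {
    if (mem_used > hard_max) return WRITE_REJECT_OOM;   /* hard maxmemory: reject with OOM */
    if (mem_used > soft_max) return WRITE_OK_AND_EVICT; /* soft max: evict keys, keep serving */
    return WRITE_OK;
}

int main(void) {
    unsigned long long hard = 4294967296ULL;   /* maxmemory 4GB */
    unsigned long long soft = hard / 100 * 80; /* soft threshold at 80% of maxmemory */
    printf("%d\n", decide_write(soft + 1, soft, hard)); /* prints 1 (WRITE_OK_AND_EVICT) */
    return 0;
}
```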

hwware avatar Oct 29 '24 01:10 hwware

@valkey-io/core-team can you please chime in on the new server config introduced in this PR?

# `maxmemory-soft-scale` defines a "soft" maxmemory threshold as a scale of the `maxmemory` limit.
# When memory usage exceeds this scale, Valkey begins proactive key eviction. However, exceeding this 
# threshold does not immediately reject new write commands; only the hard `maxmemory` limit will do so.
# This scale is specified as a percentage of `maxmemory`, with a valid range between 10 and 60. 
#
# maxmemory-soft-scale 0
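
For illustration only, a sketch of the semantics described above, where 0 disables the feature and any other value reserves that percentage of `maxmemory` within the 10-60 range (hypothetical helper names, not the config.c code in this PR):

```c
#include <stdio.h>

/* Hypothetical sketch of the proposed maxmemory-soft-scale semantics.
 * Returns 0 on success, -1 if the scale is outside the documented 10-60 range. */
static int soft_maxmemory_from_scale(unsigned long long maxmemory, int scale,
                                     unsigned long long *soft_limit) {
    if (scale == 0) {            /* 0 = feature disabled */
        *soft_limit = maxmemory; /* soft limit collapses onto the hard limit */
        return 0;
    }
    if (scale < 10 || scale > 60) return -1;
    *soft_limit = maxmemory * (100ULL - (unsigned long long)scale) / 100ULL;
    return 0;
}

int main(void) {
    unsigned long long soft;
    if (soft_maxmemory_from_scale(4294967296ULL, 20, &soft) == 0)
        printf("eviction would start above %llu bytes\n", soft); /* ~3.2GB of a 4GB maxmemory */
    return 0;
}
```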

PingXie avatar Nov 25 '24 06:11 PingXie

Interesting feature. I didn't look at it until now.

So the purpose is that instead of SET command performing eviction, which adds latency to SET, the eviction can be done earlier by cron and this makes the SET command faster?

The config name and semantics are a little bit confusing to me. Why is only 10-60% allowed? And does 10% actually mean that the soft limit is 90% of maxmemory? This is confusing to me. It would be better if the config said maxmemory-soft-percent 90 or something like that.

Another question: Why a percent of maxmemory? If the server handles 100MB of write commands per minute, then we need to evict 100MB per minute, but that is very different on a server with maxmemory 1GB compared to a server with maxmemory 100GB.

I suggest an even more explicit config, just in bytes, so you can configure it simply like this:

maxmemory 4GB
maxmemory-soft 3.5GB

zuiderkwast avatar Nov 25 '24 10:11 zuiderkwast

I suggest an even more explicit config, just in bytes

Yeah that makes sense.

the eviction can be done earlier by cron

earlier - yes; by cron - not the case right now. It is still performed inline but I like the idea of async eviction for "soft" maxmemory too.

PingXie avatar Nov 25 '24 20:11 PingXie

Interesting feature. I didn't look at it until now.

So the purpose is that instead of SET command performing eviction, which adds latency to SET, the eviction can be done earlier by cron and this makes the SET command faster?

As the top comment describes, the feature enables an earlier key eviction process. In theory, it cannot make the SET command faster, and may even make it slower (due to the eviction while loop).

The config name and semantics are a little bit confusing to me. Why is only 10-60% allowed? And does 10% actually mean that the soft limit is 90% of maxmemory? This is confusing to me. It would be better if the config said maxmemory-soft-percent 90 or something like that.

Honestly, I am not satisfied with this name either. The name originates from https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/cache-best-practices-memory-management. Your comment gives me a hint: I would rather name it key-eviction-memory, and configure it as 'key-eviction-memory bytes'. Then the only relation between key-eviction-memory and maxmemory is that key-eviction-memory should be less than or equal to maxmemory. Does that make sense?

Another question: Why a percent of maxmemory? If the server handles 100MB of write commands per minute, then we need to evict 100MB per minute, but that is very different on a server with maxmemory 1GB compared to a server with maxmemory 100GB.

I suggest an even more explicit config, just in bytes, so you can configure it simply like this:

maxmemory 4GB
maxmemory-soft 3.5GB

hwware avatar Nov 26 '24 20:11 hwware

@hwware

maxmemory 4GB
key-eviction-memory 3.5GB

Yes, it is nice. Maybe missing a word like max, limit or threshold? key-eviction-threshold 3.5GB?

zuiderkwast avatar Nov 27 '24 19:11 zuiderkwast

@hwware

maxmemory 4GB
key-eviction-memory 3.5GB

Yes, it is nice. Maybe missing a word like max, limit or threshold? key-eviction-threshold 3.5GB?

threshold is acceptable. key-eviction-maxmemory is not good, because I think of maxmemory as the pure maximum memory limit, and 'limit' sounds a little bit weird.

hwware avatar Nov 27 '24 19:11 hwware

key-eviction-memory sounds more reasonable.

I have always hoped that we could accurately measure the memory used by various different modules and then configure different thresholds and strategies, such as maxmemory-data, so that eviction would only occur when the data exceeds this threshold. Here, data specifically refers to all key-values, excluding other system memory usage such as client buffer or slowlog, etc. Although this PR doesn't completely meet my needs, I feel like the general direction is aligned.

soloestoy avatar Nov 28 '24 09:11 soloestoy

I have always hoped that we could accurately measure the memory used by various different modules and then configure different thresholds and strategies, such as maxmemory-data, so that eviction would only occur when the data exceeds this threshold. Here, data specifically refers to all key-values, excluding other system memory usage such as client buffer or slowlog, etc. Although this PR doesn't completely meet my needs, I feel like the general direction is aligned.

@soloestoy There is already an INFO field used_memory_dataset. Do you think we should use this number to do eviction with new config like maxmemory-data? Documentation of the INFO field:

used_memory_dataset: The size in bytes of the dataset (used_memory_overhead subtracted from used_memory)

zuiderkwast avatar Dec 17 '24 15:12 zuiderkwast

@soloestoy There is already an INFO field used_memory_dataset. Do you think we should use this number to do eviction with new config like maxmemory-data? Documentation of the INFO field:

This seems to be going in a slightly different direction. My mental model so far has been a "tighter" maxmemory, which is still a superset of used_memory_dataset. With @soloestoy's proposal of per-"component" thresholds, more specifically a "data only" threshold in this case, I think the implementation would be quite different. I can see a "tighter" maxmemory being a natural extension of maxmemory, while used_memory_dataset would provide a new capability. I don't have a strong opinion between the two, but I feel that we only need one. We should discuss some more and settle on one.

PingXie avatar Jan 08 '25 06:01 PingXie

@soloestoy There is already an INFO field used_memory_dataset. Do you think we should use this number to do eviction with new config like maxmemory-data?

Yes, I tend toward using used_memory_dataset for eviction, as it better aligns with the semantics of maxmemory-data or key-eviction-memory and with the actual action: deleting keys.

However, the current used_memory_dataset is not accurate. It is used_memory - used_memory_overhead, a reverse calculation rather than a direct measurement of the data; we can further optimize this in other PRs.

soloestoy avatar Jan 16 '25 02:01 soloestoy

@valkey-io/core-team According to yesterday's meeting discussion, please confirm the following items:

  1. We keep the current way of calculating used_memory_dataset, and we will improve it in the future
  2. Madelyn said that in AWS, a percentage of maxmemory is used to do the early eviction, and she prefers to align with that. Thus I need to revert to the previous code. Do you (TSC members) agree?
  3. Is there a recommended config name for this feature?

Thanks

hwware avatar Jan 28 '25 16:01 hwware

I agree with 1. For 2, I'm okay with it being either a percent or a fixed amount, but besides that I think having it be something subtracted from maxmemory makes more sense. That way, if you want to change maxmemory, you don't also have to change this variable (usually).

I would like something like "reserved-memory-buffer" for naming.

madolson avatar Jan 31 '25 14:01 madolson

How about maxmemory-headroom (and maxmemory-headroom-percentage if we decide to add a percentage-based config later)? This PR essentially enforces maxmemory earlier in the data access path, so keeping maxmemory in the config name makes it more intuitive for users to understand its purpose.

PingXie avatar Feb 01 '25 23:02 PingXie

I've heard headroom less often than more descriptive terms like "spare-capacity" or "reserved-memory". I don't feel that strongly about any of them though.

I like including maxmemory, so I guess my vote would be "maxmemory-reserved-memory".

madolson avatar Feb 02 '25 16:02 madolson

I like including maxmemory, so I guess my vote would be "maxmemory-reserved-memory".

I am good with maxmemory-reserved.

PingXie avatar Feb 02 '25 22:02 PingXie

I have carefully reviewed the Azure documentation again, and IIUC, this parameter is defined in Azure as memory reserved under maxmemory to ensure that the server always has at least a maxmemory-reserved amount of memory available. Based on this definition, I have some concerns about potential confusion when using this parameter:

  1. Firstly, from a semantic standpoint, it is not intuitive to associate "reserved" with "eviction". The actual trigger for memory eviction is the limit set by maxmemory - maxmemory-reserved. Although the calculation is straightforward, it requires a reverse thinking process.

  2. Secondly, can we truly guarantee that this amount of memory is reserved? It seems not. For example, if maxmemory is 1GB and maxmemory-reserved is set to 100MB, memory usage can still go beyond 900MB if there are no eligible keys for eviction (even if memory usage exceeds maxmemory - maxmemory-reserved, there is no OOM error to prevent further writes). From the user's perspective, this could appear to be somewhat out of control.

a percentage of maxmemory is used to do the early eviction

If our goal is to implement early eviction, I think we should use a more appropriate name for this configuration.

soloestoy avatar Feb 06 '25 03:02 soloestoy

@soloestoy good finding that the Azure service already has maxmemory-reserved. It's described like this:

The maxmemory-reserved setting configures the amount of memory, in MB per instance in a cluster, that is reserved for non-cache operations, such as replication during failover.

Further, here it's documented as

The maxmemory-reserved setting configures the amount of memory in MB per instance in a cluster that is reserved for noncache operations, such as replication during failover. Setting this value allows you to have a more consistent Redis server experience when your load varies. This value should be set higher for workloads that write large amounts of data. When memory is reserved for such operations, it's unavailable for storage of cached data. The minimum and maximum values on the slider are 10% and 60%, shown in megabytes. You must set the value in that range.

I assume it is a hard limit for data size, but IIUC we want a soft limit to start early eviction but not reject writes. Other than that, it seems very similar to our idea, I think. Maybe it's OK?

The "reserved" memory in this PR is also for keys when we have bursts of writes and we can't evict fast enough...? It seems OK to me.

zuiderkwast avatar Feb 06 '25 15:02 zuiderkwast

@soloestoy good finding that the Azure service already has maxmemory-reserved. It's described like this:

The maxmemory-reserved setting configures the amount of memory, in MB per instance in a cluster, that is reserved for non-cache operations, such as replication during failover.

Further, here it's documented as

The maxmemory-reserved setting configures the amount of memory in MB per instance in a cluster that is reserved for noncache operations, such as replication during failover. Setting this value allows you to have a more consistent Redis server experience when your load varies. This value should be set higher for workloads that write large amounts of data. When memory is reserved for such operations, it's unavailable for storage of cached data. The minimum and maximum values on the slider are 10% and 60%, shown in megabytes. You must set the value in that range.

I assume it is a hard limit for data size, but IIUC we want a soft limit to start early eviction but not reject writes. Other than that, it seems very similar to our idea, I think. Maybe it's OK?

The "reserved" memory in this PR is also for keys when we have bursts of writes and we can't evict fast enough...? It seems OK to me.

I implemented this in my first commit https://github.com/valkey-io/valkey/pull/831/commits/3b9fb9fd14fd14c807c8b1fea022c87a812e840e; it is a percentage of maxmemory, which matches the Azure doc well.

hwware avatar Feb 06 '25 18:02 hwware

The INFO field mem_not_counted_for_evict should include the reserved memory, right? In our docs, we say that eviction happens when

used_memory - mem_not_counted_for_evict > maxmemory

so we should make sure this formula is still correct. Docs here: https://valkey.io/topics/lru-cache/

No, I do not think so. mem_not_counted_for_evict (computed by freeMemoryGetNotCountedMemory()) is the size of the replica output buffers and the AOF buffer, which gets subtracted from used memory.

However, the reserved memory is memory set aside below maxmemory. It should not include the AOF and replica output buffer parts.

hwware avatar Feb 06 '25 19:02 hwware

The INFO field mem_not_counted_for_evict should include the reserved memory, right? In our docs, we say that eviction happens when

used_memory - mem_not_counted_for_evict > maxmemory

so we should make sure this formula is still correct. Docs here: https://valkey.io/topics/lru-cache/

No, I do not think so. mem_not_counted_for_evict (computed by freeMemoryGetNotCountedMemory()) is the size of the replica output buffers and the AOF buffer, which gets subtracted from used memory.

However, the reserved memory is memory set aside below maxmemory. It should not include the AOF and replica output buffer parts.

So.... instead we should update the docs to explain the formula like this?

used_memory - mem_not_counted_for_evict > maxmemory - maxmemory_reserved
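
For concreteness, a minimal sketch of the check that formula implies (variable names mirror the INFO fields and config name; this is not the actual evict.c code):

```c
#include <stdbool.h>
#include <stdio.h>

/* Sketch of the eviction trigger implied by the proposed documentation formula:
 *   used_memory - mem_not_counted_for_evict > maxmemory - maxmemory_reserved */
static bool over_eviction_threshold(unsigned long long used_memory,
                                    unsigned long long mem_not_counted_for_evict,
                                    unsigned long long maxmemory,
                                    unsigned long long maxmemory_reserved) {
    unsigned long long counted = used_memory > mem_not_counted_for_evict
                                     ? used_memory - mem_not_counted_for_evict : 0;
    unsigned long long limit = maxmemory > maxmemory_reserved
                                   ? maxmemory - maxmemory_reserved : 0;
    return counted > limit;
}

int main(void) {
    /* maxmemory 1GB, 100MB reserved, 950MB counted usage -> eviction should run (prints 1). */
    printf("%d\n", over_eviction_threshold(950ULL << 20, 0, 1ULL << 30, 100ULL << 20));
    return 0;
}
```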

zuiderkwast avatar Feb 06 '25 19:02 zuiderkwast

Since what we actually do is early key/data eviction, how about changing the name to maxmemory-key-eviction or maxmemory-dataset? That would also align with maxmemory-clients.

soloestoy avatar Feb 07 '25 06:02 soloestoy

@soloestoy maxmemory-key-eviction or maxmemory-dataset, and you also suggested maxmemory-eviction-threshold. These are good names, but they all mean the total memory to use, not a margin below maxmemory?

zuiderkwast avatar Feb 07 '25 08:02 zuiderkwast

The maxmemory-reserved name that Azure uses is aligned with our goal; you can check the link https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/cache-best-practices-memory-management.

And I posted a screenshot of that setting here (image not reproduced).

So let's discuss the naming issue in the weekly meeting. Thanks

hwware avatar Feb 07 '25 18:02 hwware

Decision: We're going to wait until after RC1 to decide what to do about this. It sounds like @hwware had an internal use case, related to storing data on disk, where this was important. Wen, can you follow up with the specific use case you had in mind for this?

madolson avatar Feb 10 '25 15:02 madolson