valkey icon indicating copy to clipboard operation
valkey copied to clipboard

Add DENYOOM flag to SCRIPT LOAD and make it fail on OOM

Open enjoy-binbin opened this issue 1 year ago • 9 comments

Currently we can load a lot of lua to bypass maxmemory, adds a DENYOOM flag to make it fail on OOM.

enjoy-binbin avatar Aug 02 '24 09:08 enjoy-binbin

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 70.24%. Comparing base (b728e41) to head (f6049db). Report is 582 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #866      +/-   ##
============================================
- Coverage     70.47%   70.24%   -0.24%     
============================================
  Files           112      112              
  Lines         61467    61467              
============================================
- Hits          43320    43177     -143     
- Misses        18147    18290     +143     
Files with missing lines Coverage Δ
src/commands.def 100.00% <ø> (ø)

... and 13 files with indirect coverage changes

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov[bot] avatar Aug 02 '24 10:08 codecov[bot]

This makes sense to me @soloestoy Do you know of any use case where this would cause an issue. Multi-exec with script load should fail. I guess it's a breaking change so we need a major change probably.

madolson avatar Aug 07 '24 22:08 madolson

I suppose I realized one case this might fail. If someone is trying to load a script that cleans up data, it might fail unnecessarily. There are also previous cases where multi-execs would succeed but should now fail.

madolson avatar Aug 07 '24 22:08 madolson

I think this would be hard for users to understand. Currently, when the maxmemory is exceeded, data eviction is triggered, so DENYOOM is mainly aimed at data write operations. However, the scripts loaded by script load are not data and are not evicted, which makes it difficult to explain.

This also reminds me of a feature I worked on before https://github.com/redis/redis/pull/5454, where in order to prevent clients from caching too many commands in a MULTI context, causing uncontrolled memory growth, we marked transactions as dirty when we exceeded maxmemory. However, this was eventually reverted https://github.com/redis/redis/pull/12961 because it was hard for users to understand.

Maybe we can take a different approach by adding maxmemory-scripts to limit the size of cached scripts. It would only be allowed to use a certain percentage of maxmemory, and once exceeded, the scripts would be evicted (even those loaded by script load).

soloestoy avatar Aug 08 '24 06:08 soloestoy

Maybe we can take a different approach by adding maxmemory-scripts to limit the size of cached scripts. It would only be allowed to use a certain percentage of maxmemory, and once exceeded, the scripts would be evicted (even those loaded by script load).

I guess I want to ask how much of a real problem this is? AFAIK we at AWS have never seen this manifest at a problem, but maybe others have?

madolson avatar Aug 14 '24 20:08 madolson

I have seen many instances where number_of_cached_scripts reaches the million level, and used_memory_scripts_eval consumes tens of gigabytes of memory, lol. When a feature is unrestricted, it is prone to abuse.

soloestoy avatar Aug 15 '24 02:08 soloestoy

I have seen many instances where number_of_cached_scripts reaches the million level, and used_memory_scripts_eval consumes tens of gigabytes of memory, lol. When a feature is unrestricted, it is prone to abuse.

yeah, i have seem this too (both from eval scripts, haven't seen script load yet), that why now we have a eval script eviction.

btw, we now have these following issue:

  1. lua does not use the jemalloc, and we do not count lua memory into used_memory
  2. abuse of script loading (so maxmemory-scripts seems like a good idea, but i do think we need a DENYOOM flag when maxmemory-scripts is 0(unlimit))
  3. to ensure script atomicity, we can bypass maxmemory (data memory or lua heap memory):
EVAL "for i=1,10000000 do redis.call('SET', 'key-kkkk:' .. i, 'value-tttt:' .. i) end" 0
EVAL "for i=1,10000000 do redis.pcall('SET', 'key-kkkk:' .. i, 'value-tttt:' .. i) end" 0
EVAL "local a={}; local i; for i=1, 30000000 do a[i]=i; end" 0

enjoy-binbin avatar Aug 15 '24 04:08 enjoy-binbin

I have seen many instances where number_of_cached_scripts reaches the million level, and used_memory_scripts_eval consumes tens of gigabytes of memory, lol. When a feature is unrestricted, it is prone to abuse.

We have also seen this, but this is almost always from EVAL abuse, not SCRIPT LOAD + EVALSHA abuse. It looks like binbin also mentioned the same thing.

abuse of script loading (so maxmemory-scripts seems like a good idea, but i do think we need a DENYOOM flag when maxmemory-scripts is 0(unlimit))

I think the concern in the past was that many clients, on new connection establishment will send SCRIPT LOAD with some scripts that may reduce maxmemory, but they error out because the SCRIPT LOAD fails so they can never reduce memory. I would be more inclined to add a config here so that managed valkey providers have an operational config they can change but the default doesn't change.

madolson avatar Nov 04 '24 21:11 madolson

I would be more inclined to add a config here so that managed valkey providers have an operational config they can change but the default doesn't change.

sound good to me, what kind of config are we talking about here? maxmemory-scripts to limit the scripts memory (including server part and the lua VM part)?

enjoy-binbin avatar Nov 05 '24 02:11 enjoy-binbin