
Track and expose Lua memory usage

romange opened this issue on Apr 24 '24

We integrate our own allocator into the Lua bindings (see mimalloc_glue), but we do not track its allocations.

In some extreme cases this memory can be significant: consider 20K-40K connections to a k-threaded Dragonfly with interpreter_per_thread=300 running BullMQ read requests.

End result: expose used_memory_lua (same name as in Valkey) via /metrics and via "INFO MEMORY".
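
For reference, here is a minimal sketch of how such tracking could look, assuming a lua_Alloc wrapper around the custom allocator (hypothetical names, not the actual mimalloc_glue code). The counter it maintains is what "INFO MEMORY" and /metrics would report as used_memory_lua.

```cpp
#include <lua.hpp>
#include <atomic>
#include <cstdint>
#include <cstdlib>

// Running total of bytes currently allocated by all Lua interpreters.
static std::atomic<int64_t> lua_used_memory{0};

// lua_Alloc-compatible wrapper; Dragonfly would route this through mimalloc
// instead of realloc/free.
static void* TrackingAlloc(void* ud, void* ptr, size_t osize, size_t nsize) {
  (void)ud;
  if (nsize == 0) {  // Lua asks us to free the block.
    if (ptr) lua_used_memory.fetch_sub(int64_t(osize));
    std::free(ptr);
    return nullptr;
  }
  void* res = std::realloc(ptr, nsize);
  if (res) {
    // When ptr is null, osize encodes the object type, not a size.
    lua_used_memory.fetch_add(int64_t(nsize) - int64_t(ptr ? osize : 0));
  }
  return res;
}

// Usage: lua_State* L = lua_newstate(TrackingAlloc, nullptr);
```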

romange · Apr 24 '24 19:04

Another thing: interpreter_per_thread may block a client, which in turn creates a hidden bottleneck. It is possible to detect the blocking event in the Borrow() function and expose it in "INFO STATS" / metrics. This way, we will be able to identify the bottleneck easily.
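
A minimal sketch of what that could look like, with hypothetical names (the real implementation likely differs, e.g. fiber-based synchronization): Borrow() bumps an atomic counter whenever a client has to wait for a free interpreter, and INFO STATS / the /metrics exporter read that counter.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <mutex>

class InterpreterPool {
 public:
  explicit InterpreterPool(size_t n) : free_(n) {}

  // Take an interpreter slot; record the event if we had to wait.
  void Borrow() {
    std::unique_lock lk(mu_);
    if (free_ == 0) {
      blocked_borrows_.fetch_add(1, std::memory_order_relaxed);
      cv_.wait(lk, [this] { return free_ > 0; });
    }
    --free_;
  }

  void Return() {
    { std::lock_guard lk(mu_); ++free_; }
    cv_.notify_one();
  }

  // Exposed via "INFO STATS" / metrics.
  uint64_t blocked_borrows() const {
    return blocked_borrows_.load(std::memory_order_relaxed);
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  size_t free_;
  std::atomic<uint64_t> blocked_borrows_{0};
};
```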

romange · Apr 24 '24 19:04

Also count the total number of interpreters.

romange · May 02 '24 07:05

Make sure we can flush Lua memory and that we do not have any leaks from Lua.

romange · May 02 '24 08:05

> Make sure we can flush Lua memory and that we do not have any leaks from Lua.

Re/ flush Lua memory:

  1. I played a bit with calling lua_gc(LUA_GCCOLLECT) directly. While it's a simpler implementation on our side (no need for a mutex, return_untracked_, etc.), it is not able to free as much memory as closing the instance and re-initializing it (see the sketch after this list).
  2. A single (new, idle) Lua instance takes ~26kb of memory.
  3. I could not get the Lua instances (running all sorts of simple scripts) to consume more than 70kb per instance. GC kicks in and reduces consumption, and it usually fluctuates between 30kb and 60kb. This of course depends on the script at hand; I'm running simple scripts and BullMQ load tests.
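
For concreteness, here is a minimal sketch of the two approaches from item 1, using only the standard Lua C API (nothing Dragonfly-specific), plus one way to measure per-instance heap size:

```cpp
#include <lua.hpp>
#include <cstddef>

// Approach 1: in-place full collection; keeps the same lua_State.
void ShrinkInPlace(lua_State* L) {
  lua_gc(L, LUA_GCCOLLECT, 0);
}

// Approach 2: close and re-initialize; frees everything the instance holds,
// at the cost of rebuilding the state and reloading the libraries.
lua_State* Reinitialize(lua_State* L) {
  lua_close(L);
  lua_State* fresh = luaL_newstate();
  luaL_openlibs(fresh);
  return fresh;
}

// Interpreter heap size as seen by Lua's GC (KB part plus byte remainder).
size_t HeapBytes(lua_State* L) {
  return size_t(lua_gc(L, LUA_GCCOUNT, 0)) * 1024 + lua_gc(L, LUA_GCCOUNTB, 0);
}
```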

Re/ leaks: I was not able to detect any leaks after playing a bit with BullMQ and manually written scripts. After running many scenarios, SCRIPT FLUSH eventually clears all instances and memory, going back to ~26kb per instance.
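
A sketch of this kind of leak check, reusing the hypothetical TrackingAlloc counter from the earlier sketch (assuming, per the observations above, that closing an instance should return the tracked byte count to its baseline):

```cpp
#include <lua.hpp>
#include <cassert>
#include <cstdint>

void LeakCheck() {
  int64_t baseline = lua_used_memory.load();

  lua_State* L = lua_newstate(TrackingAlloc, nullptr);
  luaL_openlibs(L);
  luaL_dostring(L, "local t = {}; for i = 1, 100000 do t[i] = i end");

  // Closing the instance should release everything it allocated.
  lua_close(L);
  assert(lua_used_memory.load() == baseline);
}
```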

chakaz · May 06 '24 05:05

So maybe it's not Lua. It could be that we are still missing a rather large contributor to the backing heap usage.

romange · May 06 '24 07:05

Or we have a memory leak.

romange · May 06 '24 07:05

I'll try to reproduce a case in which there's a gap between RSS and other means of accounting memory. If I succeed, I can investigate further.

chakaz · May 06 '24 07:05