valkey icon indicating copy to clipboard operation
valkey copied to clipboard

[BUG] Use after free with lua-debug

Open ranshid opened this issue 7 months ago • 1 comments

=================================================================
==9998==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000009f90 at pc 0x00000072b7a3 bp 0x7fff21143be0 sp 0x7fff21143bd0
READ of size 4 at 0x606000009f90 thread T0
#0 0x72b7a2 in connBlock src/socket.c:459
#1 0x74f7f2 in ldbStartSession lua/debug_lua.c:178
#2 0x628078 in evalGenericCommandWithDebugging src/eval.c:678
#3 0x62650d in evalCommand src/eval.c:531
#4 0x626528 in evalRoCommand src/eval.c:535
#5 0x48f1a9 in call src/server.c:3733
#6 0x4931e5 in processCommand src/server.c:4359
#7 0x4de07d in processCommandAndResetClient src/networking.c:3142
#8 0x4de7f9 in processInputBuffer src/networking.c:3270
#9 0x4df247 in readQueryFromClient src/networking.c:3375
#10 0x729303 in callHandler src/connhelpers.h:79
#11 0x72aaf7 in connSocketEventHandler src/socket.c:301
#12 0x462d13 in aeProcessEvents src/ae.c:486
#13 0x46349d in aeMain src/ae.c:543
#14 0x4a487b in main src/server.c:7212
#15 0x7f4e65b2c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
#16 0x452029 in _start (src/valkey-server+0x452029)

0x606000009f90 is located 16 bytes inside of 64-byte region [0x606000009f80,0x606000009fc0)
freed by thread T0 here:
#0 0x7f4e668fa1e5 in __interceptor_free (/lib64/libasan.so.4+0xd81e5)
#1 0x4ad826 in zfree_internal src/zmalloc.c:400
#2 0x4ad85f in valkey_free src/zmalloc.c:415
#3 0x72994e in connSocketClose src/socket.c:155
#4 0x729185 in connClose src/connection.h:271
#5 0x729376 in callHandler src/connhelpers.h:82
#6 0x72aaf7 in connSocketEventHandler src/socket.c:301
#7 0x462d13 in aeProcessEvents src/ae.c:486
#8 0x46349d in aeMain src/ae.c:543
#9 0x4a487b in main src/server.c:7212
#10 0x7f4e65b2c139 in __libc_start_main (/lib64/libc.so.6+0x21139)

previously allocated by thread T0 here:
#0 0x7f4e668fa753 in __interceptor_calloc (/lib64/libasan.so.4+0xd8753)
#1 0x4accf6 in ztrycalloc_usable_internal src/zmalloc.c:214
#2 0x4acfc1 in valkey_calloc src/zmalloc.c:257
#3 0x729396 in connCreateSocket src/socket.c:79
#4 0x72945b in connCreateAcceptedSocket src/socket.c:99
#5 0x72ae88 in connSocketAcceptHandler src/socket.c:332
#6 0x462d13 in aeProcessEvents src/ae.c:486
#7 0x46349d in aeMain src/ae.c:543
#8 0x4a487b in main src/server.c:7212
#9 0x7f4e65b2c139 in __libc_start_main (/lib64/libc.so.6+0x21139)

SUMMARY: AddressSanitizer: heap-use-after-free src/socket.c:459 in connBlock
Shadow bytes around the buggy address:
0x0c0c7fff93a0: fd fd fd fd fa fa fa fa 00 00 00 00 00 00 00 00
0x0c0c7fff93b0: fa fa fa fa fd fd fd fd fd fd fd fa fa fa fa fa
0x0c0c7fff93c0: 00 00 00 00 00 00 00 00 fa fa fa fa fd fd fd fd
0x0c0c7fff93d0: fd fd fd fd fa fa fa fa 00 00 00 00 00 00 00 00
0x0c0c7fff93e0: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
=>0x0c0c7fff93f0: fd fd[fd]fd fd fd fd fd fa fa fa fa 00 00 00 00
0x0c0c7fff9400: 00 00 00 00 fa fa fa fa 00 00 00 00 00 00 00 00
0x0c0c7fff9410: fa fa fa fa 00 00 00 00 00 00 00 00 fa fa fa fa
0x0c0c7fff9420: 00 00 00 00 00 00 00 00 fa fa fa fa 00 00 00 00
0x0c0c7fff9430: 00 00 00 00 fa fa fa fa 00 00 00 00 00 00 00 00
0x0c0c7fff9440: fa fa fa fa 00 00 00 00 00 00 00 00 fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07 
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==9998==ABORTING

Credits to @uriyage for locating this with his fuzzer testing However we did not yet dive deep or found a reproduction

ranshid avatar May 21 '25 06:05 ranshid

I have a PR that is changing part of this code (https://github.com/valkey-io/valkey/pull/1701). I need to check if it has the same problem or not.

rjd15372 avatar May 21 '25 11:05 rjd15372

Reproduction Steps:

Client 1:
127.0.0.1:6379> SCRIPT DEBUG yes  /* ldb->conn points to Client 1's connection */
OK

Client 2:
127.0.0.1:6379> SCRIPT DEBUG yes  /* ldb->conn now points to Client 2's connection */
OK

Client 2:
127.0.0.1:6379> QUIT  /* ldb->conn still points to Client 2's freed connection */

Client 1:
127.0.0.1:6379> EVAL "local result = {} result[1] = redis.call(\"SET\",\"str\",\"val\") return result" 0

/* At this point, ldb->conn still points to Client 2's connection, but the Lua debug session 
   is activated for Client 1, creating a potential use-after-free condition */

Looks like the issue is that the system doesn't properly support simultaneous debug sessions. When multiple clients enable debug mode, they overwrite the same global debug connection pointer

uriyage avatar May 22 '25 08:05 uriyage

I think this state would have been better managed in the client structure. however this would mean much code changes IMO. we should consider prevent multiple debug sessions, but that would be a breaking change. @rjd15372 what do you think?

ranshid avatar May 26 '25 06:05 ranshid

I agree with @ranshid, I don't think it's worth to support multiple debug session for a feature that is only used in the development/testing phase and possibly not used that much.

rjd15372 avatar May 26 '25 09:05 rjd15372

Lets discuss this in the weekly meeting as this is a breaking change proposal.

ranshid avatar May 26 '25 09:05 ranshid

Sure, we don't need to support multiple debug sessions.

madolson avatar May 26 '25 15:05 madolson