hiredis icon indicating copy to clipboard operation
hiredis copied to clipboard

crash in s_free(sh) in sdsMakeRoomFor function during redis pipeline call in hiredis-0.14.0

Open sg893052 opened this issue 3 years ago • 6 comments

Hi, Hitting a crash in s_free(sh) in sdsMakeRoomFor function during redis pipeline call in hiredis-0.14.0/sds.c Could someone please help? Thanks!

Backtrace: #0 0x00007fb465978fff in raise () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb46597a42a in abort () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x00007fb4659b6c00 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 #3 0x00007fb4659bcfc6 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 #4 0x00007fb4659bd80e in ?? () from /lib/x86_64-linux-gnu/libc.so.6 #5 0x00007fb466836462 in sdsMakeRoomFor (s=0x55f230687f05 "", addlen=addlen@entry=234) at sds.c:230 #6 0x00007fb466836dac in sdscatlen (s=, t=0x7ffda19f00a0, len=234) at sds.c:379 #7 0x00007fb46683acd4 in redisReaderFeed (r=0x55f23068c600, buf=buf@entry=0x7ffda19f00a0 "*18\r\n$5\r\nindex\r\n$2\r\n37\r\n$5\r\nlanes\r\n$2\r\n44\r\n$11\r\ndescription\r\n$0\r\n\r \n$23\r\noverride_unreliable_los\r\n$3\r\noff\r\n$12\r\nvalid_speeds\r\n$10\r\n10000,1000\r\n$3\r\nmtu\r\n$4\r\n9100\r\n$5\r
nalias\r\n$7\r\nEth1/37\r\n$12\r\nadmin_stat"..., len=) at read.c:537 #8 0x00007fb466835734 in redisBufferRead (c=0x55f23068c340) at hiredis.c:800 #9 0x00007fb466835b90 in redisGetReply (c=0x55f23068c340, reply=reply@entry=0x7ffda19f41d0) at hiredis.c:874 #10 0x00007fb4665b4d17 in swss::RedisReply::RedisReply (this=0x7ffda19f41d0, db=0x55f23068b250, command=...) at redisreply.cpp:34 #11 0x00007fb4665b4f12 in swss::RedisReply::RedisReply (this=0x7ffda19f41d0, db=, command=..., expectedType=2) at redisreply.cpp:47 #12 0x00007fb4665c4b20 in swss::RedisPipeline::push (expectedType=2, command=..., this=0x55f23068a9d0) at ../common/redispipeline.h:83 #13 swss::Table::get (this=this@entry=0x55f22fe91cc0 <sync+1312>, key="Ethernet36", values=std::vector of length 0, capacity 0) at table.cpp:81

sg893052 avatar Feb 02 '21 04:02 sg893052

Hi @sg893052 it's hard to say without code I can use to replicate the crash.

That said, v0.14.0 is very old and there have been several possible crashes fixed since then. Are you able to test with a newer version of hiredis?

michael-grunder avatar Feb 02 '21 06:02 michael-grunder

Hi Michael, Sorry for the delayed response. The current situation demands me to address this issue in the hiredis version(0.14.0) itself due to time limit.

Upon further investigation with the core file, there seem to be a corruption in sds header flag value s[-1] which is "(rediscontext->reader->buf)-1". s[-1] value was set as 5 which is a not a valid value, resulting in crash in sdsMakeRoomFor() while freeing an invalid pointer. In fact, s[-1] should have ideally been one of below sds header types. #define SDS_TYPE_5 0 #define SDS_TYPE_8 1 #define SDS_TYPE_16 2 #define SDS_TYPE_32 3 #define SDS_TYPE_64 4 #define SDS_TYPE_MASK 7

Do you see a safer way to handle this in hiredis/sds code ?

Debug:

(gdb) frame 5 #5 0x00007fb466836462 in sdsMakeRoomFor (s=0x55f230687f05 "", addlen=addlen@entry=234) at sds.c:230 230 sds.c: No such file or directory. (gdb) p s[-1] $31 = 5 '\005' (gdb)

(gdb) p oldtype $8 = 5 '\005'

sdsHdrSize(char type) returns 0 as oldtype was invalid value 5

s_free(sh) crashes

Also, I was able to simulate the crash with the hiredis_test program available in hiredis package by corrupting s[-1] with 5 in gdb and proceeding further. *** Error in `/home/admin/hiredis-test': free(): invalid pointer: 0x0000000000610663 ***

code:

sds sdsMakeRoomFor(sds s, size_t addlen) { void *sh, *newsh; size_t avail = sdsavail(s); size_t len, newlen; char type, oldtype = s[-1] & SDS_TYPE_MASK; ===> oldtype value returned as 5 int hdrlen;

/* Return ASAP if there is enough space left. */
if (avail >= addlen) return s;

len = sdslen(s);
sh = (char*)s-sdsHdrSize(oldtype);   ======> sdsHdrSize(char type) returns 0 as oldtype was invalid value 5
newlen = (len+addlen);
if (newlen < SDS_MAX_PREALLOC)
    newlen *= 2;
else
    newlen += SDS_MAX_PREALLOC;

type = sdsReqType(newlen);

/* Don't use type 5: the user is appending to the string and type 5 is
 * not able to remember empty space, so sdsMakeRoomFor() must be called
 * at every appending operation. */
if (type == SDS_TYPE_5) type = SDS_TYPE_8;

hdrlen = sdsHdrSize(type);
if (oldtype==type) {
    newsh = s_realloc(sh, hdrlen+newlen+1);
    if (newsh == NULL) return NULL;
    s = (char*)newsh+hdrlen;
} else {
    /* Since the header size changes, need to move the string forward,
     * and can't use realloc */
    newsh = s_malloc(hdrlen+newlen+1);
    if (newsh == NULL) return NULL;
    memcpy((char*)newsh+hdrlen, s, len+1);
    s_free(sh);    ========================================> place of crash  -> free of invalid pointer
    s = (char*)newsh+hdrlen;
    s[-1] = type;
    sdssetlen(s, len);
}
sdssetalloc(s, newlen);
return s;

}

sg893052 avatar Feb 08 '21 13:02 sg893052

Hi @sg893052 I'm not sure I totally follow. Obviously corrupting the sdshdr could cause a crash, but the question I would have is how is the header becoming corrupted?

Have you attempted to run your program through valgrind or clang's sanitizers to catch the corruption?

michael-grunder avatar Feb 08 '21 23:02 michael-grunder

Hi Michael, The crash is not reproducible. I was able to find the memory corruption(sdshdr) from the core file debugging. Attempting the valgrind run for the application in parallel. I was hoping if we could up with a protection fix in sds side to check it's validity and safely return to the application complaining of the corruption instead of crashing.

sg893052 avatar Feb 09 '21 08:02 sg893052

The issue with checking for corruption inside of SDS is that it doesn't actually solve your problem and could even make it worse. What you really want to do is catch the corruption when it happens.

I mentioned valgrind and ASAN but you might also give rr a try. With rr you would only need to catch the bug once and then replay it as many times as you want.

I've used it to track down really difficult to reproduce concurrency bugs by automating the runs and doing them in parallel. If your application is multi-threaded you should also look into running rr with the chaos mode enabled which will mess with the OS scheduler at a low level.

Good luck!

michael-grunder avatar Feb 09 '21 18:02 michael-grunder

Sure, Thanks for the inputs!

sg893052 avatar Feb 10 '21 11:02 sg893052

Closing old issue

michael-grunder avatar Sep 01 '22 03:09 michael-grunder