hiredis-cluster icon indicating copy to clipboard operation
hiredis-cluster copied to clipboard

Crash while performing redisClusterAsyncFree()

Open rahulKrishnaM opened this issue 1 year ago • 0 comments

Hi, came across this crash during one of the runs.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007e36c41fe3a2 in selectNode (nodes=<optimized out>)
[Current thread is 1 (Thread 0x7e36b50b3a40 (LWP 21))]
(gdb) bt
#0  0x00007e36c41fe3a2 in selectNode (nodes=<optimized out>)
#1  updateSlotMapAsync (acc=acc@entry=0x58a8403266c0)
#2  0x00007e36c41fe598 in clusterSlotsReplyCallback (ac=<optimized out>, r=0x0, privdata=0x58a8403266c0)
#3  0x00007e36c420ce79 in __redisRunCallback (reply=0x0, cb=0x7fffa203cfc0, ac=0x58a84f4e8780)
#4  __redisAsyncFree (ac=0x58a84f4e8780)
#5  0x00007e36c420d645 in redisAsyncFree (ac=<optimized out>)
#6  0x00007e36c41f9e90 in freeRedisClusterNode (node=0x58a87df71bc0)
#7  0x00007e36c41f7c56 in _dictClear (ht=0x58a85c542180)
#8  dictRelease (ht=0x58a85c542180)
#9  0x00007e36c41fbfc6 in redisClusterFree (cc=0x58a84218b300)
#10 0x00007e36c41ff555 in redisClusterAsyncFree (acc=0x58a8403266c0)
### layers below called from application while shutting down.

Looks like the execution was at this line number when the crash happened

https://github.com/Nordix/hiredis-cluster/blob/a2f1b825dda8f1438bd0df93880185d1e879da99/hircluster.c#L3878

I am suspecting the reason for SEGV could be the below code area:

https://github.com/Nordix/hiredis-cluster/blob/a2f1b825dda8f1438bd0df93880185d1e879da99/dict.c#L179

where, even though we are freeing up the memory pointed by redisClusterNode*, we are not unlinking it from the table. Hence it gets looked up later on(during selectNode()), which would be a dangling pointer.

        if ((he = ht->table[i]) == NULL)
            continue;
        while (he) {
            nextHe = he->next;
            dictFreeEntryKey(ht, he);
            dictFreeEntryVal(ht, he);      // memory released, but entry not set to NULL, so ht->table holds a dangling pointer ?
            hi_free(he);
            ht->used--;
            he = nextHe;
        }

And also, other thing that looks odd is the library trying to reattempt discovery when the redisClusterAsyncContext object is getting deleted. Maybe we should prevent that path altogether, in case of redisClusterAsyncFree() where it's about to tear everything down anyways.

cc: @bjosv

rahulKrishnaM avatar May 21 '24 18:05 rahulKrishnaM