[CRASH] RM_OpenKey not thread-safe
I am testing on the async_flash branch, but from my analysis this bug has been around for a long time.
To add some context, I previously reported these dictRehash crashes: https://github.com/Snapchat/KeyDB/issues/876 https://github.com/Snapchat/KeyDB/issues/792
After spending more time reviewing the code, adding extra logging, and using gdb, I have finally tracked down the root cause.
The method RM_OpenKey() in module.cpp (called by a module in its own thread) has this call flow:
RM_OpenKey() -> lookupKeyReadWithFlags() -> lookupKeyConst() -> db->find() -> ensure() -> dictAdd() -> dictAddRaw() -> _dictRehashStep() -> dictRehash()
This means multiple threads (the module thread above and the KeyDB server thread) can call dictRehash() concurrently. Further testing shows that the hash table in dict.cpp can become corrupted as a result, leading to crashes.
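To illustrate the race, here is a minimal toy sketch. It is not KeyDB's dict code. It assumes a table that performs one incremental rehash step on every lookup and insert, as in the call flow above, and it uses a single coarse mutex so that only one thread can rehash at a time. `ToyDict`, `rehashStep`, and `runConcurrentLookups` are hypothetical names, and a coarse lock is only one possible approach, not necessarily the right fix for KeyDB.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Toy model only: NOT KeyDB's dict. Every name here is hypothetical.
// Keys are assumed to be non-negative integers.
class ToyDict {
public:
    // Start with `seeded` keys in the old table so there is work to rehash.
    ToyDict(std::size_t buckets, int seeded) : old_(buckets), next_(buckets * 2) {
        for (int k = 0; k < seeded; ++k) old_[slot(k, old_.size())].push_back(k);
    }

    void add(int key) {
        std::lock_guard<std::mutex> guard(lock_);  // serialize with other threads
        rehashStep();
        next_[slot(key, next_.size())].push_back(key);
    }

    // Note that even a "read" mutates the table, because it rehashes one bucket.
    bool find(int key) {
        std::lock_guard<std::mutex> guard(lock_);  // serialize with other threads
        rehashStep();
        return contains(old_, key) || contains(next_, key);
    }

private:
    using Table = std::vector<std::vector<int>>;

    static std::size_t slot(int key, std::size_t n) {
        return static_cast<std::size_t>(key) % n;
    }

    static bool contains(const Table& t, int key) {
        for (int k : t[slot(key, t.size())]) {
            if (k == key) return true;
        }
        return false;
    }

    // Move one bucket from the old table to the new one. Caller must hold lock_.
    void rehashStep() {
        if (cursor_ >= old_.size()) return;  // rehash already finished
        for (int k : old_[cursor_]) next_[slot(k, next_.size())].push_back(k);
        old_[cursor_].clear();
        ++cursor_;
    }

    std::mutex lock_;
    Table old_, next_;
    std::size_t cursor_ = 0;
};

// Two threads (standing in for the server thread and a module thread) insert
// and look up keys concurrently. Returns true if every key is still findable.
bool runConcurrentLookups() {
    ToyDict dict(64, 1000);
    auto worker = [&dict](int base) {
        for (int i = 0; i < 1000; ++i) {
            dict.add(base + i);
            dict.find(i);
        }
    };
    std::thread server(worker, 1000);
    std::thread module(worker, 2000);
    server.join();
    module.join();
    for (int k = 0; k < 3000; ++k) {
        if (!dict.find(k)) return false;
    }
    return true;
}
```

Without the two `lock_guard` lines, both threads could run `rehashStep()` on the same bucket at once, which is a data race and therefore undefined behavior.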
I am experimenting locally with some ideas for solving this and will report back here if any of them succeeds. I wanted to report this first in case others have encountered the same crashes.