[CRASH] Starting KeyDB on ARM hardware causing serverAssert failure
I'm trying to build a container image (it has to be proprietary, unfortunately) that will run on ARM hardware. Initially I got an error about an invalid page size in jemalloc, but adding --with-lg-page=16 to the jemalloc configure options got us past that problem.
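For context, jemalloc's --with-lg-page option takes the base-2 logarithm of the page size, so 16 corresponds to 64 KiB pages. A minimal sketch for checking what the target machine actually reports is below; the helper name lg_page is made up for illustration, and whether the value should match the host or the build machine depends on your setup.

```python
import os


def lg_page(page_size: int) -> int:
    """Return log2(page_size); hypothetical helper, not part of jemalloc or KeyDB."""
    if page_size <= 0 or page_size & (page_size - 1):
        raise ValueError("page size must be a positive power of two")
    return page_size.bit_length() - 1


if __name__ == "__main__":
    # SC_PAGE_SIZE is the page size the running kernel reports to user space.
    size = os.sysconf("SC_PAGE_SIZE")
    print(f"page size: {size} bytes, --with-lg-page={lg_page(size)}")
```

Run this on the ARM host itself (or in a container on it), since the page size is a property of the running kernel, not of the image.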
Now on start I get server.cpp:6531 '!ret' is not true
Crash report
=== KEYDB BUG REPORT START: Cut & paste starting from here ===
1:1:C 31 Jul 2024 16:05:19.226 # === ASSERTION FAILED ===
1:1:C 31 Jul 2024 16:05:19.226 # ==> server.cpp:6531 '!ret' is not true
------ STACK TRACE ------
Backtrace:
keydb-server(linuxMadvFreeForkBugCheck()+0x368) [0x45b828]
keydb-server(main+0x31c) [0x4432cc]
/lib64/libc.so.6(+0x27300) [0xffffaa607300]
/lib64/libc.so.6(__libc_start_main+0x98) [0xffffaa6073d8]
keydb-server(_start+0x30) [0x447670]
------ INFO OUTPUT ------
Keydb starting as active-replica and multi-master
1:1:C 31 Jul 2024 16:08:03.327 * Notice: "active-replica yes" implies "replica-read-only no"
1:1:C 31 Jul 2024 16:08:03.327 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:1:C 31 Jul 2024 16:08:03.327 # oO0OoO0OoO0Oo KeyDB is starting oO0OoO0OoO0Oo
1:1:C 31 Jul 2024 16:08:03.327 # KeyDB version=6.3.4, bits=64, commit=7e7e5e57, modified=1, pid=1, just started
1:1:C 31 Jul 2024 16:08:03.327 # Configuration loaded
Additional information
- Not sure if this matters, but this is being deployed on a Rocky Linux 8 based container image
- A permalink for the code in server.cpp around that line number
Yes, on ARM I also have this problem.
Digging further, this seems to be related to an ARM-specific Linux kernel bug in pgtable handling, and the KeyDB/Redis code apparently checks at startup whether that bug exists in the running kernel.
arm64: pgtable: Ensure dirty bit is preserved across pte_wrprotect()
It seems that a newer Linux kernel (one with the original issue fixed) would likely let KeyDB start right up and work.
I guess my question is whether I'd be likely to run into that issue if I'm not doing any writes to storage (essentially memory-only caching).
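To compare hosts, a rough sketch like the one below prints the running kernel release and checks it against a minimum version. The function names are hypothetical, and the threshold is a placeholder: this thread does not establish which kernel release contains the fix, so substitute the real value once you know it.

```python
import platform
import re
from typing import Tuple


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '4.18.0-553.el8'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def kernel_at_least(release: str, minimum: Tuple[int, int]) -> bool:
    """True if the release is at or above the given (major, minor) version."""
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    # Placeholder only: NOT a verified fix version for the arm64 pgtable bug.
    placeholder_min = (5, 0)
    release = platform.release()
    print(release, platform.machine(), kernel_at_least(release, placeholder_min))
```

Note that a container shares the host's kernel, so the base image (Rocky Linux 8 here) does not determine the result; the host does. Distribution kernels also backport fixes, so a version comparison is only a hint.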
Hi, I want to know if this issue has been fixed after a year. I recently tried to build a benchmark on a Raspberry Pi 5. DragonFly and Redis both worked, but I was stuck on the KeyDB configuration due to this problem.
The Linux kernel version you're using probably does have a data corruption bug. That's the true root of the problem. The other part is that the check in the code isn't functioning right. You probably need a 5.x kernel at minimum, though I'm not sure exactly when that bug was fixed. In my experience it also only happens on ARM.
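One possible reason for the check misbehaving, offered as an unconfirmed guess: the failing assertion is '!ret', which means a system call inside the check returned an error, not that the kernel bug was detected. If the check steps through memory using a fixed 4096-byte offset (an assumption about the code, not verified here), calls such as mprotect would be handed an address that is not page-aligned on kernels with 16 KiB or 64 KiB pages, and mprotect rejects unaligned addresses with EINVAL. The sketch below only illustrates the alignment arithmetic; the function name is hypothetical.

```python
import os


def is_page_aligned(offset: int, page_size: int) -> bool:
    """True if offset is a multiple of page_size (hypothetical helper)."""
    return offset % page_size == 0


if __name__ == "__main__":
    assumed_offset = 4096  # assumption: fixed offset used by the startup check
    page_size = os.sysconf("SC_PAGE_SIZE")
    aligned = is_page_aligned(assumed_offset, page_size)
    print(f"page size {page_size}: offset {assumed_offset} aligned = {aligned}")
```

If this prints False on the affected machine, a newer kernel alone would not make the assertion go away, and the page-size handling in the check would be worth looking at.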
I'm using the latest Raspberry Pi 5 Debian release, so the Linux kernel version must be 6.x.