
[CRASH] Starting KeyDB on ARM hardware causing serverAssert failure

Open ronnyek opened this issue 1 year ago • 5 comments

I'm trying to build a container image (it has to be proprietary, unfortunately) that will run on ARM hardware. Initially I got an invalid page size error from jemalloc, but building with --with-lg-page=16 got us past that problem.
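For anyone matching the jemalloc setting to their own hardware: as far as I understand it, --with-lg-page takes the base-2 logarithm of the page size, so 16 corresponds to 64 KiB pages. Below is a minimal sketch for checking what the running kernel reports. The helper name `lg_page` is made up for illustration and is not part of jemalloc or KeyDB.

```python
import os


def lg_page(page_size: int) -> int:
    """Return the base-2 logarithm of a power-of-two page size in bytes."""
    if page_size <= 0 or page_size & (page_size - 1):
        raise ValueError("page size must be a positive power of two")
    return page_size.bit_length() - 1


if __name__ == "__main__":
    # SC_PAGE_SIZE is the page size the running kernel exposes to user space.
    page_size = os.sysconf("SC_PAGE_SIZE")
    print(f"kernel page size: {page_size} bytes")
    print(f"corresponding jemalloc option: --with-lg-page={lg_page(page_size)}")
```

Run it on the target ARM host rather than on the build machine, since the two may not use the same page size.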

Now on startup the server fails an assertion: server.cpp:6531 '!ret' is not true

Crash report

=== KEYDB BUG REPORT START: Cut & paste starting from here ===
1:1:C 31 Jul 2024 16:05:19.226 # === ASSERTION FAILED ===
1:1:C 31 Jul 2024 16:05:19.226 # ==> server.cpp:6531 '!ret' is not true

------ STACK TRACE ------

Backtrace:
keydb-server(linuxMadvFreeForkBugCheck()+0x368) [0x45b828]
keydb-server(main+0x31c) [0x4432cc]
/lib64/libc.so.6(+0x27300) [0xffffaa607300]
/lib64/libc.so.6(__libc_start_main+0x98) [0xffffaa6073d8]
keydb-server(_start+0x30) [0x447670]

------ INFO OUTPUT ------
Keydb starting as active-replica and multi-master
1:1:C 31 Jul 2024 16:08:03.327 * Notice: "active-replica yes" implies "replica-read-only no"
1:1:C 31 Jul 2024 16:08:03.327 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:1:C 31 Jul 2024 16:08:03.327 # oO0OoO0OoO0Oo KeyDB is starting oO0OoO0OoO0Oo
1:1:C 31 Jul 2024 16:08:03.327 # KeyDB version=6.3.4, bits=64, commit=7e7e5e57, modified=1, pid=1, just started
1:1:C 31 Jul 2024 16:08:03.327 # Configuration loaded

Additional information

  1. Not sure if this matters, but this is being deployed on a Rocky Linux 8 based container image
  2. A permalink to the code in server.cpp around that line number

ronnyek, Jul 31 '24 16:07

Yes, I also have this problem on ARM.

jcy1001, Aug 01 '24 06:08

Digging further, this seems to be related to an arm64-specific Linux kernel bug in the page-table (pgtable) code. The KeyDB/Redis code apparently checks at startup whether the running kernel has that bug. The relevant kernel fix is "arm64: pgtable: Ensure dirty bit is preserved across pte_wrprotect()".

It seems that on a newer Linux kernel (one with the original bug fixed), KeyDB would likely start right up and work.

I guess my question is whether I'd be likely to hit that bug if I'm not doing any writes to storage (essentially memory-only caching).

ronnyek, Aug 01 '24 14:08

Hi, I'd like to know whether this issue has been fixed in the year since. I recently tried to set up a benchmark on a Raspberry Pi 5. DragonFly and Redis both worked, but I got stuck on the KeyDB configuration because of this problem.

mocusez, Jul 27 '25 22:07

Most likely the Linux kernel version you're running actually has a data corruption bug; that's the real root of the problem. The other part is that the check in the code isn't functioning right. Either way, you probably need a 5.x kernel at minimum; I'm not sure exactly when the bug was fixed. In my experience it also only happens on ARM.
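To confirm which kernel is actually in play, here is a rough sketch that prints the architecture and kernel release and parses out the major and minor version. Keep in mind that a container shares the host's kernel, so the base image (Rocky Linux 8, Debian, and so on) does not determine this value. The function name `parse_kernel_release` is only illustrative, and no "fixed in" version is hard-coded because the exact release that carries the fix isn't established in this thread.

```python
import platform
import re
from typing import Tuple


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a kernel release string like '6.1.0-example'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # Inside a container this still reports the host kernel.
    release = platform.release()
    major, minor = parse_kernel_release(release)
    print(f"machine: {platform.machine()}")
    print(f"kernel release: {release}")
    print(f"parsed version: {major}.{minor}")
```

Posting this output alongside the crash report should make it easier to tell a genuinely old kernel apart from a false positive in the startup check.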

ronnyek, Jul 28 '25 02:07

I'm using the latest Debian release for the Raspberry Pi 5, so the Linux kernel version must be 6.x.

mocusez, Jul 28 '25 02:07