bug: LuaJIT FFI: wrong time_t in stats.lua causes segfault on 32-bit musl (ARMv7)

Open amaxcz opened this issue 3 months ago • 2 comments

Did you check docs and existing issues?

  • [x] I have read all the lazy.nvim docs
  • [x] I have updated the plugin to the latest version before submitting this issue
  • [x] I have searched the existing issues of lazy.nvim
  • [x] I have searched the existing issues of plugins related to this issue

Neovim version (nvim -v)

v0.11.3

Operating system/version

Gentoo Linux

Describe the bug

This was a hard-to-diagnose crash: plain Neovim runs fine, but AstroNvim crashes. The failure happens deep inside LuaJIT, and to pinpoint it I had to rebuild everything with debug symbols and run under Valgrind. That exposed a struct size/layout mismatch. From there the trail led to the time-related FFI declaration, a small early init hook, and the corresponding fix.

Summary

On 32-bit Linux with musl (e.g., ARMv7 hard-float), lazy.nvim segfaults during startup when stats.lua calls clock_gettime via LuaJIT FFI. Root cause: stats.lua declares the following through FFI:

typedef long time_t; struct timespec { time_t tv_sec; long tv_nsec; };

On musl 32-bit, time_t is 64-bit, while long is 32-bit. This wrong layout makes clock_gettime write 8 bytes into a 4-byte field (tv_sec), corrupting memory and later crashing the GC (lj_alloc_free).
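To make the size mismatch concrete, here is a minimal C sketch that models both layouts with fixed-width types, so it gives the same numbers on any host. The struct names and the helper `overflow_bytes` are hypothetical, and the 16-byte musl layout (64-bit tv_sec, 32-bit tv_nsec, 4 bytes of padding) is my assumption about 32-bit musl, not something copied from its headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What the FFI declaration amounts to when long is 32 bits:
 *   typedef long time_t;
 *   struct timespec { time_t tv_sec; long tv_nsec; };           */
struct ffi_timespec32 {
    int32_t tv_sec;
    int32_t tv_nsec;
};

/* Assumed model of the 32-bit musl layout: 64-bit tv_sec,
 * 32-bit tv_nsec, then 4 bytes of padding. */
struct musl_timespec32 {
    int64_t tv_sec;
    int32_t tv_nsec;
    int32_t pad;
};

/* Bytes that a write of the real struct would place past the end
 * of a buffer sized for the FFI declaration. */
static size_t overflow_bytes(void) {
    return sizeof(struct musl_timespec32) - sizeof(struct ffi_timespec32);
}
```

Under these assumptions the FFI side allocates 8 bytes while libc fills in 16, and tv_nsec sits at offset 8 rather than 4, so even without a crash the values read back would be wrong.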

Environment

  • Distro: Gentoo, musl 1.2.5 (32-bit)
  • Arch: ARMv7 hard-float (-march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=hard)
  • Neovim: 0.11.3
  • LuaJIT: 2.1.1731601260
  • lazy.nvim: current master (Sept 2025)
  • libc: musl (note: on musl 32-bit sizeof(time_t) == 8)

nvim --version (short):

NVIM v0.11.3
Build type: Release
LuaJIT 2.1.1731601260
Compilation: /usr/lib/distcc/bin/armv7a-unknown-linux-musleabihf-gcc -Os -pipe -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=hard   -Wall -Wextra -pedantic -Wno-unused-parameter -Wstrict-prototypes -std=gnu99 -Wshadow -Wconversion -Wvla -Wdouble-promotion -Wmissing-noreturn -Wmissing-format-attribute -Wmissing-prototypes -fsigned-char -fstack-protector-strong -Wno-conversion -fno-common -Wno-unused-result -Wimplicit-fallthrough -fdiagnostics-color=always  -DUNIT_TESTING -D_GNU_SOURCE -DINCLUDE_GENERATED_DECLARATIONS -DUTF8PROC_STATIC -I/usr/include/luajit-2.1 -I/usr/include -I/var/tmp/portage/app-editors/neovim-0.11.3/work/neovim-0.11.3_build/src/nvim/auto -I/var/tmp/portage/app-editors/neovim-0.11.3/work/neovim-0.11.3_build/include -I/var/tmp/portage/app-editors/neovim-0.11.3/work/neovim-0.11.3_build/cmake.config -I/var/tmp/portage/app-editors/neovim-0.11.3/work/neovim-0.11.3/src 

  system vimrc file: "/etc/vim/sysinit.vim"
  fall-back for $VIM: "/usr/share/nvim"

Why this only hits some platforms:

glibc 32-bit often still has 32-bit time_t unless compiled with _TIME_BITS=64; the mistaken FFI layout happens to match and hides the bug. musl 32-bit always uses 64-bit time_t, so the mismatch is guaranteed and shows up as a crash. LuaJIT FFI does not parse system headers/macros, so compiler flags like -D_TIME_BITS=64 don’t help here.
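The platform dependence comes down to one comparison: a hand-written `typedef long time_t;` is only correct when the real time_t has the same size as long. The sketch below expresses that check in C. Both function names are hypothetical, and the check looks at size only, which is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Returns 1 when a hand-written `typedef long time_t;` would match
 * the real time_t size, 0 otherwise. Sizes are in bytes. */
static int long_typedef_is_safe(size_t time_t_size, size_t long_size) {
    return time_t_size == long_size;
}

/* The same check for whatever platform this file is compiled on.
 * Unlike LuaJIT FFI, the C compiler sees the real headers here. */
static int long_typedef_is_safe_here(void) {
    return long_typedef_is_safe(sizeof(time_t), sizeof(long));
}
```

For example, `long_typedef_is_safe(8, 4)` corresponds to musl 32-bit and returns 0, while `(4, 4)` (glibc 32-bit without _TIME_BITS=64) and `(8, 8)` (typical 64-bit Linux) both return 1.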

Proposed fix:

function M.cputime()
  local ffi = require("ffi")
  local is_linux_32 = (jit.os == "Linux") and ffi.abi("32bit")

  -- Only set up FFI on non-32-bit Linux. On 32-bit Linux, use the uv fallback.
  if M.C == nil and not is_linux_32 then
    pcall(function()
      ffi.cdef[[
        typedef int clockid_t;
        /* Do not typedef time_t. Use explicit layout that matches 64-bit ABIs. */
        struct timespec { long tv_sec; long tv_nsec; };
        typedef struct timespec nanotime;
        int clock_gettime(clockid_t clk_id, struct timespec *tp);
      ]]
      M.C = ffi.C
    end)
  end

  local function real()
    local pnano = assert(ffi.new("nanotime[?]", 1))
    local CLOCK_PROCESS_CPUTIME_ID = jit.os == "OSX" and 12 or 2
    ffi.C.clock_gettime(CLOCK_PROCESS_CPUTIME_ID, pnano)
    return tonumber(pnano[0].tv_sec) * 1e3 + tonumber(pnano[0].tv_nsec) / 1e6
  end

  local function fallback()
    return (vim.uv.hrtime() - require("lazy")._start) / 1e6
  end

  if M.C ~= nil then
    local ok, ret = pcall(real)
    if ok then
      M.cputime = real
      M._stats.real_cputime = true
      return ret
    end
  end

  M.cputime = fallback
  return fallback()
end

O-ooo-och!

Why this is “the right thing”: On 32-bit Linux (musl, or glibc built with _TIME_BITS=64) the size of time_t depends on the libc and its build flags, so the patch avoids FFI entirely there. It never redefines time_t globally in FFI (which can break other modules). On 64-bit (where long is 64-bit), the explicit struct timespec { long, long } matches the ABI and is safe.

Steps To Reproduce

Repro (minimal):

  1. Clean config; load lazy.nvim normally (e.g., AstroNvim or require("lazy").setup{}).

  2. Run: nvim --headless +qa

  3. Segfault.

GDB backtrace (short)

Top frames (symbols from distro build):

#0 lj_alloc_free (lj_alloc.c:1400)
#3 lj_cdata_free (lj_cdata.c:83)
#4 gc_sweep (lj_gc.c:423)
#7 lua_pushstring (lj_api.c:669)
#8 lj_cf_package_require (lib_package.c:463)   ; name="lazy.view.commands"
#13 lua_pcall (lj_api.c:1151)
#14 nlua_pcall (src/nvim/lua/executor.c:180)
#15 nlua_exec_file (src/nvim/lua/executor.c:1862) ; loading init.lua

Root cause

In stats.lua:

typedef long time_t;  // WRONG for musl 32-bit
struct timespec { time_t tv_sec; long tv_nsec; };
int clock_gettime(clockid_t, struct timespec*);

On musl 32-bit, time_t is 64-bit (Y2038-safe ABI). FFI does not include system headers nor honor _TIME_BITS=64; it trusts the manual typedef. As a result, struct timespec has 4-byte tv_sec in FFI, but kernel/libc writes 8 bytes → memory overwrite → later crash in GC.
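The overwrite can be simulated safely in C. In this sketch a 16-byte array stands in for the heap: the first 8 bytes play the role of the FFI allocation and the last 8 play the role of whatever the allocator placed next to it. The names `FFI_ALLOC`, `REAL_SIZE` and `simulate_overwrite` are hypothetical, the struct is an assumed model of the 32-bit musl layout, and a memcpy stands in for the write done by libc.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { FFI_ALLOC = 8, REAL_SIZE = 16 };

/* Assumed model of the 32-bit musl struct timespec. */
struct musl_timespec32 {
    int64_t tv_sec;
    int32_t tv_nsec;
    int32_t pad;
};

/* Returns how many bytes beyond the FFI-sized region were modified. */
static int simulate_overwrite(void) {
    unsigned char heap[REAL_SIZE];
    memset(heap, 0xAA, sizeof heap);          /* 0xAA marks untouched bytes */

    /* Arbitrary sample values; none of their bytes equals 0xAA. */
    struct musl_timespec32 ts = { 1757412540, 123456789, 0 };
    assert(sizeof ts == sizeof heap);         /* keeps the copy in bounds */
    memcpy(heap, &ts, sizeof ts);             /* stand-in for libc's write */

    int clobbered = 0;
    for (int i = FFI_ALLOC; i < REAL_SIZE; i++) {
        if (heap[i] != 0xAA) {
            clobbered++;
        }
    }
    return clobbered;
}
```

In the real crash the neighbouring bytes would belong to allocator or GC bookkeeping instead of a marker pattern, which would be consistent with the fault surfacing later in lj_alloc_free and not at the clock_gettime call itself.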

Workarounds users can apply today

  1. One-liner fix (in place):
sed -i -E 's/^[[:space:]]*typedef[[:space:]]+long[[:space:]]+time_t;/typedef long long time_t;/' \
  ~/.local/share/nvim/lazy/lazy.nvim/lua/lazy/stats.lua

  2. Early system init hook (no patching plugin files):

/etc/xdg/nvim/sysinit.vim

" --- Build toolchain & flags (why this is right) ---
" Many Neovim plugins build outside Portage via ad-hoc Makefiles/CMake/Ninja and often hardcode -O2
" or override CFLAGS inside their Makefiles. Setting CC/CXX/CFLAGS/CXXFLAGS/LDFLAGS here **and**
" pushing them via MAKEFLAGS forces your ABI (armv7-a, vfpv3-d16, hard-float) and prevents random
" plugin builds from sneaking in wrong arch/opt flags. Using -j1 also avoids racy parallel builds
" on older ARM and keeps logs readable.

let $CC='gcc'
let $CXX='g++'
let $CFLAGS='-Os -pipe -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=hard'
let $CXXFLAGS=$CFLAGS
let $LDFLAGS='-Wl,-O1 -Wl,--as-needed'
let $MAKEFLAGS='-j1 CFLAGS=$(CFLAGS) CXXFLAGS=$(CXXFLAGS) LDFLAGS=$(LDFLAGS)'
let $CMAKE_BUILD_PARALLEL_LEVEL='1'



" --- Lazy.nvim time_t workaround (musl 32-bit) ---
" LuaJIT FFI in lazy’s stats path assumed `time_t == long` (32-bit), but on musl 32-bit `time_t` is
" 64-bit. That mismatch corrupts memory and can crash in LuaJIT GC. This early hook defines the
" correct `struct timespec` layout **before** lazy loads. It’s a minimal safety patch; the proper
" upstream fix is to avoid typedef’ing time_t in FFI or skip the FFI path on 32-bit Linux.

" Early musl time_t fix for LuaJIT FFI
lua << EOF
-- Keep JIT ON; just fix the FFI layout on 32-bit musl
local ok, ffi = pcall(require, "ffi"); if not ok then return end
if not ffi.abi("32bit") then return end

-- define once
local has_timespec = pcall(ffi.typeof, "struct timespec")
if not has_timespec then
  ffi.cdef[[
    typedef long long time_t;         /* musl 32-bit: time_t is 64-bit */
    typedef int clockid_t;
    struct timespec { time_t tv_sec; long tv_nsec; };
    typedef struct timespec nanotime;
    int clock_gettime(clockid_t clk_id, struct timespec *tp);
  ]]
end
EOF



Expected Behavior

^_^

Repro

vim.env.LAZY_STDPATH = ".repro"
load(vim.fn.system("curl -s https://raw.githubusercontent.com/folke/lazy.nvim/main/bootstrap.lua"))()

require("lazy.minit").repro({
  spec = {
    -- add any other plugins here
  },
})

amaxcz, Sep 09 '25 10:09

I just pushed a fix that still uses FFI but now always forces an int64_t for time_t, which should work in all situations.

Would be great if you could test the fix.

And thank you for the elaborate issue report!

folke, Oct 09 '25 08:10

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot], Nov 09 '25 02:11

This issue was closed because it has been stalled for 7 days with no activity.

github-actions[bot], Nov 16 '25 02:11