cffi-lua

bug: Access Violation exception during garbage collection of variadic function types

Open gynt opened this issue 4 months ago • 3 comments

I noticed an Access Violation causing my app to crash. I added some debug lines to cdata_meta::gc() so it reports which object it is trying to collect before it crashes. Apparently, it is trying to collect a callback: cdata_meta::gc() ctype<void (*)()> ~~Interestingly, in the same gc event, it tried to gc the same type: cdata_meta::gc() ctype<void (*)()>, but perhaps that is a red herring.~~

TL;DR of this thread: after hours of digging, the problem turns out to be fairly straightforward: garbage collection of variadic function types is not supported. The code below reproduces it reliably, though the crash can take a while to appear because the timing of garbage collection is unpredictable. The underlying cause may have to do with alignment.

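cffi = require("cffi")
-- a pointer-to-variadic-function ctype
ct = cffi.typeof("void (__cdecl *)(int a, ...)")
ct = nil          -- drop the only reference so the ctype becomes garbage
collectgarbage()  -- full collection; the crash happens here (not necessarily on every run)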

The stack trace is as follows (everything below the first precallC frame is just the Lua GC routine):

cffi.dll!ffi::fdata_free_aux(lua_State * __formal=0x11fc4c9c, ffi::fdata & fd) Line 75
cffi.dll!ffi::destroy_cdata(lua_State * L=0x11fc4c9c, ffi::cdata & cd={...}) Line 115
cffi.dll!cdata_meta::gc(lua_State * L=0x11fc4c9c) Line 100
lua.dll!precallC(lua_State * L=0x11fc4c9c, StackValue * func=0x37703bb8, int nresults=0, int(*)(lua_State *) f=0x69f87550) Line 529
lua.dll!luaD_precall(lua_State * L=0x11fc4c9c, StackValue * func=0x37703bb8, int nresults=0) Line 595
lua.dll!ccall(lua_State * L=0x11fc4c9c, StackValue * func=0x37703bb8, int nResults=0, unsigned int inc=65537) Line 635
lua.dll!luaD_callnoyield(lua_State * L=0x11fc4c9c, StackValue * func=0x37703bb8, int nResults=0) Line 655
lua.dll!dothecall(lua_State * L=0x11fc4c9c, void * ud=0x00000000) Line 901
lua.dll!luaD_rawrunprotected(lua_State * L=0x11fc4c9c, void(*)(lua_State *, void *) f=0x6afbe6c0, void * ud=0x00000000) Line 144
lua.dll!luaD_pcall(lua_State * L=0x11fc4c9c, void(*)(lua_State *, void *) func=0x6afbe6c0, void * u=0x00000000, int old_top=720, int ef=0) Line 953
lua.dll!GCTM(lua_State * L=0x11fc4c9c) Line 921
lua.dll!runafewfinalizers(lua_State * L=0x11fc4c9c, int n=10) Line 940
lua.dll!singlestep(lua_State * L=0x11fc4c9c) Line 1631
lua.dll!incstep(lua_State * L=0x11fc4c9c, global_State * g=0x11fc4d10) Line 1672
lua.dll!luaC_step(lua_State * L=0x11fc4c9c) Line 1696
lua.dll!luaV_execute(lua_State * L=0x11fc4c9c, CallInfo * ci=0x1b651fd8) Line 1587
lua.dll!ccall(lua_State * L=0x11fc4c9c, StackValue * func=0x377039e8, int nResults=-1, unsigned int inc=65537) Line 637
lua.dll!luaD_callnoyield(lua_State * L=0x11fc4c9c, StackValue * func=0x377039e8, int nResults=-1) Line 655
lua.dll!f_call(lua_State * L=0x11fc4c9c, void * ud=0x0019d4c4) Line 1038
lua.dll!luaD_rawrunprotected(lua_State * L=0x11fc4c9c, void(*)(lua_State *, void *) f=0x6afa0ef0, void * ud=0x0019d4c4) Line 144
lua.dll!luaD_pcall(lua_State * L=0x11fc4c9c, void(*)(lua_State *, void *) func=0x6afa0ef0, void * u=0x0019d4c4, int old_top=256, int ef=0) Line 953
lua.dll!lua_pcallk(lua_State * L=0x11fc4c9c, int nargs=4, int nresults=-1, int errfunc=0, int ctx=0, int(*)(lua_State *, int, int) k=0x6afac190) Line 1064
lua.dll!luaB_pcall(lua_State * L=0x11fc4c9c) Line 477
lua.dll!precallC(lua_State * L=0x11fc4c9c, StackValue * func=0x377039c8, int nresults=3, int(*)(lua_State *) f=0x6afabca0) Line 529
lua.dll!luaD_precall(lua_State * L=0x11fc4c9c, StackValue * func=0x377039c8, int nresults=3) Line 595
lua.dll!luaV_execute(lua_State * L=0x11fc4c9c, CallInfo * ci=0x12454fd8) Line 1684
lua.dll!ccall(lua_State * L=0x11fc4c9c, StackValue * func=0x377038f8, int nResults=1, unsigned int inc=65537) Line 637
lua.dll!luaD_callnoyield(lua_State * L=0x11fc4c9c, StackValue * func=0x377038f8, int nResults=1) Line 655
lua.dll!f_call(lua_State * L=0x11fc4c9c, void * ud=0x0019fb58) Line 1038
lua.dll!luaD_rawrunprotected(lua_State * L=0x11fc4c9c, void(*)(lua_State *, void *) f=0x6afa0ef0, void * ud=0x0019fb58) Line 144
lua.dll!luaD_pcall(lua_State * L=0x11fc4c9c, void(*)(lua_State *, void *) func=0x6afa0ef0, void * u=0x0019fb58, int old_top=16, int ef=0) Line 953
lua.dll!lua_pcallk(lua_State * L=0x11fc4c9c, int nargs=1, int nresults=1, int errfunc=0, int ctx=0, int(*)(lua_State *, int, int) k=0x00000000) Line 1064

gynt, Sep 11 '25 13:09

Can someone explain to me why this+1 is required here? It walks into inaccessible memory on my machine. https://github.com/q66/cffi-lua/blob/9f2acc9a2a0c8e59dda35c0e11333d1b66296667/src/ffi.hh#L146

gynt, Sep 11 '25 14:09

the memory allocated is the cdata structure plus memory for the value, padded to the value's required alignment

so the value resides after the cdata's struct fields (this + 1), at the first correctly aligned address (ptr_align)

i guess the code that calls as_ptr here does not account for the case where a reference (not pointer) type is passed
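The layout q66 describes can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not cffi-lua's code: cdata_header, make_cdata, cdata_alloc_size, free_cdata and the two int fields are hypothetical stand-ins, and only the ptr_align(this + 1) idea comes from the thread. One allocation holds the header, up to align - 1 bytes of padding, and the value; as_ptr() rounds the address just past the header up to the value's alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for cffi-lua's cdata header; the real struct in
// src/ffi.hh has different fields.
struct cdata_header {
    int type_id; // illustrative field only
    int flags;   // illustrative field only

    // Round p up to the next multiple of align (must be a power of two).
    static void *ptr_align(void *p, std::size_t align) {
        assert(align != 0 && (align & (align - 1)) == 0);
        std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
        addr = (addr + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
        return reinterpret_cast<void *>(addr);
    }

    // The value lives right after the header's own fields (this + 1),
    // moved forward to the first correctly aligned address.
    void *as_ptr(std::size_t align) {
        return ptr_align(this + 1, align);
    }
};

// Bytes needed for the header, worst-case padding, and the value itself.
inline std::size_t cdata_alloc_size(std::size_t value_size, std::size_t value_align) {
    return sizeof(cdata_header) + (value_align - 1) + value_size;
}

// Allocate header and value storage as a single block (hypothetical helper).
inline cdata_header *make_cdata(std::size_t value_size, std::size_t value_align) {
    void *mem = std::malloc(cdata_alloc_size(value_size, value_align));
    if (mem == nullptr) {
        return nullptr;
    }
    return new (mem) cdata_header{0, 0};
}

// Release a block created by make_cdata (hypothetical helper).
inline void free_cdata(cdata_header *cd) {
    if (cd != nullptr) {
        cd->~cdata_header();
        std::free(cd);
    }
}
```

In this sketch, as_ptr() only yields valid memory when this points at the start of such a combined allocation. If the object being handled does not have that layout (q66 suspects a reference type reaching as_ptr), this + 1 points past the real object, which would be consistent with the read into inaccessible memory reported above.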

q66, Sep 11 '25 15:09

How would I fix this?

I think I found a reproducer. The code below crashes during garbage collection about 50% of the time after the first definition of f (once, garbage collection ended up in an infinite loop instead). The other 50% of the time, cast() fails with the error "variadic callbacks are not supported". Just repeat the code (as in the example below) and you will soon hit a crash. It is annoying that merely trying to create a variadic callback can crash the library, but the real issue is triggered when the variadic function type is garbage collected.

cffi = require("cffi")
cb = function(a, ...) print(a, ...) end
f = cffi.cast([[
  void (__cdecl *)
  (
    int param_1,
    ...
  )
]], cb)

cffi = require("cffi")
cb = function(a, ...) print(a, ...) end
f = cffi.cast([[
  void (__cdecl *)
  (
    int param_1,
    ...
  )
]], cb)

collectgarbage()

Passing a number to cast() is allowed (after all, the library cannot know whether that address points to a Lua callback), but this also crashes upon garbage collection:

cffi = require("cffi")
f = cffi.cast([[
  void (__cdecl *)
  (
    int param_1,
    ...
  )
]], 0x401000)
collectgarbage()

I think this somewhat contradicts the docs. They only state that declaring variadic functions is supported, but then casting to a variadic function pointer type should be supported too, right? https://github.com/q66/cffi-lua/blob/9f2acc9a2a0c8e59dda35c0e11333d1b66296667/docs/syntax.md?plain=1#L29


After more testing, I found that this also crashes:

cffi = require("cffi")
func = function(a, ...) print(a, ...) end
pFunc = cffi.cast("void (__cdecl *)(int)", func)
pFunc(100) -- prints 100
pFuncAddr = cffi.tonumber(cffi.cast("unsigned long", pFunc))
f = cffi.cast([[
  void (__cdecl *)
  (
    int param_1,
    ...
  )
]], pFuncAddr)
collectgarbage()

Or, to gather some statistics, run:

cffi = require("cffi")
cb = function(a, ...) print(a, ...) end
for i = 1,100 do
  print(i)
  f = pcall(function()
    -- if cffi.typeof() is used to store the type, then no crash occurs.
    return cffi.cast([[
      void (__cdecl *)
      (
        int param_1,
        ...
      )
    ]], cb)
  end)
  collectgarbage()
end

gynt, Sep 26 '25 12:09