Storing unsafe references to `L`
In a number of places luv stores a long-lived reference to L such as src/luv.c:712.
This is not safe, however - L may refer to the current thread (coroutine) that the code is being called in, and threads are subject to garbage collection. After the thread (coroutine) is collected, the stored L value points to freed memory.
This materialized as a segfault in a complex application. It's hard to reduce to a simple test case that reliably segfaults, but running the below sample under valgrind will show the invalid memory accesses that are happening:
local uv
local f = coroutine.wrap(function ()
uv = require "luv"
uv.new_timer();
end);
f()
f = nil;
for i = 1, 10 do
collectgarbage("collect")
end
uv = nil
for i = 1, 10 do
collectgarbage("collect")
end
Valgrind output:
$ valgrind lua5.2 luv-crash.lua
==31681== Memcheck, a memory error detector
==31681== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==31681== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==31681== Command: lua5.2 luv-crash.lua
==31681==
==31681== Invalid read of size 8
==31681== at 0x10FAE3: lua_settop (lapi.c:165)
==31681== by 0x64718E4: luv_close_cb (handle.c:81)
==31681== by 0x6471D88: luv_gc_cb (handle.c:108)
==31681== by 0x6695D7F: uv_run (in /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0)
==31681== by 0x6479361: loop_gc (luv.c:690)
==31681== by 0x11300C: luaD_precall (ldo.c:319)
==31681== by 0x113371: luaD_call (ldo.c:401)
==31681== by 0x11292E: luaD_rawrunprotected (ldo.c:131)
==31681== by 0x1135DE: luaD_pcall (ldo.c:603)
==31681== by 0x11487E: GCTM (lgc.c:824)
==31681== by 0x115E1D: callallpendingfinalizers (lgc.c:978)
==31681== by 0x115E1D: luaC_freeallobjects (lgc.c:988)
==31681== by 0x11A04D: close_state (lstate.c:226)
(...truncated...)
Some possible solutions:
- Don't (if possible) hold onto L, because the lifetime isn't guaranteed
- Resolve it to the main thread (5.2+ stores it in the registry at index
LUA_RIDX_MAINTHREAD) - Update it when possible (e.g. if __gc is called with
Lthat differs to what is in the struct, update the struct) - ~Store a reference to the current thread to prevent it being GC'd (over the top if we don't really need anything in that thread)~ (not a solution, see following comment)
- Throw error if L is not the main thread (would be annoying as a user)
This is a second test case related to this issue:
local thread = coroutine.create(function ()
local uv = require "luv"
coroutine.yield();
end);
coroutine.resume(thread);
local uv = require "luv";
local t = uv.new_timer();
t:start(200, 0, function ()
t:start(200); -- throws an error
end);
uv.run("default");
The initial thread is not garbage collected, but is suspended. Because the timer callback is using the L of a suspended thread to call into the Lua API, some things are in an inconsistent state. The error gets raised inside the suspended thread, and Lua accesses uninitialized memory while trying to build the stack trace (again, doesn't always segfault but valgrind shows the invalid access).
This demonstrates that option 4 in my previous comment would not be a valid solution to the problem.
Thank you for the detailed report!
I hear rumours of a release. Is a fix for this something we could get into it? I can try to take a look this week if it helps.
I think from my list option 3 is the most robust solution.
Yeah, I'd be interested in seeing what a solution might look like.
(side note: I can reproduce the Valgrind error with the first test case, but not with the second test case [tried with Lua 5.1, Lua 5.4, and LuaJIT])
Hi - I think the proper solution is to add the lua context to userdata value field, using lua_setuservalue(). Then the Lua garbage collector won't collect the lua_State if the context is alive.