luv
luv copied to clipboard
memory leak related to `uv.send_async`
I made an ad-hoc by-line process output reader and it seems that I found a memory leak.
I wrapped into a min working example below. This is supposed to uv.async_send
the lines of the output of a certain process. I setup a process that prints data endlessly. The output is stored in current_chunk
, but it's only temporary: each time data arrives, I look at the chunk and extract the lines, except for the last one (in case the last line did not arrive completely).
If you keep the uv.send_async
line below, memory consumption of the lua
process will grow endlessly. If you remove it, the memory consumption remains still. In both cases, each line is discarded.
local uv = require("luv") -- "luv" when stand-alone, "uv" in luvi apps-- WIP luv (related to luvit)
local stdout = uv.new_pipe()
local handle, pid =
uv.spawn(
"yes",
{stdio = {nil, stdout, nil},
args = {}
},
function(code,signal)
-- ignore
end)
local current_chunk = "" -- accumulated data until a \n
local no_more_output = false
-- process each line
cb = uv.new_async(function(line)
-- discard line
if no_more_output then
cb:close()
end
end)
uv.read_start(stdout, function(err, data)
assert(not err, err)
if data then
-- append received data into current_chunk
current_chunk = string.format("%s%s", current_chunk, data)
else
no_more_output= true
end
-- split chunk into lines to send to `callback`
lines={}
for line in string.gmatch(current_chunk, "[^\n]+") do
table.insert(lines, line)
end
if not no_more_output then
-- pop last chunk back into current chunk for next iteration
last_item_position = #lines
current_chunk = table.remove(lines, last_item_position)
end
-- send the lines to the callback
for _,l in pairs(lines) do
-- this line below creates a memory leak
uv.async_send(cb, l)
end
-- close uv stuff if everything's done
if no_more_output then
uv.async_send(cb, nil)
end
end)
uv.run()
Happens with these lua versions:
% lua -v
Lua 5.3.3 Copyright (C) 1994-2016 Lua.org, PUC-Rio
% luajit -v
LuaJIT 2.1.0-beta3 -- Copyright (C) 2005-2017 Mike Pall. http://luajit.org/
As far as I can tell, what's happening here is this, from the Libuv docs:
Warning: libuv will coalesce calls to uv_async_send(), that is, not every call to it will yield an execution of the callback. For example: if uv_async_send() is called 5 times in a row before the callback is called, the callback will only be called once. If uv_async_send() is called again after the callback was called, it will be called again.
Luv currently expects a callback for every call of async_send to be able to free the memory for the data sent. Since not all sends are actually getting a callback called, though, that memory is never being freed.
With your example code, a script that prints the numbers 1-100, and a couple prints added:
read_start cb "0\n"
lines to be sent { }
read_start cb "1\n2\n3\n4\n5\n6\n"
lines to be sent { "01", "2", "3", "4", "5" }
read_start cb "7\n8\n9\n10\n11\n12\n13\n"
lines to be sent { "67", "8", "9", "10", "11", "12" }
async cb { "12" }
read_start cb "14\n15\n16\n17\n18\n19\n20\n"
lines to be sent { "1314", "15", "16", "17", "18", "19" }
read_start cb "21\n22\n23\n24\n25\n"
lines to be sent { "2021", "22", "23", "24" }
async cb { "24" }
...
That's 2 async_cb's called for around 24 async_send calls.
I'm not totally sure what the solution here would be, though.
This is actually both a memory leak and a bug, as we are dropping data here that we shouldn't be. Minimal test case:
test("multiple sends coalesce", function(print, p, expect, uv)
local async
async = uv.new_async(expect(function(data)
p(data)
uv.close(async)
end))
uv.async_send(async, 'this string will be lost')
uv.async_send(async, 'this string will be sent')
end)
This will only print
"this string will be sent"
meaning the "this string will be lost"
string is both leaked and is never given to the send callback.
I think we need to treat async data as a FIFO queue that we push onto in luv_async_send
and then pop until it's empty in luv_async_cb
, calling into Lua with the popped data each time. That is, we need to coalesce the data in the same way that Libuv coalesces calls to uv_async_send.
Seems like a good fix, but I wonder if any users are somehow relying on the libuv behavior.
Libuv doesn't handle any passing of data, so the coalesced callbacks are reasonable there (the only intention of Libuv's async is just to wake up the loop from any thread).
I guess it's possible that some Luv user is calling async_send
with the same data multiple times and then expecting only 1 callback call with that data, but it seems fairly unlikely, and, with our current implementation, that will leak the data for every coalesced callback.
It's a little unfortunate that our binding does not match the Libuv code; it'd probably be better if we had split the actual data sending part into a luv
-specific function (async_send_data
or something).
Ah okay. A queue makes sense then.
I don't mind changing my code at all, although I believe anyone using uv.send_async
should be covered by either fixing this coalesced-call-memory-leak within luv code or by the leak being thoroughly documented (perhaps a warning?)