coro-channel: resuming before having a chance to yield
coro_http.request("GET", "https://google.com", {}, "")
would error with;
cannot resume running coroutine
stack traceback:
/mnt/data/bilal/deps/coro-channel.lua:16: in function 'assertResume'
/mnt/data/bilal/deps/coro-channel.lua:126: in function 'callback'
/mnt/data/bilal/deps/secure-socket/biowrap.lua:43: in function 'write'
/mnt/data/bilal/deps/coro-channel.lua:148: in function 'write'
/mnt/data/bilal/deps/coro-http.lua:149: in function </mnt/data/bilal/deps/coro-http.lua:106>
[C]: in function 'xpcall'
[string "bundle:/deps/repl.lua"]:97: in function 'evaluateLine'
[string "bundle:/deps/repl.lua"]:189: in function <[string "bundle:/deps/repl.lua"]:187>
stack traceback:
[C]: in function 'error'
/mnt/data/bilal/deps/coro-channel.lua:16: in function 'assertResume'
/mnt/data/bilal/deps/coro-channel.lua:126: in function 'callback'
/mnt/data/bilal/deps/secure-socket/biowrap.lua:43: in function 'write'
/mnt/data/bilal/deps/coro-channel.lua:148: in function 'write'
/mnt/data/bilal/deps/coro-http.lua:149: in function </mnt/data/bilal/deps/coro-http.lua:106>
While the request is very wrong (providing a body on GET) it shouldn't have produced such error. I am afraid it is affecting wider use-cases of the library.
I will be looking deeper into this.
coro-channel.lua#L126:
local function wait()
local thread = coroutine.running()
return function (err)
print("reaching assertResume")
assertResume(thread, err)
end
end
coro-channel#L149:
print("reaching socket:write")
local success, err = socket:write(chunk, wait())
if not success then
closer.errored = err
closer.check()
print("reaching return")
return nil, err
end
print("reaching coroutine.yield")
err = coroutine.yield()
return not err, err
Outputs:
reaching socket:write
reaching coroutine.yield
reaching assertResume
reaching socket:write
reaching assertResume
cannot resume running coroutine
stack traceback: etc...
So it seems like it is trying to resume before reaching yield (a trend at this point in many of the coro-* libs), although I am not sure why this happen when body is provided in the above scenario, probably something similar to what coro-split used to do, the fix can as well be the same one we did for coro-split implementing a lock on yield, and only resuming if it was locked.
Should this locking mechanism become part of assertResume/utils perhaps? it is not that often we see this error around but it is also not that rare.
$ luvit
Welcome to the Luvit repl!
> require'coro-http'.request("POST", "https://google.com", {}, "")
cannot resume running coroutine
stack traceback:
As I assumed, it was affecting a wider range. The cause is basically the body being too small, looks like it finish writing before yielding as explained before.
Will fix this in the same way we did it for coro-split.. tomorrow hopefully.
Edit: Well this was unexpected... After applying the following patch to coro-channel:
local function makeWrite(socket, closer)
local hasYielded, hasReturned = false, false
local function wait()
local thread = coroutine.running()
return function (err)
if hasYielded then
print("states: ", hasYielded, hasReturned)
assertResume(thread, err)
print("Resumed thread")
else
print("Skipped resuming, not yielded")
hasReturned = err
end
end
end
local function write(chunk)
hasYielded, hasReturned = false, false
if closer.written then
return nil, "already shutdown"
end
-- p("->", chunk)
if chunk == nil then
closer.written = true
closer.check()
local success, err = socket:shutdown(wait())
if not success then
return nil, err
end
err = coroutine.yield()
return not err, err
end
local success, err = socket:write(chunk, wait())
if not success then
closer.errored = err
closer.check()
print("Write not successful")
return nil, err
end
print("State before yielding:", hasYielded == true)
if hasReturned then
print("Returning without yielding")
return not hasReturned, hasReturned
else
hasYielded = true
print("got yielded")
err = coroutine.yield()
print("Returning after yielded")
return not err, err
end
end
return write
end
I tried again with the require'coro-http'.request("POST", "https://google.com", {}, ""), it doesn't error so that's good.. except it doesn't error. Just after printing Resumed thread Luvit panics and quit with no errors.
I don't think this is caused by this patch, because I tried with latest coro-channel again, this time just changed assertResume to a regular coroutine.resume, and the same panic occurred. I suspect some of the recent changes broke this.
Edit 3: Nothing too serious apparently, I was just yielding when I should not be, making it yield forever, luv detects that and interpret it as "No job to be done, therefor exit" and this happen. The strace just reflected the cleaning job Luvit do before quitting. The patch should now be ready, it make a heavy use of states (3 states per wrapWrite) but I don't want to refactor the whole thing just for this, it should be sufficient.