lit icon indicating copy to clipboard operation
lit copied to clipboard

coro-channel: resuming before having a chance to yield

Open Bilal2453 opened this issue 4 years ago • 2 comments

coro_http.request("GET", "https://google.com", {}, "") would error with;

cannot resume running coroutine
stack traceback:
	/mnt/data/bilal/deps/coro-channel.lua:16: in function 'assertResume'
	/mnt/data/bilal/deps/coro-channel.lua:126: in function 'callback'
	/mnt/data/bilal/deps/secure-socket/biowrap.lua:43: in function 'write'
	/mnt/data/bilal/deps/coro-channel.lua:148: in function 'write'
	/mnt/data/bilal/deps/coro-http.lua:149: in function </mnt/data/bilal/deps/coro-http.lua:106>
	[C]: in function 'xpcall'
	[string "bundle:/deps/repl.lua"]:97: in function 'evaluateLine'
	[string "bundle:/deps/repl.lua"]:189: in function <[string "bundle:/deps/repl.lua"]:187>
stack traceback:
	[C]: in function 'error'
	/mnt/data/bilal/deps/coro-channel.lua:16: in function 'assertResume'
	/mnt/data/bilal/deps/coro-channel.lua:126: in function 'callback'
	/mnt/data/bilal/deps/secure-socket/biowrap.lua:43: in function 'write'
	/mnt/data/bilal/deps/coro-channel.lua:148: in function 'write'
	/mnt/data/bilal/deps/coro-http.lua:149: in function </mnt/data/bilal/deps/coro-http.lua:106>

While the request is very wrong (providing a body on GET) it shouldn't have produced such error. I am afraid it is affecting wider use-cases of the library.

I will be looking deeper into this.

Bilal2453 avatar Jan 05 '22 23:01 Bilal2453

coro-channel.lua#L126:

  local function wait()
    local thread = coroutine.running()
    return function (err)
      print("reaching assertResume")
      assertResume(thread, err)
    end
  end

coro-channel#L149:

    print("reaching socket:write")
    local success, err = socket:write(chunk, wait())
    if not success then
      closer.errored = err
      closer.check()
      print("reaching return")
      return nil, err
    end
    print("reaching coroutine.yield")
    err = coroutine.yield()
    return not err, err

Outputs:

reaching socket:write
reaching coroutine.yield
reaching assertResume
reaching socket:write
reaching assertResume
cannot resume running coroutine
stack traceback: etc...

So it seems like it is trying to resume before reaching yield (a trend at this point in many of the coro-* libs), although I am not sure why this happen when body is provided in the above scenario, probably something similar to what coro-split used to do, the fix can as well be the same one we did for coro-split implementing a lock on yield, and only resuming if it was locked.

Should this locking mechanism become part of assertResume/utils perhaps? it is not that often we see this error around but it is also not that rare.

Bilal2453 avatar Jan 05 '22 23:01 Bilal2453

$ luvit
Welcome to the Luvit repl!
> require'coro-http'.request("POST", "https://google.com", {}, "")
cannot resume running coroutine
stack traceback:

As I assumed, it was affecting a wider range. The cause is basically the body being too small, looks like it finish writing before yielding as explained before.

Will fix this in the same way we did it for coro-split.. tomorrow hopefully.

Edit: Well this was unexpected... After applying the following patch to coro-channel:

local function makeWrite(socket, closer)

  local hasYielded, hasReturned = false, false
  local function wait()
    local thread = coroutine.running()
    return function (err)
      if hasYielded then
        print("states: ", hasYielded, hasReturned)
        assertResume(thread, err)
        print("Resumed thread")
      else
        print("Skipped resuming, not yielded")
        hasReturned = err
      end
    end
  end

  local function write(chunk)
    hasYielded, hasReturned = false, false
    if closer.written then
      return nil, "already shutdown"
    end

    -- p("->", chunk)

    if chunk == nil then
      closer.written = true
      closer.check()
      local success, err = socket:shutdown(wait())
      if not success then
        return nil, err
      end
      err = coroutine.yield()
      return not err, err
    end

    local success, err = socket:write(chunk, wait())
    if not success then
      closer.errored = err
      closer.check()
      print("Write not successful")
      return nil, err
    end
    print("State before yielding:", hasYielded == true)
    if hasReturned then
      print("Returning without yielding")
      return not hasReturned, hasReturned
    else
      hasYielded = true
      print("got yielded")
      err = coroutine.yield()
      print("Returning after yielded")
      return not err, err
    end
  end

  return write
end

I tried again with the require'coro-http'.request("POST", "https://google.com", {}, ""), it doesn't error so that's good.. except it doesn't error. Just after printing Resumed thread Luvit panics and quit with no errors. I don't think this is caused by this patch, because I tried with latest coro-channel again, this time just changed assertResume to a regular coroutine.resume, and the same panic occurred. I suspect some of the recent changes broke this.

Edit 3: Nothing too serious apparently, I was just yielding when I should not be, making it yield forever, luv detects that and interpret it as "No job to be done, therefor exit" and this happen. The strace just reflected the cleaning job Luvit do before quitting. The patch should now be ready, it make a heavy use of states (3 states per wrapWrite) but I don't want to refactor the whole thing just for this, it should be sufficient.

Bilal2453 avatar Jan 06 '22 23:01 Bilal2453