lit Hotfix for #303

This should fix #303 by tracking the state of the thread. We might want to patch and update the coro-channel package on Lit as fast as we can, I've seen so far 3-4 people complaining about this.

Jan 07 '22 19:01 Bilal2453

@squeek502 Can this be reviewed/merged quickly? It seems to be a major issue around

Jan 08 '22 19:01 Bilal2453

I'll need to look more at how this is happening, but would it just be easier to reject a body for GET requests? Maybe that would be breaking, though.

Jan 08 '22 19:01 SinisterRectus

I've explained well enough why this happen in #303, basically socket:write() finishes and executes the callback too quickly on an empty chunk when wrapped with secure-socket, the callback tries to resume but the catch is it haven't reached yield just yet.

Not sure what have changed exactly and triggered this fast callback execution, but there has always been a chance of it happening eventually -effectively we are always assuming the thread was yielded which isn't always true-.

We could mitigate this with just checking the body length, that still wouldn't be a fix + we don't know just yet what else this is affecting. That do seem like more trouble and not worth it quite honestly.

Edit: It is basically the same exact assumption coro-split used to make.

Jan 08 '22 20:01 Bilal2453

but would it just be easier to reject a body for GET requests

@SinisterRectus Oh didn't notice you've specified GET requests with this mitigation. I did mention this in the issue, but here again, this isn't only triggered by a GET request on coro-http end, it happens with basically ANYTHING that do make use of the coro-channel wrapWrite (at least when wrapped with secure-socket). So the following will error too coro_http.request("POST", "https://google.com", {}, ""), this is even more trouble since this request should be valid, and shouldn't be rejected client side at all. Sometimes you will find the library itself trying to write empty string as an EOF, this too is triggering this bug.

Jan 08 '22 20:01 Bilal2453

I was debugging an issue with TLS servers when stress tested with this fix applied, and I noticed write("") can still cause problems, more specifically this time it can lead to yielding the write and never resuming it. This is also for some reason very slow compared to a proper fix, at least 10 times slower.

since hasReturned can be assigned whatever value by the read, it can also be a falsy value.

A slightly better fix would to have something like:

local some_unique_value = {}
hasReturned = some_unique_value
-- ...
hasReturned = data
-- ...
if hasReturned == some_unique_value then
 -- we have resumed before the yield
else
 -- we should yield as write has not finished yet
end

also we can stop using the hasYielded upvalue and instead use coroutine.status(thread).

Mar 08 '23 19:03 Bilal2453