swift-nio
restore test case from #164 that was modified
Motivation:

`testNoChannelReadIfNoAutoRead` from #164 was modified to allow data to be received through `channelRead` for unclear reasons.

Modification:

Undo the modifications (apart from the NIO 2 port).

Result:

Tighter test case.
ah of course, we have to consume everything from the kernel here and currently we deliver it through the pipeline even though we have `.autoRead = false`... Maybe we should buffer in the `Channel` instead...
Buffering in the `Channel` is really tough, I think, because it forces the `Channel` to have a lifecycle past the FD closure. Right now NIO is super aggressive at minimising the amount of time the `Channel` lives in an unregistered state: changing that pattern here is going to be extremely brittle.
> Buffering in the `Channel` is really tough, I think, because it forces the `Channel` to have a lifecycle past the FD closure. Right now NIO is super aggressive at minimising the amount of time the `Channel` lives in an unregistered state: changing that pattern here is going to be extremely brittle.

Agreed. The question is whether that's good for the user... What's currently impossible to do (without further `ChannelHandler`s that buffer) is this:

1. bootstrap a `Channel` with `autoRead` off
2. do some other stuff
3. add handlers to the `Channel`
4. start properly running by setting `autoRead` on

Right now, this will appear to work until we get an error/EOF (on Linux) from the other side between steps 1 and 3. In Netty this would work just fine.
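To make that concrete, here's a minimal sketch of that flow with the NIO 2 bootstrap API; `MyHandler`, the host, and the port are made-up placeholders, not anything from this PR:

```swift
import NIO

final class MyHandler: ChannelInboundHandler {
    typealias InboundIn = ByteBuffer

    func channelRead(context: ChannelHandlerContext, data: NIOAny) {
        // Ideally only reachable once autoRead is back on (or a read was issued).
        print("received \(self.unwrapInboundIn(data).readableBytes) bytes")
    }
}

let group = MultiThreadedEventLoopGroup(numberOfThreads: 1)

// 1. bootstrap a Channel with autoRead off
let channel = try ClientBootstrap(group: group)
    .channelOption(ChannelOptions.autoRead, value: false)
    .connect(host: "example.com", port: 80)
    .wait()

// 2. do some other stuff

// 3. add handlers to the Channel
try channel.pipeline.addHandler(MyHandler()).wait()

// 4. start properly running by setting autoRead on
try channel.setOption(ChannelOptions.autoRead, value: true).wait()
```

The problem described above is that if the peer delivers an error/EOF between steps 1 and 3, today's behaviour still pushes the resulting events through the pipeline even though `autoRead` is off and nobody called `read`.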
I don't think it's the most important use-case and the way you & I coded (always expect `channelRead`, buffer in a `ChannelHandler` if you have to) will still work of course. But it further lowers the value of the `autoRead` setting in NIO. There's barely a situation where you can use it...
Yes, I agree, with the current model `autoRead=false` is not very useful, but it's worse than that, because it means that `read` is of limited utility as well.

I'm not opposed to us trying to fix it, but the fix is going to be very subtle and trigger a bunch of weird edge-case bugs.
> Yes, I agree, with the current model `autoRead=false` is not very useful, but it's worse than that, because it means that `read` is of limited utility as well.

I see where you're coming from, and you're certainly not wrong. But given `maxMessagesPerRead`, other handlers that buffer/produce data, or even call `context.read`, you'll need to expect `channelRead`s anyway. So I think the situation is more defensible with `autoRead=on`, but that may be my personal interpretation. Certainly, I agree with you that it's somewhat odd that if you're the first handler and let no `read`s pass, you may still get `channelRead`s.
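To illustrate the "buffer in a `ChannelHandler`" pattern mentioned above, here's a rough, hypothetical sketch; `ReadBufferingHandler` and its exact policy are made up for illustration and aren't part of NIO:

```swift
import NIO

/// Holds back inbound data that arrives without an outstanding read() and only
/// releases it once somebody further up the pipeline actually asks for it.
final class ReadBufferingHandler: ChannelDuplexHandler {
    typealias InboundIn = ByteBuffer
    typealias InboundOut = ByteBuffer
    typealias OutboundIn = Never
    typealias OutboundOut = Never

    private var bufferedReads: [ByteBuffer] = []
    private var readPending = false

    func read(context: ChannelHandlerContext) {
        if !self.bufferedReads.isEmpty {
            // Satisfy this read from data we buffered earlier.
            context.fireChannelRead(self.wrapInboundOut(self.bufferedReads.removeFirst()))
            context.fireChannelReadComplete()
        } else {
            // Nothing buffered: remember that a read is wanted and let it through.
            self.readPending = true
            context.read()
        }
    }

    func channelRead(context: ChannelHandlerContext, data: NIOAny) {
        if self.readPending {
            self.readPending = false
            context.fireChannelRead(data)
        } else {
            // channelRead without an outstanding read (autoRead races, EOF/error
            // draining, maxMessagesPerRead, ...): buffer instead of surprising
            // the handlers behind us.
            self.bufferedReads.append(self.unwrapInboundIn(data))
        }
    }
}
```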
> I'm not opposed to us trying to fix it, but the fix is going to be very subtle and trigger a bunch of weird edge-case bugs.

We're better equipped now than ever to fix this. With the SAL, we should be able to reliably trigger all the situations where this will matter. Of course, this would require surgery in many fairly brittle places in the code, so yeah, bugs are likely to slip in and remain there for a while...
But before even thinking about how we address this best, what do we think is the correct fix?
I'm convinced that we should free kernel resources ASAP, and with `EPOLLHUP` being unmaskable we kinda have to anyway. That however means that we would have to buffer in the `Channel`. We would have to hold back:

- input closed events
- errors
- data
- channel lifecycle messages

Quite the change indeed. Other questions arise around what to do when we have pending input data & EOF and are writing (which may fail because we might not have an fd anymore).
I think it's fair to say that we have two options:

1. we leave it as is and do "no interpretation" on what the kernel gives us. The only thing that NIO guarantees is that without a `read` going through, the amount of data that gets pumped through the pipeline with `channelRead` is bounded.
2. we start creating an idealised world and do whatever it takes to keep the illusion up, even if the kernel can't be configured to match our idealised world.
`EPOLLHUP` is rather unhelpful if we were to do (2), see the man page:

> EPOLLHUP
> Hang up happened on the associated file descriptor. epoll_wait(2) will always wait for this event; it is not necessary to set it in events when calling epoll_ctl(). Note that when reading from a channel such as a pipe or a stream socket, this event merely indicates that the peer closed its end of the channel. Subsequent reads from the channel will return 0 (end of file) only after all outstanding data in the channel has been consumed.

Of course, we can't mask `EPOLLHUP` and we can't register this `fd` for anything anymore because we'd get `EPOLLHUP`s forever.
What we could in theory do would be:

- On `EPOLLHUP` we deregister the fd from the kernel entirely. We do not read immediately, and can't register for writes anymore either.
- We save the error we possibly got from `getsockopt(..., SO_ERROR)`.
- When we get writes from the user we fail them (or try them once).
- Once the user asks us to, we'll `read`.
- Once we see the EOF (or an error) from `read` we surface that to the user and close the `Channel`.
This would keep kernel resources around for longer than necessary, which I don't like, but it's the only way I can see that would allow us to beat Linux hard enough to match the idealised world we may want.
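To make the steps above a bit more concrete, here's a rough, Linux-only sketch at the raw fd level; the function names and the state enum are purely illustrative, and the real `Channel`/`Selector` plumbing would look quite different:

```swift
#if os(Linux)
import Glibc

/// Illustrative state for a Channel whose fd has seen EPOLLHUP.
enum PostHangupState {
    case registered                 // normal operation, fd still registered with epoll
    case hungUp(savedError: CInt?)  // EPOLLHUP seen: fd deregistered, SO_ERROR remembered
    case closed                     // EOF/error surfaced to the user, fd closed
}

/// On EPOLLHUP: deregister the fd entirely and remember any pending socket error.
func didSeeHangup(epollFD: CInt, socketFD: CInt) -> PostHangupState {
    // We can't mask EPOLLHUP, so stop polling this fd altogether.
    _ = epoll_ctl(epollFD, EPOLL_CTL_DEL, socketFD, nil)

    // Save the error (if any) so we can surface it on the user's next read.
    var error: CInt = 0
    var length = socklen_t(MemoryLayout<CInt>.size)
    _ = getsockopt(socketFD, SOL_SOCKET, SO_ERROR, &error, &length)
    return .hungUp(savedError: error == 0 ? nil : error)
}

/// Once the user asks us to read: drain until EOF (0) or an error, surface it, close.
func userRequestedRead(socketFD: CInt, state: inout PostHangupState) {
    var buffer = [UInt8](repeating: 0, count: 8192)
    while true {
        let bytesRead = read(socketFD, &buffer, buffer.count)
        if bytesRead > 0 {
            continue  // deliver these bytes to the pipeline via channelRead
        }
        // 0 means EOF, negative means an error: surface it to the user, then close.
        _ = close(socketFD)
        state = .closed
        return
    }
}
#endif
```

The fd staying open until the user eventually reads is exactly the "kernel resources around for longer than necessary" part.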
Just throwing in a short comment: wouldn't the channel (or perhaps more correctly, the EL, given our earlier discussion on memory ownership) need to buffer in the future regardless (i.e. ET / io_uring)? Perhaps it's worthwhile looking at that model a bit more first, as any idealised currently-desired model might run into trouble (as we recently saw with the implicit sync points).
> I see where you're coming from, and you're certainly not wrong. But given `maxMessagesPerRead`, other handlers that buffer/produce data, or even call `context.read`, you'll need to expect `channelRead`s anyway.

I think a well-functioning NIO program would not produce a `channelReadComplete` without a `read` call. This is not to say that channel handlers can't mess that up, but channel handlers can mess up all kinds of things and we generally just discuss things as though they won't.
> Just throwing in a short comment: wouldn't the channel (or perhaps more correctly, the EL, given our earlier discussion on memory ownership) need to buffer in the future regardless (i.e. ET / io_uring)?

I think it's relevant to consider that we'll probably have to end up with two separate solutions. One will need to cover epoll, where the FD absolutely must be deregistered once EPOLLHUP has been received. This is separate from what io_uring will want to do. Additionally, @weissi's proposed method of handling this (delay issuing `read` calls until later) is going to need to behave differently with `io_uring` as well. Importantly, with `io_uring` doing our I/O we may not find out about the EPOLLHUP condition until we try to read: after all, if we aren't writing and aren't reading then we aren't really asking for any information at all from the peer FD.
@weissi Your proposed design for dealing with this is fairly reasonable. It does interact poorly with our `registered`/`unregistered` events, but frankly those are already of extremely limited utility and arguably should just go away entirely.

The general idea of "this FD is dead, let's stop asking the EL about it and assume that the FD is always readable/writable" seems like a reasonable strategy to me.
> @weissi Your proposed design for dealing with this is fairly reasonable. It does interact poorly with our `registered`/`unregistered` events, but frankly those are already of extremely limited utility and arguably should just go away entirely.

I see what you mean and I agree with the "limited utility". If we adopted what I proposed, we could even argue that `unregistered` becomes more useful. Clearly, there's a reason why NIO deregistered the fd from the Selector; this could be a signal to the user to `read` to find out why :). And the guarantee that the user would get is that after `unregistered`, there won't be an infinite amount of data coming anymore.
> I think a well-functioning NIO program would not produce a `channelReadComplete` without a `read` call. This is not to say that channel handlers can't mess that up, but channel handlers can mess up all kinds of things and we generally just discuss things as though they won't.

Right, makes sense.
> I see what you mean and I agree with the "limited utility". If we adopted what I proposed, we could even argue that `unregistered` becomes more useful. Clearly, there's a reason why NIO deregistered the fd from the Selector; this could be a signal to the user to `read` to find out why :). And the guarantee that the user would get is that after `unregistered`, there won't be an infinite amount of data coming anymore.

Yeah, the issue is that while this is a very nice story about how `unregistered` works, `unregistered` is right now strictly ordered with `channelInactive`. This would break that strict ordering.

I'm not sure how much that matters, particularly as, once again, I don't think anyone is actually using `unregistered` (not least because I have no idea what they'd use it for). Perhaps the better move is to say that `registered`/`unregistered` are actually useless, deprecate them, and just leave them as vestigial nothings.
> Yeah, the issue is that while this is a very nice story about how `unregistered` works, `unregistered` is right now strictly ordered with `channelInactive`. This would break that strict ordering.

Point taken, that's true. Currently we do `inactive`, then `unregistered`. I don't honestly know why :).
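As a tiny illustration of the ordering in question, a hypothetical handler like the one below would today always print `channelInactive` before `channelUnregistered`; under the proposal, `unregistered` could come before `channelInactive`:

```swift
import NIO

/// Hypothetical handler, only here to show today's lifecycle ordering.
final class LifecycleOrderLogger: ChannelInboundHandler {
    typealias InboundIn = ByteBuffer

    func channelInactive(context: ChannelHandlerContext) {
        print("1: channelInactive")       // delivered first today
        context.fireChannelInactive()
    }

    func channelUnregistered(context: ChannelHandlerContext) {
        print("2: channelUnregistered")   // delivered after channelInactive today
        context.fireChannelUnregistered()
    }
}
```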
> I'm not sure how much that matters, particularly as, once again, I don't think anyone is actually using `unregistered` (not least because I have no idea what they'd use it for). Perhaps the better move is to say that `registered`/`unregistered` are actually useless, deprecate them, and just leave them as vestigial nothings.

Fair enough, that'd work for me too I think.