net-protocol
net-protocol copied to clipboard
Fix Net::ReadTimeout errors in net-pop by handling multi-byte terminators getting split up in readuntil
Hi there. We recently did a Ruby update in our system. Part of that was using newer net-pop
and net-protocol
versions. After updating, we started seeing Net::ReadTimeout errors about once every day or two when reading POP messages and sometimes when reading the list of messages. net-pop
would tend to get stuck on a message but would sometimes also succeed on later attempts.
The issue appears to be in optimizations made in readuntil
for an offset. It appears the offset is used to avoid searching for a terminator at the beginning of the relevant buffer start. The optimization here sets offset to rbuf.bytesize
whereas before the offset was always the starting @rbuf_offset
. However, this doesn't always work when a multiple byte terminator is used like net-pop
's "\r\n" as a read could end up with the read ending midway through a multiple byte terminator and then either:
- Two chunks are read as one, so
each_message_chunk
fails to find the exact string ".\r\n" as two chunks are together. - Potentially the ending ".\r\n" could be read partially as well which would result in an extra read when no more data is coming.
I verified the first one in production by modifying each_message_chunk
like so:
def each_message_chunk
LOG 'reading message...'
LOG_off()
read_bytes = 0
while (line = readuntil("\r\n")) != ".\r\n"
warn "NP LINE: #{line}"
read_bytes += line.size
yield line.delete_prefix('.')
end
warn "NP EACH_MESSAGE_CHUNK END"
LOG_on()
LOG "read message (#{read_bytes} bytes)"
end
and verified this was the issue as the ending terminator was chunked together with the previous line:
NP LINE: 4E8=3D349" style=3D"display: none; width: 1px; height: 1px;">
NP LINE: </body>
NP LINE: </html>
NP LINE: =20
NP LINE:
NP LINE: --52f7naosv66--
.
The fix I came up with was to set the offset where it does not go too far into the buffer where the terminator could start. We have been running our fork with the potential fix for two weeks in production and have had no more Net::ReadTimeout issues in net-pop
.
I looked and it appears everything else using readuntil
uses single byte terminators like "\n" which wouldn't have been affected by this issue.
Relevant stacktrace details:
Apr 27 04:51:19Z APP app/single_threaded_worker.1 Net::ReadTimeout: Net::ReadTimeout with #<Socket:fd 18>
Apr 27 04:51:19Z APP app/single_threaded_worker.1 /app/vendor/bundle/ruby/3.1.0/gems/net-protocol-0.2.2/lib/net/protocol.rb:229:in `rbuf_fill'
Apr 27 04:51:19Z APP app/single_threaded_worker.1 /app/vendor/bundle/ruby/3.1.0/gems/net-protocol-0.2.2/lib/net/protocol.rb:199:in `readuntil'
Apr 27 04:51:19Z APP app/single_threaded_worker.1 /app/vendor/bundle/ruby/3.1.0/gems/net-protocol-0.2.2/lib/net/protocol.rb:377:in `each_message_chunk'
Apr 27 04:51:19Z APP app/single_threaded_worker.1 /app/vendor/bundle/ruby/3.1.0/gems/net-pop-0.1.2/lib/net/pop.rb:958:in `block in retr'
Apr 27 04:51:19Z APP app/single_threaded_worker.1 /app/vendor/bundle/ruby/3.1.0/gems/net-pop-0.1.2/lib/net/pop.rb:1016:in `critical'
Apr 27 04:51:19Z APP app/single_threaded_worker.1 /app/vendor/bundle/ruby/3.1.0/gems/net-pop-0.1.2/lib/net/pop.rb:956:in `retr'
Apr 27 04:51:19Z APP app/single_threaded_worker.1 /app/vendor/bundle/ruby/3.1.0/gems/net-pop-0.1.2/lib/net/pop.rb:810:in `pop'