ruby-mqtt

Prevent deadlock due to system call error

Open leoarnold opened this issue 2 years ago • 3 comments

The MQTT client runs in the main thread which consumes incoming MQTT messages from a Thread::Queue called @read_queue. This queue is fed by the @read_thread, a child thread which reads from a socket in an infinite loop.

We noticed that our application stopped processing incoming MQTT messages, but it neither exited nor raised an exception. Upon inspecting the logs, we saw that the @read_thread had crashed due to an unhandled Errno::ECONNRESET while reading from the socket. As a result, the consuming thread slept forever, waiting for new messages on the @read_queue.
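The failure mode can be reproduced in isolation. The following is a minimal sketch, not the library's code: read_queue and read_thread are hypothetical stand-ins for @read_queue and @read_thread, and the socket error is simulated with a plain raise.

```ruby
# Hypothetical stand-ins for the client's @read_queue and @read_thread.
read_queue = Queue.new

read_thread = Thread.new do
  Thread.current.report_on_exception = false # keep the demo output quiet
  read_queue << "first message"
  raise Errno::ECONNRESET # simulated socket failure; nothing rescues it here
end

# The consumer still receives whatever was queued before the crash.
first = read_queue.pop

# Thread#join re-raises the exception that terminated the thread.
reader_error = nil
begin
  read_thread.join
rescue Errno::ECONNRESET => e
  reader_error = e
end

reader_alive = read_thread.alive? # the producer is gone
queue_empty  = read_queue.empty?  # and nothing will be pushed again

puts "first=#{first.inspect} error=#{reader_error.class} " \
     "alive=#{reader_alive} empty=#{queue_empty}"
```

Once the producer has died and the queue is empty, another blocking pop on read_queue has nothing left to wake it, which matches the symptom described above.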

It turned out that the MQTT::Client#receive_packet method had appropriate error handling in place

def receive_packet
  # ...
rescue Exception
  # ...
end

but this did not rescue Errno::ECONNRESET even though Exception is at the top of the hierarchy of Ruby's built-in exception classes.

The root cause was that the library also defines a class MQTT::Exception. Inside #receive_packet, the bare constant Exception therefore resolves to MQTT::Exception, which does not cover Errno::ECONNRESET, when it should rescue any subclass of ::Exception.
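The constant lookup can be demonstrated on its own. This is a minimal sketch under assumptions: the Demo module is a hypothetical stand-in for the MQTT namespace, the superclass of Demo::Exception is chosen only for illustration, and the method names are made up for the demo.

```ruby
module Demo # hypothetical stand-in for the MQTT namespace
  # A library-specific error class that shadows ::Exception inside Demo.
  class Exception < ::StandardError; end

  def self.shadowed_rescue
    raise Errno::ECONNRESET
  rescue Exception # resolves lexically to Demo::Exception
    :rescued
  end

  def self.qualified_rescue
    raise Errno::ECONNRESET
  rescue ::Exception # the leading :: forces the top-level class
    :rescued
  end
end

shadowed =
  begin
    Demo.shadowed_rescue
  rescue Errno::ECONNRESET
    :escaped # the error was not caught inside the method
  end

qualified = Demo.qualified_rescue

puts "shadowed=#{shadowed} qualified=#{qualified}"
```

Errno::ECONNRESET is not a Demo::Exception, so the bare rescue lets it escape, while the fully qualified ::Exception catches it.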

leoarnold, Sep 21 '23 12:09

@phlegx FYI

leoarnold, Sep 21 '23 12:09

I'm running into a threaded processing lockup that seems to be a similar issue.

blmundie, Nov 16 '23 16:11

Any follow up on getting this merged? This was a real problem for us a few months ago.

We found setting Thread.current.abort_on_exception = true would at least kill the listening process for us (and allow our system to reboot and rebuild the connection), but this fix is much cleaner and less janky.

bmorrall, Mar 05 '24 07:03

Thanks for this. The exception code has been wrong for a very long time and it is long overdue getting sorted out.

According to Semver, do you think this change is worthy of a Major, Minor or Patch increment? 🤔

njh, Apr 02 '24 22:04

Hey @njh, good to see this progressing!

Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable. https://semver.org/#spec-item-4

Since you're just merging fixes (I assume) I'd argue that is only a PATCH increment. If you're also taking #112, that one mandates a cautionary note in CHANGELOG.md.

leoarnold, Apr 02 '24 23:04