slack-ruby-bot

Duplicate responses to commands after bot is left idling w/ celluloid

Open jillguyonnet opened this issue 6 years ago • 15 comments

Hi there,

We are using this gem in a simple project that uses a database. We've noticed that when we let the app idle for some time (unsure exactly for how long, maybe about an hour) then the bot processes text commands multiple times. The gems slack-ruby-bot and slack-ruby-client are on the latest version.

I have created a minimal setup with which I am able to reproduce this issue: https://github.com/jillguyonnet/slack_ruby_bot_minimal. This is a very simple app that manages a list of items. Note that it was based on https://github.com/slack-ruby/slack-ruby-bot/blob/master/TUTORIAL.md and uses the MVC approach. The available commands are:

  • hi
  • show: print a list of items
  • add: add an item to the list
  • clear: clear the list

Similar to our real project, the issue of multiple command processing happens when I let the app run for a while. Here is a screenshot where the bot responds twice to hi: (image: Screenshot 2019-09-09 at 16 33 58)

The steps to reproduce are:

  1. Run the app, check that it works by running some commands.
  2. Leave it running for a while (at least an hour but possibly several hours).
  3. Run some commands again. More often than not (but not always), the issue will present itself.

We are currently investigating potential sources for this issue. Without confirmation, we think these might include:

  • websocket_ping setting on the realtime client (https://github.com/slack-ruby/slack-ruby-client/blob/master/README.md#configuring-slackrealtimeclient) currently not set to zero. Could this issue be caused by the client attempting to reestablish its connection to the message server?
  • The CONCURRENCY environment variable is not set.

Would you, based on this information, be able to suggest the cause of this issue and the best way to fix it?

Thanks

jillguyonnet avatar Sep 09 '19 15:09 jillguyonnet

same here! (image: K3N_-_Kubernetes_Dashboard)

hjanuschka avatar Sep 10 '19 11:09 hjanuschka

Without debugging I am going to guess that it's reconnecting, and re-registering commands when that happens. I'll take a look when I get a chance.
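To illustrate the guess above (not a confirmed root cause), here is a minimal toy sketch of how re-registering a handler on every reconnect would multiply replies. `ToyClient`, `ToyBot`, and `guard_registration` are hypothetical names invented for this example; none of them come from slack-ruby-bot or slack-ruby-client.

```ruby
# Toy model only: these classes are hypothetical and are not part of
# slack-ruby-bot or slack-ruby-client.
class ToyClient
  def initialize
    @handlers = Hash.new { |hash, key| hash[key] = [] }
  end

  # Register a callback for an event type.
  def on(event, &block)
    @handlers[event] << block
  end

  # Deliver an event to every registered callback and collect the replies.
  def dispatch(event, text)
    @handlers[event].map { |handler| handler.call(text) }
  end
end

class ToyBot
  attr_reader :client

  def initialize(guard_registration:)
    @client = ToyClient.new
    @guard_registration = guard_registration
    @registered = false
  end

  # Called on the initial connect and again on every simulated reconnect.
  def connect
    return if @guard_registration && @registered

    @client.on(:message) { |text| "reply to #{text}" }
    @registered = true
  end
end

naive = ToyBot.new(guard_registration: false)
3.times { naive.connect } # initial connect plus two reconnects
naive_replies = naive.client.dispatch(:message, 'hi')

guarded = ToyBot.new(guard_registration: true)
3.times { guarded.connect }
guarded_replies = guarded.client.dispatch(:message, 'hi')

puts "naive bot replies:   #{naive_replies.length}"
puts "guarded bot replies: #{guarded_replies.length}"
```

In this model, each reconnect of the unguarded bot adds one more reply per message, which would match the pattern of duplicates growing the longer the bot idles. Whether the real client behaves this way under celluloid is exactly what the thread leaves open.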

dblock avatar Sep 10 '19 12:09 dblock

yeah seems to be that way, i would love to have an option/ENV to disable reconnect et al. and just throw an exception and die! in our case the cluster would take care of the bot and recreate it anyway 👍

hjanuschka avatar Sep 10 '19 12:09 hjanuschka

if you can lead me to the file/place where the reconnect happens, i am happy to contribute such a change!

hjanuschka avatar Sep 10 '19 12:09 hjanuschka

Reconnect happens in slack-ruby-client with a ping worker. There's a lot of detail in https://code.dblock.org/2019/03/04/solving-slack-side-disconnects-in-slack-ruby-client.html with links. But I think the problem here is simpler and is something about the reconnect semantics changing over the last few versions of the client. I would add a bunch of logs to see what's being reloaded and when.

dblock avatar Sep 10 '19 12:09 dblock

changing to CONCURRENCY=async-websocket seems to fix it for my scenario, it reconnects multiple times but only a single bot is responding to messages

hjanuschka avatar Sep 12 '19 13:09 hjanuschka

Were you using celluloid before or faye-websocket?

dblock avatar Sep 13 '19 14:09 dblock

default, i think it was celluloid

hjanuschka avatar Sep 13 '19 14:09 hjanuschka

We're happy to report that, after a week of testing, using async-websocket with CONCURRENCY=async-websocket seems to have fixed the issue for us as well. 🎉

jillguyonnet avatar Sep 18 '19 09:09 jillguyonnet

I would appreciate it if someone could get to the bottom of this with celluloid. Good project to dive deep!

dblock avatar Sep 22 '19 16:09 dblock

+1! Have also been experiencing this behavior after switching to celluloid-io.

If I have uninstalled celluloid-io and installed async-websocket again, do I still need to set CONCURRENCY=async-websocket or will async-websocket get picked up as a default?

oliverswitzer avatar Oct 15 '19 15:10 oliverswitzer

If your Gemfile has async-websocket you're all good. The ENV setting is to run tests in this project.
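As a rough sketch of what that Gemfile might look like, assuming a plain Bundler setup: version constraints are deliberately omitted here, and the thread notes that version conflicts can occur, so check each gem's README for the versions it supports.

```ruby
# Gemfile (illustrative sketch; version pins intentionally left out)
source 'https://rubygems.org'

gem 'slack-ruby-bot'
gem 'async-websocket' # the concurrency library reported to work in this thread
# gem 'celluloid-io'  # left out: duplicate replies were reported with celluloid
```

Run `bundle install` after editing the Gemfile so the change takes effect.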

dblock avatar Oct 15 '19 16:10 dblock

We left our bot alone this weekend, and when we came back to work this Monday, it started replying 12 times to the same question 😱. Will try the async-websocket trick.

Startouf avatar Oct 21 '19 07:10 Startouf

@dblock thanks! Though I had to upgrade to Rails 6 because of a version conflict with, switching back to using async-websocket in my Gemfile seemed to work.

oliverswitzer avatar Oct 21 '19 18:10 oliverswitzer

I wish someone actually fixed the celluloid bug. Or maybe someone can do the work to deprecate celluloid usage everywhere and hardcode async-websocket?

dblock avatar Oct 22 '19 12:10 dblock