slack-ruby-bot-server

How to scale slack-bot-server?

Open keremtiryaki opened this issue 9 years ago • 9 comments

If I need to run thousands of bots for thousands of teams, can I use slack-bot-server?

keremtiryaki avatar Mar 11 '16 16:03 keremtiryaki

This has been talked about in https://github.com/dblock/slack-gamebot/issues/81, which is entirely based on this code. I think you can do low thousands today; however, there are two known issues, possibly not that problematic.

  • It takes time to establish a websocket connection, so startup time starts to become noticeable around 100 bots. Note that http://playplay.io now has 260, and works well, but you cannot instantly re-establish all these connections on a single node. It takes 30 seconds to restart.
  • RAM becomes a problem. Currently at 260 bots we're looking at 250MB of RAM, which doesn't seem like that much. However, that can grow very fast if you have very large teams and you need the data in a local store (all the data is downloaded on rtm.start).
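As a rough back-of-the-envelope check, the figures above can be extrapolated linearly. This is only a sketch: it assumes per-bot memory and connection time stay constant as the bot count grows, which large teams would break, and `estimate` is a hypothetical helper, not part of the library.

```ruby
# Figures quoted in this thread: 260 bots, ~250 MB of RAM, ~30 s to restart.
OBSERVED_BOTS = 260
OBSERVED_RAM_MB = 250.0
OBSERVED_RESTART_SECONDS = 30.0

# Linear extrapolation (an assumption, not a measurement): the per-bot cost
# is taken to be constant regardless of how many bots run on the node.
def estimate(bots)
  {
    ram_mb: (OBSERVED_RAM_MB / OBSERVED_BOTS * bots).round,
    restart_seconds: (OBSERVED_RESTART_SECONDS / OBSERVED_BOTS * bots).round
  }
end

[260, 1_000, 5_000].each do |bots|
  e = estimate(bots)
  puts "#{bots} bots: ~#{e[:ram_mb]} MB RAM, ~#{e[:restart_seconds]} s to reconnect"
end
```

Under that assumption, 1,000 bots on a single node would need roughly 1 GB of RAM and about two minutes to reconnect.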

To solve this we need a way to scale the bots horizontally. The easiest way would be to load-balance them across multiple nodes. That would need to be implemented, but I would start with #3 first.
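One simple way to spread bots across nodes is to partition teams by a stable hash of the team ID. The sketch below is illustrative only: `node_for` and `teams_for_node` are hypothetical helpers, the team IDs are made up, and nothing like this exists in the library today.

```ruby
require 'zlib'

# Hypothetical helper: map a team ID to one of node_count nodes.
# CRC32 is used because it is stable across processes, unlike Object#hash.
def node_for(team_id, node_count)
  Zlib.crc32(team_id.to_s) % node_count
end

# Each node would start only the teams that hash to its own index.
def teams_for_node(team_ids, node_index, node_count)
  team_ids.select { |id| node_for(id, node_count) == node_index }
end

# Made-up team IDs for illustration.
team_ids = (1..260).map { |i| "T#{i.to_s.rjust(4, '0')}" }
node_count = 3

node_count.times do |index|
  puts "node #{index}: #{teams_for_node(team_ids, index, node_count).size} teams"
end
```

The trade-off of plain modulo hashing is that changing the node count reassigns most teams, so every resize means reconnecting most bots.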

dblock avatar Mar 11 '16 16:03 dblock

I'd add that to run a web server that accepts multiple connections, it's good to split them out into separate processes. I have a Procfile setup that spawns a web proc with multiple Unicorn children and one worker proc with a single thread for the bots:

web: env WEB_ONLY=1 bundle exec unicorn -p $PORT -c config/unicorn.rb -E $RACK_ENV
worker: env BOT_ONLY=1 bundle exec unicorn -p $PORT -E $RACK_ENV
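The Procfile only sets environment variables; the application still has to act on them at boot. A minimal sketch of that check, assuming the variables are read once at startup (`components_for` is a hypothetical name, not something the library provides):

```ruby
# Hypothetical helper: decide which components a process should start,
# based on the WEB_ONLY / BOT_ONLY variables set in the Procfile.
def components_for(env)
  return [:web] if env['WEB_ONLY']
  return [:bots] if env['BOT_ONLY']

  [:web, :bots] # default: a single process runs both
end

# ENV behaves like a Hash here, so the real environment can be passed in.
puts components_for(ENV).inspect
```

Passing the environment in as an argument keeps the decision easy to test with a plain Hash.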

benjaminjackson avatar Jul 18 '17 21:07 benjaminjackson

The problem with this is that a service needs to expose an endpoint for registration. When that happens you need to start a bot instance. I guess it's ok that the WEB_ONLY part starts that bot for the time being, but it's still not ideal.

dblock avatar Jul 19 '17 00:07 dblock

Hi. I have the exact same issue. Our memory consumption on our web dyno is growing and I am trying to extract the bots to a worker. Has anyone found a solution for this issue?

Thank you very much in advance

benbachhuber avatar Dec 12 '17 12:12 benbachhuber

So I'm currently testing out a multi-bot approach that overrides SlackRubyBotServer::Service start! and start_from_database! to do the following:

  • upon boot, grab Team.active.where(server_id: nil, is_admin: true).where.not(bot_token: nil).limit(ENV['SLACK_MAX_TEAM_COUNT']).lock(true), then walk through each team, running callbacks and start! (and setting server_id). This ensures each worker starts a set number of distinct teams.
  • after boot, subscribe to an SQS queue for TeamAdded events. Only one bot worker can dequeue and handle adding the Team.
  • after boot, subscribe to a Kinesis stream for Service events, such as rebooting (team removed), so that each worker can notify the teams it's handling.
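The claim-on-boot step can be illustrated without a database. The sketch below is an in-memory simulation, not the actual implementation: a Mutex stands in for the row lock, and `Team`, `TeamStore` and `claim` are hypothetical names defined here for illustration only.

```ruby
# In-memory stand-in for the teams table. In the approach described above
# this would be a locked database query rather than a Mutex.
Team = Struct.new(:id, :server_id)

class TeamStore
  def initialize(teams)
    @teams = teams
    @lock = Mutex.new
  end

  # Atomically claim up to max_count unclaimed teams for server_id
  # and return the teams that were claimed.
  def claim(server_id, max_count)
    @lock.synchronize do
      @teams.select { |t| t.server_id.nil? }.first(max_count).each do |t|
        t.server_id = server_id
      end
    end
  end
end

store = TeamStore.new((1..10).map { |i| Team.new(i, nil) })

# Three workers boot concurrently, each allowed at most four teams.
claims = %w[worker-a worker-b worker-c].map do |server_id|
  Thread.new { [server_id, store.claim(server_id, 4).map(&:id)] }
end.map(&:value).to_h

claims.each { |server_id, ids| puts "#{server_id}: #{ids.inspect}" }
```

Because selecting and marking happen under one lock, no team ends up with two workers, whichever worker gets there first.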

@dblock I think I'm ready to show you what I have ;-)

alexagranov avatar Dec 13 '17 03:12 alexagranov

While that may work, I suspect there's going to be a lot of edge cases. Of course you should show us whatever you have and PR improvements that make it possible/easier into this lib.

Stepping back, I'd like to see an interface in slack-ruby-bot-server that abstracts the whole distribution mechanism away, so that we can plug in SQS or any other queue. Load balancing and the like are common problems in distributed systems, and tools like zookeeper already address them, so I think it's best to find something that works out of the box instead of reinventing the wheel.
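To make the idea concrete, here is one possible shape for such an interface. It is purely a sketch: `Distributor`, `InMemoryQueueAdapter`, `publish` and `next_event` are all hypothetical names, and an SQS-backed adapter would have to expose the same two methods.

```ruby
# Hypothetical adapter: anything that responds to publish(event) and
# next_event could back the distribution mechanism.
class InMemoryQueueAdapter
  def initialize
    @queue = Queue.new
  end

  def publish(event)
    @queue << event
  end

  # Blocks until an event is available; each event is handed out once.
  def next_event
    @queue.pop
  end
end

# Hypothetical facade that the rest of the code would talk to, so the
# queue technology can be swapped without touching callers.
class Distributor
  def initialize(adapter)
    @adapter = adapter
  end

  def team_added(team_id)
    @adapter.publish({ type: :team_added, team_id: team_id })
  end

  # Takes the next event off the queue and yields it to the caller.
  def handle_next
    yield @adapter.next_event
  end
end

distributor = Distributor.new(InMemoryQueueAdapter.new)
distributor.team_added('T0001') # made-up team ID
distributor.handle_next { |event| puts "starting bot for #{event[:team_id]}" }
```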

dblock avatar Dec 17 '17 04:12 dblock

@alexagranov Sounds great. I am curious :-)

benbachhuber avatar Dec 17 '17 11:12 benbachhuber

@dblock - true enough. I neglected to mention though that the aim of my approach is to segment team-specific traffic to specific bot(s) and not actually to load-balance - keeping it simple at first. I see potential issues with a federated set of bot workers having to coordinate which one gets to update the Slack workspace with a post, for instance. I do think something like zookeeper would be useful once a particular team's size (or SLA) requires multiple bot workers to share the load.

alexagranov avatar Dec 17 '17 20:12 alexagranov

oh, and there's also the issue of multiple bot workers per team: if each bot worker is using the same bot token, I believe I've seen Slack broadcast the same user input to all connected realtime clients. Could probably stand to redo that experiment though...
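If that broadcast behaviour holds, workers sharing a token would need some first-claim-wins coordination before responding. The sketch below only illustrates the idea in a single process: `EventClaims` is a hypothetical class, the event key format is made up, and a real deployment would need the claim set in a store that all workers share.

```ruby
require 'set'

# Hypothetical claim registry. Kept in process memory here purely for
# illustration; separate worker processes would need a shared store.
class EventClaims
  def initialize
    @seen = Set.new
    @lock = Mutex.new
  end

  # Returns true only for the first caller to claim a given event key.
  def claim?(event_key)
    @lock.synchronize { !@seen.add?(event_key).nil? }
  end
end

claims = EventClaims.new
event_key = 'T0001:1513570000.000100' # made-up key: team ID plus message timestamp

# All three workers receive the same event; only the first claim succeeds.
responders = %w[worker-a worker-b worker-c].select { |_worker| claims.claim?(event_key) }
puts responders.inspect
```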

alexagranov avatar Dec 18 '17 04:12 alexagranov