jollygoodcode.github.io
Optimum Sidekiq Configuration on Heroku with Puma
What's the optimum config for Sidekiq on Heroku with Puma?
There are quite a number of answers on the Internet, but nothing definitive, and most of them come with vague numbers and suggestions or are outdated.
Basically, these are the questions that are often asked:
- What exactly do I put in the config/initializers/sidekiq.rb file?
- What should I set for client/server size?
- What should I set for client/server concurrency?
- How do the Puma workers and threads affect the Sidekiq settings?
- How does the number of Redis connections affect the Sidekiq settings?
- How does the number of web/worker dynos affect the Sidekiq settings?
The best (and updated) answers I can find are:
- Sidekiq, Heroku and Unicorn by @manuelvanrijn
- Sidekiq, Heroku and Puma by @bryanrite
With @bryanrite's post as a reference, this is our Sidekiq config:
config/initializers/sidekiq.rb
require 'sidekiq_calculations'

Sidekiq.configure_client do |config|
  sidekiq_calculations = SidekiqCalculations.new
  sidekiq_calculations.raise_error_for_env!

  config.redis = {
    url: ENV['REDISCLOUD_URL'],
    size: sidekiq_calculations.client_redis_size
  }
end

Sidekiq.configure_server do |config|
  sidekiq_calculations = SidekiqCalculations.new
  sidekiq_calculations.raise_error_for_env!

  config.options[:concurrency] = sidekiq_calculations.server_concurrency_size
  config.redis = {
    url: ENV['REDISCLOUD_URL']
  }
end
lib/sidekiq_calculations.rb
class SidekiqCalculations
  DEFAULT_CLIENT_REDIS_SIZE  = 2
  DEFAULT_SERVER_CONCURRENCY = 25

  def raise_error_for_env!
    return if !Rails.env.production?

    web_dynos
    worker_dynos
    max_redis_connection
  rescue KeyError, TypeError # Integer(nil) raises TypeError
    raise <<-ERROR
      Sidekiq Server Configuration failed.
      !!!======> Please add ENV:
        - NUMBER_OF_WEB_DYNOS
        - NUMBER_OF_WORKER_DYNOS
        - MAX_REDIS_CONNECTION
    ERROR
  end

  def client_redis_size
    return DEFAULT_CLIENT_REDIS_SIZE if !Rails.env.production?

    puma_workers * (puma_threads/2) * web_dynos
  end

  def server_concurrency_size
    return DEFAULT_SERVER_CONCURRENCY if !Rails.env.production?

    (max_redis_connection - client_redis_size - sidekiq_reserved) / worker_dynos / paranoid_divisor
  end

  private

    def web_dynos
      Integer(ENV.fetch('NUMBER_OF_WEB_DYNOS'))
    end

    def worker_dynos
      Integer(ENV.fetch('NUMBER_OF_WORKER_DYNOS'))
    end

    def max_redis_connection
      Integer(ENV.fetch('MAX_REDIS_CONNECTION'))
    end

    # ENV used in `config/puma.rb` too.
    def puma_workers
      Integer(ENV.fetch("WEB_CONCURRENCY", 2))
    end

    # ENV used in `config/puma.rb` too.
    def puma_threads
      Integer(ENV.fetch("WEB_MAX_THREADS", 5))
    end

    # https://github.com/mperham/sidekiq/blob/master/lib/sidekiq/redis_connection.rb#L12
    def sidekiq_reserved
      5
    end

    # This is added to bring down the value of Concurrency
    # so that there's leeway to grow
    def paranoid_divisor
      2
    end
end
The sidekiq_calculations.rb file is dependent on a number of ENV variables to work, so if you do scale your app (web or workers), do remember to update these ENVs:
- MAX_REDIS_CONNECTION
- NUMBER_OF_WEB_DYNOS
- NUMBER_OF_WORKER_DYNOS
At the same time, WEB_CONCURRENCY and WEB_MAX_THREADS should be the same ENV variables that set the number of Puma workers and threads in config/puma.rb.
Our puma.rb looks exactly like what Heroku has proposed.
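As a rough sketch of how the two files could be kept in sync, the shared ENV lookups can live in one place. The PumaSettings module below is a hypothetical name, not part of Puma or of Heroku's proposed config; the defaults match the ones in sidekiq_calculations.rb.

```ruby
# Hypothetical helper: a single place to read the Puma-related ENV variables,
# so config/puma.rb and lib/sidekiq_calculations.rb cannot drift apart.
module PumaSettings
  # Number of Puma worker processes per dyno.
  def self.workers(env = ENV)
    Integer(env.fetch("WEB_CONCURRENCY", 2))
  end

  # Maximum number of threads per Puma worker.
  def self.max_threads(env = ENV)
    Integer(env.fetch("WEB_MAX_THREADS", 5))
  end
end

# In config/puma.rb (Puma's DSL) this would be used roughly as:
#   workers PumaSettings.workers
#   threads PumaSettings.max_threads, PumaSettings.max_threads
puts PumaSettings.workers({ "WEB_CONCURRENCY" => "3" }) # explicit value
puts PumaSettings.max_threads({})                       # falls back to the default
```

The `env` parameter exists only so the lookups can be exercised with a plain Hash instead of the real environment.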
The only difference from @bryanrite's calculation is that Sidekiq now reserves 5 connections instead of 2, according to the redis_connection.rb line linked in the sidekiq_reserved comment, and I have also added a paranoid_divisor to bring down the concurrency number and keep it below an 80% threshold.
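To make the arithmetic concrete, here is a small worked sketch of the two formulas as standalone methods. The inputs (a 30-connection Redis plan, 1 web dyno, 1 worker dyno) are assumptions chosen purely for illustration, not numbers from our apps.

```ruby
# Standalone version of the two formulas in SidekiqCalculations.
# All example inputs below are illustrative assumptions.

SIDEKIQ_RESERVED = 5 # connections Sidekiq keeps for itself (value used in the post)
PARANOID_DIVISOR = 2 # leeway factor used in the post

def client_redis_size(puma_workers:, puma_threads:, web_dynos:)
  puma_workers * (puma_threads / 2) * web_dynos
end

def server_concurrency_size(max_redis_connection:, client_size:, worker_dynos:)
  (max_redis_connection - client_size - SIDEKIQ_RESERVED) / worker_dynos / PARANOID_DIVISOR
end

client = client_redis_size(puma_workers: 2, puma_threads: 5, web_dynos: 1)
server = server_concurrency_size(max_redis_connection: 30, client_size: client, worker_dynos: 1)

puts "client size: #{client}"        # 2 * (5 / 2) * 1 = 4 (integer division)
puts "server concurrency: #{server}" # (30 - 4 - 5) / 1 / 2 = 10
```

Note that every division here is integer division, so results are rounded down at each step.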
Let me know how this config works for you. Would love to hear your feedback!
Thank you for reading.
@winston :pencil2: Jolly Good Code
About Jolly Good Code
We specialise in Agile practices and Ruby, and we love contributing to open source. Speak to us about your next big idea, or check out our projects.
@shinnyx @zamakkat @dtthaison @joshteng FYI.
@winston thx for sharing. it's helpful!
@winston thanks for the update :+1:
Hey @winston!
I thought I'd share our current configuration at Hired. We have pretty much the same stack! Heroku running Sidekiq (Pro) and Puma for the web server. I don't do any of the fancy dynamic sizing calculations that you do, we control everything by environment variables.
Here's our simplified Procfile:
web: puma -C config/puma.rb
worker: bundle exec sidekiq -c ${SIDEKIQ_CONCURRENCY:-5} -i ${DYNO:-1} -q <queue priorities>
clock: bundle exec clockwork Clockfile.rb
Puma config:
min_threads = Integer(ENV['PUMA_MIN_THREADS'] || 0)
max_threads = Integer(ENV['PUMA_MAX_THREADS'] || 3)
threads min_threads, max_threads

port Integer(ENV['PORT'] || 3000)
environment ENV['RACK_ENV']
activate_control_app
state_path 'tmp/puma.state'

if ENV['PUMA_WORKERS'].to_i > 1
  workers ENV['PUMA_WORKERS']
  preload_app!

  on_worker_boot do
    # Valid on Rails 4.1+ using the `config/database.yml` method of setting `pool` size
    # https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server#on-worker-boot
    ActiveRecord::Base.establish_connection
    ActiveRecord::Base.connection.execute('set statement_timeout to 10000')
  end
end

on_restart do
  Sidekiq.redis.shutdown { |conn| conn.close }
end
Sidekiq config:
Sidekiq.configure_server do |config|
  config.redis = { url: ENV["REDIS_URL"], namespace: :resque }
  config.reliable_fetch!

  database_url = ENV['DATABASE_URL']
  if database_url
    ENV['DATABASE_URL'] = "#{database_url}?pool=250"
    ActiveRecord::Base.establish_connection
  end

  $elastic = Elasticsearch::Client.new
  Stretchy.client = $elastic
end

Sidekiq.configure_client do |config|
  config.redis = { url: ENV["REDIS_URL"], namespace: :resque }
end

Sidekiq::Client.reliable_push! unless Rails.env.test?
We have found that Heroku's performance dynos are phenomenally more performant than the standard ones, and they come with tons of RAM so we can fit many copies of the app in memory. This allows me to use Puma's cluster mode and currently we run 12 Puma processes (PUMA_WORKERS) per dyno, each Puma using up to 3 threads (PUMA_MAX_THREADS). I probably could increase the workers significantly and still have enough memory. For Hired production we run 2 Performance-L dynos, and an additional 1 Performance-L for admins-only (our internal admin app).
For Sidekiq, I use cheaper Standard-2X dynos and have a Hubot script that auto-scales them based on queue depth. Currently I have SIDEKIQ_CONCURRENCY=5 and we always run at least 2 or 3 worker dynos and up to 12 at peak times.
We also run a single Clock dyno for scheduled jobs. Mostly these jobs just kick off other jobs or log stuff, so it's on a Standard-1X plan.
I hope this helps!
Hey @heythisisnate, this is awesome! Thanks for sharing! Good to know what Hired is using.
I haven't had a chance to use Performance dynos yet because none of the apps I manage have reached that magnitude, but 2X dynos have been a staple for us for quite a while now because we have noticed that the memory footprint of Ruby apps has grown quite a bit, and it hits the 512MB limit (1X) pretty easily. With 2X (1GB) memory, we only run about 1-3 PUMA_WORKERS depending on the app, and sometimes there are one or two gnarly memory leaks that we have to hunt down. Most of the time, we set PUMA_MAX_THREADS to 5 though. Extrapolating that to 14GB Performance-L servers, your numbers seem about right (our Puma config is very much the same as yours).
RE: sidekiq, most of what I do is through ENVs as well, except that client_redis_size and server_concurrency_size are derived as formulas.
I noticed that you didn't set size under Sidekiq.configure_client though and from this article it seems to suggest that the number can/should be tweaked (otherwise I think it defaults to 25? - might be wasteful? or am I wrong?).
For Sidekiq.configure_server, we set the size via the formula here, instead of through the ENV variable (${SIDEKIQ_CONCURRENCY:-5}) so that I don't have to do the math every time I change certain values. Haha.
For ENV['DATABASE_URL'] = "#{database_url}?pool=250", if you are already using database.yml, I think the pool: 250 can go into database.yml too.
Thoughts? Thanks for your reply! :bow:
:+1:
hi @winston, dividing by 2 in: puma_workers * (puma_threads / 2) * web_dynos could cause client_redis_size = 0 when puma_threads = 1
Btw, I am using redis-objects, which needs at least one connection per app instance (web). Is it better to share the connection pool between Sidekiq and redis-objects, or should I use another connection pool with size equal to client_redis_size?
@longkt90 Thanks for the feedback!
could cause client_redis_size = 0
That's true. Probably should modify the puma_threads method to be:
def puma_threads
  [2, Integer(ENV.fetch("WEB_MAX_THREADS", 5))].max
end
So that the minimum is at least 2.
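A quick sketch of why the clamp matters: Ruby's integer division rounds down, so a single Puma thread makes the whole product zero. The method names here are hypothetical stand-ins for the ones in SidekiqCalculations, written as plain functions so they run on their own.

```ruby
# Same formula as client_redis_size, with the inputs passed in explicitly.
def client_size(puma_workers, puma_threads, web_dynos)
  puma_workers * (puma_threads / 2) * web_dynos
end

# Hypothetical helper mirroring the clamped puma_threads method:
# never report fewer than 2 threads.
def clamped_threads(raw)
  [2, Integer(raw)].max
end

puts client_size(2, 1, 1)                    # 2 * (1 / 2) * 1 = 0, an empty pool
puts client_size(2, clamped_threads("1"), 1) # 2 * (2 / 2) * 1 = 2
```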
@longkt90 I haven't used redis-objects myself, but I would think this might be better?
use another connection pool with size equal to client_redis_size
Yeah. That's what we are using. I don't think it's a good idea to share the pool between Sidekiq and redis-objects.
Btw I need to set our db_pool to be redis server concurrency + puma-max-threads. Is that correct?
Btw I need to set our db_pool to be redis server concurrency + puma-max-threads.
Yes that's right. I just set it as the max number of connection that my DB allows.
:+1: Thank you
It seems to me that this code:
config.redis = {
  url: ENV['REDISCLOUD_URL'],
  size: sidekiq_calculations.client_redis_size
}
and this code have an issue:
def client_redis_size
  return DEFAULT_CLIENT_REDIS_SIZE if !Rails.env.production?
  puma_workers * (puma_threads/2) * web_dynos
end
It should just be:
def client_redis_size
  return DEFAULT_CLIENT_REDIS_SIZE if !Rails.env.production?
  puma_threads/2
end
puma_workers and web_dynos are not relevant for the connection pool: the pool is shared only within a process, and Puma workers and dynos are separate processes.
The server_concurrency_size is OK, because if you run below CONCURRENCY + 2 Sidekiq will fail to start, and it will not hurt much.
However, after changing client_redis_size you do need to modify server_concurrency_size to use puma_workers and web_dynos.
You probably didn't get any error because you just reserved a bigger pool size for the client than you really needed.
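One way to read the suggestion above, as a sketch only: size the client pool per process, and keep the multiplication by workers and dynos for estimating the total that is subtracted from the Redis limit. The method names and the minimum of 1 are my assumptions, not code from the post.

```ruby
SIDEKIQ_RESERVED = 5 # value used in the original post
PARANOID_DIVISOR = 2 # value used in the original post

# Pool size for ONE Puma worker process; this is what `size:` would receive.
# The floor of 1 is an added assumption so the pool is never empty.
def per_process_client_size(puma_threads)
  [1, puma_threads / 2].max
end

# Estimated connections held by all web processes across all web dynos.
def total_client_connections(puma_workers:, puma_threads:, web_dynos:)
  per_process_client_size(puma_threads) * puma_workers * web_dynos
end

# Server concurrency still budgets against the total, not the per-process size.
def server_concurrency_size(max_redis_connection:, total_client:, worker_dynos:)
  (max_redis_connection - total_client - SIDEKIQ_RESERVED) / worker_dynos / PARANOID_DIVISOR
end

total = total_client_connections(puma_workers: 2, puma_threads: 5, web_dynos: 2)
puts per_process_client_size(5) # 5 / 2 = 2
puts total                      # 2 * 2 * 2 = 8
puts server_concurrency_size(max_redis_connection: 50, total_client: total, worker_dynos: 1) # (50 - 8 - 5) / 1 / 2 = 18
```

The example inputs (50 Redis connections, 2 web dynos, 1 worker dyno) are made up for illustration.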
@nitzanav The code was transcribed following the explanations detailed in http://bryanrite.com/heroku-puma-redis-sidekiq-and-connection-limits/.
Sidekiq config is often a mystery (to me) and sometimes difficult to know what's exactly "right". The config above has at least worked in all the apps I deployed so far. But do let me know your mileage on the updated config. I am sure it will be a good data point too. Thanks!
@winston The blog post does show this formula, but it is meant for inferring the maximum number of connections to expect and configure on the Redis server side, rather than on the Ruby application side. The Ruby application side is a bit different, and AFAIK should be configured as I described.
What you did will work but is not optimum :)
@winston when auto-scaling, how do we update NUMBER_OF_WEB_DYNOS and NUMBER_OF_WORKER_DYNOS? Can anyone tell me, please?
@winston I realize this is a couple years old now but I'm only now learning about and using Puma and concurrency so this may still be relevant to anyone else. To help clarify what @nitzanav is saying, I think you're misinterpreting what the blog post is saying in regards to setting the Redis client size. In fact, in the blog Bryan Rite has the following code for setting the size:
Sidekiq.configure_client do |config|
  config.redis = { size: 3, url: ENV["REDIS_URL"], namespace: "your-app" }
end
These sidekiq settings will configure the size per worker process, so you don't need to factor in the puma_workers * or * web_dynos operations. The puma_workers * (puma_threads/2) * web_dynos formula is just telling you what the expected total number of connections will be if your app dynos were to fully utilize each worker and thread.
Hope this helps!
