mastodon-on-aws Enable auto-scaling for web and streaming API

Evaluate and implement auto-scaling for ECS services web and streaming API.

Nov 16 '22 20:11 andreaswittig

+1 for this feature

(some documentation of best practices on manual scale up process would be nice too)

Nov 24 '22 16:11 scrappydog

After several days of working with the three services: an HA and auto-scaling configuration works out of the box if you set AutoScaling to true and set DesiredCount, MaxCapacity, and MinCapacity. The only service that doesn't scale well is the sidekiq service. According to https://docs.joinmastodon.org/admin/scaling/#sidekiq, you can run multiple sidekiq services on different queues, with the exception of the scheduler queue: there can only be one of those. My fork already has a few of these changes in the istoleyourpw-deploy branch: https://github.com/compuguy/mastodon-on-aws

Edit: I came across this article (https://nora.codes/post/scaling-mastodon-in-the-face-of-an-exodus/), which explains how to split up the sidekiq tasks. You can run multiple instances for the default, push, and pull queues, and a single instance for mailer and scheduler.
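
A minimal sketch of that split, to make the constraint explicit. The group names, instance counts, and the concurrency of 10 are illustrative assumptions, not values from the article or from this repo. The mailer queue is spelled `mailers` here, which is my understanding of the queue name in Mastodon's Sidekiq configuration; `-c` and `-q` are the standard Sidekiq concurrency and queue flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SidekiqGroup:
    name: str              # hypothetical label for the ECS service
    queues: tuple          # Sidekiq queues this group processes
    instances: int         # number of tasks to run for this group
    concurrency: int = 10  # assumed threads per Sidekiq process


def command(group: SidekiqGroup) -> str:
    """Build the sidekiq command line for one group."""
    queue_flags = " ".join(f"-q {q}" for q in group.queues)
    return f"bundle exec sidekiq -c {group.concurrency} {queue_flags}"


def validate(groups) -> None:
    """Reject layouts where the scheduler queue would run more or less than once."""
    scheduler_instances = sum(
        g.instances for g in groups if "scheduler" in g.queues
    )
    if scheduler_instances != 1:
        raise ValueError(
            f"scheduler must run in exactly one process, got {scheduler_instances}"
        )


# Assumed layout: several workers for the busy queues, one singleton for the rest.
LAYOUT = [
    SidekiqGroup("workers", ("default", "push", "pull"), instances=3),
    SidekiqGroup("singleton", ("mailers", "scheduler"), instances=1),
]

validate(LAYOUT)
for g in LAYOUT:
    print(f"{g.name} x{g.instances}: {command(g)}")
```

Only the `workers` group would be a candidate for auto-scaling in this layout; the `singleton` group has to stay pinned at one task.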

Nov 25 '22 02:11 compuguy

My Sidekiq task is regularly pegging at 100% CPU utilization... definitely need some guidance on configuring scaling...

Nov 29 '22 18:11 scrappydog

@scrappydog Same for us. I'm not sure that is an issue. It likely doesn't matter if the background tasks use all available resources as long as they finish without much delay. We see spikes to 100%, but only for minutes at a time. Do you see the same pattern?

[Screenshot 2022-11-28 at 09 42 10]

Nov 29 '22 18:11 michaelwittig

That looks very similar to utilization on my instance.

My inner system admin really "wants" to add another task... but I agree as long as jobs are completing in a reasonable time it's not an immediate issue.

BUT we are running tiny instances for testing... we NEED a way to scale up... :-)

Nov 29 '22 18:11 scrappydog

I bumped the CPU allocation up on the Sidekiq task to CPU .5 vCPU | Memory 3 GB...

This feels happier for now... but it doesn't address the real scalability question...
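
For anyone making the same manual bump: assuming the tasks run on Fargate, only certain CPU and memory pairs are accepted. A rough sketch of a check, using the Fargate task sizes as I understand them for sizes up to 4 vCPU (the function name is made up for illustration):

```python
# Allowed memory values in GB per vCPU size on Fargate (sizes up to 4 vCPU).
FARGATE_MEMORY_GB = {
    0.25: (0.5, 1, 2),
    0.5: (1, 2, 3, 4),
    1: tuple(range(2, 9)),    # 2 GB to 8 GB in 1 GB steps
    2: tuple(range(4, 17)),   # 4 GB to 16 GB in 1 GB steps
    4: tuple(range(8, 31)),   # 8 GB to 30 GB in 1 GB steps
}


def is_valid_task_size(vcpu: float, memory_gb: float) -> bool:
    """Return True if the vCPU/memory pair is an accepted Fargate task size."""
    return memory_gb in FARGATE_MEMORY_GB.get(vcpu, ())


print(is_valid_task_size(0.5, 3))  # the size used above
print(is_valid_task_size(0.5, 8))  # too much memory for 0.5 vCPU
```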

Nov 29 '22 22:11 scrappydog

Upgraded about halfway through this graph... definitely a lot better!

Nov 30 '22 13:11 scrappydog

I opened up #20 for sidekiq. This issue is about auto-scaling for web and streaming API.

Enabling auto-scaling is not the big deal here. What we need is a good metric to trigger scale out/in. And we need a test workload to test this with. I have no idea how we can simulate Mastodon load. If anyone reading this runs an instance with enough users to benefit from auto-scaling, please let us know.
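
To make the metric question concrete, here is a rough sketch of the proportional rule that target-tracking style policies are built around. It is not the exact algorithm Application Auto Scaling runs, and average CPU utilization with a 60% target is only a placeholder until we know which metric actually reflects load on web and streaming API. `min_capacity` and `max_capacity` play the role of MinCapacity and MaxCapacity.

```python
import math


def desired_count(current: int, metric: float, target: float,
                  min_capacity: int, max_capacity: int) -> int:
    """Scale the task count in proportion to metric/target, then clamp it."""
    if current < 1 or target <= 0:
        raise ValueError("current must be >= 1 and target must be > 0")
    # If the metric is 50% above target, we need roughly 50% more tasks.
    proposed = math.ceil(current * metric / target)
    return max(min_capacity, min(max_capacity, proposed))


# 2 tasks at 90% average CPU with a 60% target: scale out
print(desired_count(2, 90, 60, min_capacity=1, max_capacity=10))
# 4 tasks at 30% average CPU with a 60% target: scale in
print(desired_count(4, 30, 60, min_capacity=2, max_capacity=10))
```

Whatever metric we pick has to move roughly in proportion to the number of tasks for a rule like this to converge, which is one reason a realistic test workload matters.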

Dec 02 '22 12:12 michaelwittig

Just add a relay server and you will have CPU load in a minute.

https://github.com/brodi1/activitypub-relays

Dec 04 '22 08:12 nodomain

> I opened up #20 for sidekiq. This issue is about auto-scaling for web and streaming API.
>
> Enabling auto-scaling is not the big deal here. What we need is a good metric to trigger scale out/in. And we need a test workload to test this with. I have no idea how we can simulate Mastodon load. If anyone reading this runs an instance with enough users to benefit from auto-scaling, please let us know.

Yeah, it's quite easy to autoscale the web and streaming APIs. But for most people #20 is more important, since Sidekiq does most of the heavy lifting for Mastodon...

Dec 04 '22 20:12 compuguy
