
Test using redis as the celery broker

Open · miketheman opened this issue 2 years ago · 6 comments

We're using Amazon SQS as our message broker today.

Celery's SQS support offers neither monitoring nor remote control from within the context of Celery:

> Missing monitor support means that the transport doesn’t implement events, and as such Flower, celery events, celerymon and other event-based monitoring tools won’t work. Remote control means the ability to inspect and manage workers at runtime using the celery inspect and celery control commands (and other tools using the remote control API).
>
> (https://docs.celeryq.dev/en/stable/getting-started/backends-and-brokers/index.html)

We can monitor items like queue depth for SQS via CloudWatch Metrics (and possibly in Datadog), but we have little visibility into what is in the queue itself.

@ewdurbin and I recently chatted about the notion of queue debouncing/superseding, and I raised the question of whether we might want to use Redis as the broker instead. We already use Redis as part of the stack, and we could leverage the same cluster/instance for the Celery broker. This would do a few things:

  1. Remove reliance on a provider's secret sauce service in favor of a portable technology
  2. Increase visibility via tools like Flower
  3. ~Remove some extra dependencies needed for SQS like kombu~
  4. Increase usage of our current Redis cluster - both in terms of bandwidth and memory. TBD on how to look at this.
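To illustrate point 2: with the Redis transport, each Celery queue is a plain Redis list (LLEN gives the depth, LRANGE shows pending messages), and each entry is a JSON envelope whose base64-encoded body carries the task arguments. A rough sketch of decoding one such entry — the task name, id, and payload below are made up for illustration:

```python
import base64
import json

# Illustrative raw entry, shaped like what `LRANGE celery 0 0` returns for a
# pending task on Celery's Redis transport (task name/args are hypothetical).
raw = json.dumps({
    "body": base64.b64encode(
        json.dumps([["user@example.com"], {}, {"callbacks": None}]).encode()
    ).decode(),
    "headers": {"task": "warehouse.emails.send_email", "id": "abc-123"},
    "properties": {"delivery_info": {"routing_key": "celery"}},
})

message = json.loads(raw)
task_name = message["headers"]["task"]
args, kwargs, _options = json.loads(base64.b64decode(message["body"]))
print(task_name, args)  # which task is queued, and with what arguments
```

That's the kind of queue introspection SQS doesn't give us without consuming the messages.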

We also discussed using SQLAlchemy and Postgres as a backend, and rejected it. The docs kinda support that decision.

The change "seems" easy, since most of our setup is already governed by a config var of BROKER_URL:
https://github.com/pypi/warehouse/blob/67d5b04228bd401eb75d29c3efec92383e51c9ad/warehouse/config.py#L180
https://github.com/pypi/warehouse/blob/38e0e0400f8585e382aa5d48836ef08fcfde742a/warehouse/tasks.py#L184

We already use Redis in aspects of our celery lifecycle: https://github.com/pypi/warehouse/blob/67d5b04228bd401eb75d29c3efec92383e51c9ad/warehouse/config.py#L181-L182

We could set celery.broker_url to REDIS_URL - but I'm noticing that this persists a pattern of dumping all of the keys into Redis's default database (db 0), which would make selecting specific keys harder, since everything would be intermingled. KEYS * can be an expensive operation (kinda like SELECT * FROM <<EVERYTHING>>).
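One way to avoid piling more keys into db 0 is to point the broker at a different logical database, which the redis:// URL scheme encodes as the path component. A minimal sketch (hostname and db numbers are placeholders, not our actual config):

```python
from urllib.parse import urlsplit

def with_db(redis_url: str, db: int) -> str:
    """Return the same Redis URL pointed at a specific logical database."""
    return urlsplit(redis_url)._replace(path=f"/{db}").geturl()

# e.g. carve out db 1 for the Celery broker while everything else stays on db 0
broker_url = with_db("redis://redis.example.com:6379/0", 1)
print(broker_url)  # redis://redis.example.com:6379/1
```

This keeps the broker's keys separable without standing up a second cluster, at the cost of logical databases sharing the same memory budget.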

We also share the same db:0 with oidc.jwk_cache_url, sessions.url, ratelimit.url, and warehouse.xmlrpc.cache.url - so that's not 100% awesome either.

Here's some semi-structured thoughts, totally open to more.

  • (Maybe?) Break apart the usage of REDIS_URL and create db-specific config variables for differently-scoped keys, like ratelimit, sessions, et al. Not totally sure how to manage the data migration yet.
  • Launch a set of workers with a config value of BROKER_URL pointing to a redis database to eat any new messages
  • Update the web/web-uploads config value of BROKER_URL to point to Redis
  • Wait for any outstanding SQS messages to be consumed and the queue emptied
  • Shut down the SQS-configured workers
  • Celebrate!
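The "wait for SQS to drain" step could be as simple as polling the queue depth until it reads empty a few times in a row. A hedged sketch, where `get_depth` stands in for whatever reports the queue's approximate message count (CloudWatch, a boto3 call, etc.):

```python
import time

def wait_for_drain(get_depth, poll_interval=30.0, settle_polls=3):
    """Block until get_depth() reports 0 for several consecutive polls."""
    empty = 0
    while empty < settle_polls:
        empty = empty + 1 if get_depth() == 0 else 0
        if empty < settle_polls:
            time.sleep(poll_interval)

# Demo with canned readings instead of a real SQS call:
readings = iter([5, 1, 0, 0, 0])
wait_for_drain(lambda: next(readings), poll_interval=0)
print("queue drained; safe to stop the SQS-configured workers")
```

Requiring several consecutive zero readings hedges against SQS's approximate counts briefly reporting empty while messages are still in flight.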

Obviously, I'm probably overlooking something, so I'd definitely want to hear other reasoning, opinions, thoughts, mistakes, etc.

miketheman · May 19 '23 22:05