tutor
tutor copied to clipboard
Celery lms/cms-worker consumes too much RAM
Bug description
The normal execution of the celery workers, lms-worker and cms-worker services, consumes too much memory by default. Tested on my tutor local, but I'm assuming that with tutor dev is the same. By default the celery start 1 process by each CPU. If the deployed laptop or server has multiple CPU architecture, then the celery will launch multiple OS processes. Each celery OS process consumes memory. This makes more CPUs you have, more memory RAM the celery worker will consume. The problem that I see is that on a deployed server, the operator probably don't want this magic, and want to control how much celery OS process each container/pod will start and consequently how much memory each container/pods will consume.
How to reproduce
Run a tutor local environment and see how much RAM your celery workers is consuming. On 16 CPU server/laptop an idle environment each container lms/cms-worker consumes >2GB RAM per container.
If you add --concurrency=1 to the lms/cms-worker command an idle tutor local env uses <300MB per container.
Environment
tutor, version 14.2.3 But this applies also to newer version.
Solution (my opinion)
- Add a way to parameterize the concurrency on each worker, ex:
--concurrency={{ LMS_WORKER_CELERY_CONCURRENCY }} - Each celery worker by default should always consume the minimum memory, so it should defaults to just
1. - In my opinion there should be an easy way to customize both celery commands, lms-worker and cms-worker. Ex. adding the
--prefetch-multiplier=1 --without-gossip --without-mingle
Related PR: #1010
For more information regarding additional options, see: https://github.com/edx/configuration/pull/68
Hi Ivo, I agree that we need to provide a way to override the number of concurrent workers. And we also need to provide good defaults for that value.
In that context, I'm curious to have your take on this comment by @dkaliberda, where he makes the case that we should default to concurrency=1 and instead scale the number of replicas.
I'm not sure I agree with that comment, because in my experience there is quite a bit of overhead incurred by scaling celery workers horizontally (by increasing the number of replicas) as opposed to vertically (by increasing the number of workers). For instance, here are the figures for memory usage on my laptop, in idle mode:
- 1 replica x 1 worker: 319 MB
- 1 replica x 4 workers: 746 MB
- 4 replicas x 1 worker: 1280 MB
In your current use case, what would be the ideal number of replicas/workers? (both for LMS and CMS)
EDIT: I just learned about process autoscaling in Celery and I'm very tempted to use that as the default. Of course, we would still need to provide a mechanism to override that.
Hi @regisb, I think a default concurrency=1 is a good value. Nevertheless, I prefer that it should be parameterized. Currently, for our Kubernetes PROD deployment I have min 6 replicas, each with concurrency=1, but an horizontal auto scaler for 20. I have dedicated hardware so having 6 replicas always up is not a problem for us. Only sporadic is that it upscale up.
About the autoscaling in Celery, I also just found out it! I think I won't change my current setup, because it's just working. But if had found out before, I think I would be tempted to just use it. Even for our case just fixed 2 replicas with a vertical autoscale on Celery would be good option. The good news is that it would benefit everyone, docker compose or K8s installations.
--concurrency {{ OPENEDX_CMS_CELERY_WORKERS }} --autoscale={{ OPENEDX_CMS_CELERY_MAX_WORKERS }},{{ OPENEDX_CMS_CELERY_MIN_WORKERS }}
3 configurations with:
OPENEDX_CMS_CELERY_WORKERSdefault 1OPENEDX_CMS_CELERY_MAX_WORKERSdefault 1OPENEDX_CMS_CELERY_MIN_WORKERSdefault 1
An upgrade note on the docs could be added to configure a proper value of OPENEDX_CMS_CELERY_MAX_WORKERS, like number of CPUs
For example this is a snippet of my custom tutor plugin to override the workers:
apiVersion: apps/v1
kind: Deployment
metadata:
name: cms-worker
spec:
template:
spec:
terminationGracePeriodSeconds: 900
containers:
- name: cms-worker
args:
- celery
- --app=cms.celery
- worker
- --loglevel=info
- --concurrency=1
- --hostname=edx.cms.core.default.%%h
- --max-tasks-per-child=100
- --prefetch-multiplier=1
- --exclude-queues=edx.lms.core.default
- --without-gossip
- --without-mingle
The terminationGracePeriodSeconds is to prevent the Instructors export CSVs to be terminated during the export.
Recently, I had to add --without-gossip and --without-mingle to make the celery workers recover from a Redis downtime, https://github.com/celery/celery/discussions/7276
While I do think that administrators should be able to customise the default celery concurrency, I disagree that --concurrency=1 is the better default -- because it's much more expensive. According to my testing (see my comment here) running multiple replicas with a single process has a bigger overhead in terms of memory usage. (although I'm not sure exactly why).
I think that --autoscale=4,1 would be a good default for most people. We should be able to override this default setting using config patches or filters. Thus, I'm not inclined to merge #1010. I'll propose a different implementation.
I haven't looked yet at the other options that you are suggesting -- I'll investigate once I start working on this.
@regisb I'm working on improving celery defaults, here is the PR to include the without-gossip and without-single parameters and to increase the heartbeat interval from 2s to 60s. Are there any other defaults you want to check/research as part of the celery defaults?
https://github.com/eduNEXT/tutor/pull/3
Do we need any further action on this, considering https://github.com/overhangio/tutor/pull/1165 has been merged? Thanks
Yes, i would love to have --concurreny=X configurable.
Found a temporary fix by doing a plugin with the following content:
from tutor import hooks
hooks.Filters.CMS_WORKER_COMMAND.add_items(
[
"--concurrency=1"
]
)
hooks.Filters.LMS_WORKER_COMMAND.add_items(
[
"--concurrency=1"
]
)
Found a temporary fix by doing a plugin with the following content:
from tutor import hooks hooks.Filters.CMS_WORKER_COMMAND.add_items( [ "--concurrency=1" ] ) hooks.Filters.LMS_WORKER_COMMAND.add_items( [ "--concurrency=1" ] )
Yes, I have applied the same solution. We also have installed the https://github.com/eduNEXT/tutor-contrib-celery on our NAU platform.
Closing this issue as the filters added in https://github.com/overhangio/tutor/pull/1134 allow customizing the worker arguments, as mentioned in https://github.com/overhangio/tutor/issues/1126#issuecomment-2883132632.