tutor: Celery lms/cms-worker consumes too much RAM

Bug description

The lms-worker and cms-worker Celery services consume too much memory with the default configuration. I tested this on tutor local, but I assume tutor dev behaves the same way. By default, Celery starts one worker process per CPU, so on a laptop or server with many CPUs it launches many OS processes, each of which consumes memory. The more CPUs you have, the more RAM the Celery workers consume. The problem is that on a deployed server the operator probably doesn't want this implicit behavior, and instead wants to control how many Celery processes each container/pod starts and, consequently, how much memory each container/pod consumes.
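A minimal sketch of the default described above, assuming the prefork pool: an explicit --concurrency value wins, otherwise the pool size falls back to the number of CPUs seen by the operating system. The helper `effective_concurrency` is hypothetical and only models this default; it is not Celery's own code.

```python
import os
from typing import Optional


def effective_concurrency(configured: Optional[int] = None) -> int:
    """Return how many worker processes a prefork pool would start (simplified model).

    An explicit --concurrency value is used as-is; otherwise the CPU count is used.
    os.cpu_count() reports the host's logical CPUs and ignores container CPU quotas,
    so a container on a 16-CPU host would start 16 processes.
    """
    if configured is not None:
        if configured < 1:
            raise ValueError("concurrency must be >= 1")
        return configured
    # os.cpu_count() can return None when the count is undeterminable; assume 1 then.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("default:", effective_concurrency())
    print("with --concurrency=1:", effective_concurrency(1))
```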

How to reproduce

Run a tutor local environment and check how much RAM your Celery workers are consuming. On a 16-CPU server or laptop, each lms-worker/cms-worker container in an idle environment consumes more than 2 GB of RAM.

If you add --concurrency=1 to the lms-worker/cms-worker command, an idle tutor local environment uses less than 300 MB per container.

Environment

tutor, version 14.2.3, but this also applies to newer versions.

Solution (my opinion)

Add a way to parameterize the concurrency of each worker, e.g. --concurrency={{ LMS_WORKER_CELERY_CONCURRENCY }}
By default, each Celery worker should consume as little memory as possible, so the concurrency should default to 1.
In my opinion, there should be an easy way to customize both Celery commands (lms-worker and cms-worker), e.g. by adding --prefetch-multiplier=1 --without-gossip --without-mingle
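A minimal sketch of this proposal, written as plain Python rather than as a tutor template: the concurrency is a parameter defaulting to 1, and the extra flags are opt-in. The function `build_worker_command` and its parameters are hypothetical; the Celery options it emits (--concurrency, --prefetch-multiplier, --without-gossip, --without-mingle) are real worker flags, and the base arguments mirror the plugin snippet later in this thread.

```python
from typing import List


def build_worker_command(service: str, concurrency: int = 1, low_overhead: bool = False) -> List[str]:
    """Build the argument list for an lms-worker or cms-worker container (hypothetical helper)."""
    if service not in ("lms", "cms"):
        raise ValueError(f"unknown service: {service!r}")
    if concurrency < 1:
        raise ValueError("concurrency must be >= 1")
    args = [
        "celery",
        f"--app={service}.celery",
        "worker",
        "--loglevel=info",
        f"--concurrency={concurrency}",  # defaults to 1 to keep idle memory low
    ]
    if low_overhead:
        # Prefetch one message per process, and skip gossip and the startup mingle with other workers.
        args += ["--prefetch-multiplier=1", "--without-gossip", "--without-mingle"]
    return args


print(" ".join(build_worker_command("cms", low_overhead=True)))
```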

Sep 27 '24 16:09 igobranco

Related PR: #1010

For more information regarding additional options, see: https://github.com/edx/configuration/pull/68

Sep 30 '24 09:09 regisb

Hi Ivo, I agree that we need to provide a way to override the number of concurrent workers. And we also need to provide good defaults for that value.

In that context, I'm curious to have your take on this comment by @dkaliberda, where he makes the case that we should default to concurrency=1 and instead scale the number of replicas.

I'm not sure I agree with that comment, because in my experience there is quite a bit of overhead incurred by scaling celery workers horizontally (by increasing the number of replicas) as opposed to vertically (by increasing the number of workers). For instance, here are the figures for memory usage on my laptop, in idle mode:

1 replica x 1 worker: 319 MB
1 replica x 4 workers: 746 MB
4 replicas x 1 worker: 1280 MB
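Working through these figures (idle memory on one laptop, so only indicative): each extra worker process inside a single replica adds roughly 142 MB, whereas each extra single-process replica adds roughly 320 MB, about the cost of a whole fresh container.

```python
# Idle memory figures reported above (MB), keyed by (replicas, workers per replica).
measurements = {
    (1, 1): 319,
    (1, 4): 746,
    (4, 1): 1280,
}

baseline = measurements[(1, 1)]

# Marginal cost of each extra worker process inside one replica (vertical scaling).
per_extra_process = (measurements[(1, 4)] - baseline) / 3

# Marginal cost of each extra single-process replica (horizontal scaling).
per_extra_replica = (measurements[(4, 1)] - baseline) / 3

print(f"extra process: ~{per_extra_process:.0f} MB")
print(f"extra replica: ~{per_extra_replica:.0f} MB")
print(f"ratio: ~{per_extra_replica / per_extra_process:.1f}x")
```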

In your current use case, what would be the ideal number of replicas/workers? (both for LMS and CMS)

EDIT: I just learned about process autoscaling in Celery and I'm very tempted to use that as the default. Of course, we would still need to provide a mechanism to override that.
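Celery's --autoscale=MAX,MIN option lets a single worker grow and shrink its process pool. The sketch below is a deliberately simplified model of that idea, not Celery's actual autoscaler (which has additional timing logic): the pool follows the number of pending tasks, clamped between MIN and MAX. The function `target_pool_size` is hypothetical.

```python
def target_pool_size(pending_tasks: int, max_workers: int, min_workers: int) -> int:
    """Simplified autoscaling model: one process per pending task, clamped to [min, max]."""
    if min_workers < 0 or max_workers < 1 or max_workers < min_workers:
        raise ValueError("expected 0 <= min_workers <= max_workers and max_workers >= 1")
    return max(min_workers, min(pending_tasks, max_workers))


# With --autoscale=4,1 an idle worker keeps 1 process and never exceeds 4.
for pending in (0, 1, 3, 10):
    print(pending, "->", target_pool_size(pending, max_workers=4, min_workers=1))
```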

Sep 30 '24 13:09 regisb

Hi @regisb, I think a default of concurrency=1 is a good value. Nevertheless, I would prefer it to be parameterized. Currently, our Kubernetes production deployment runs a minimum of 6 replicas, each with concurrency=1, and a horizontal autoscaler that can scale up to 20. We have dedicated hardware, so keeping 6 replicas up at all times is not a problem for us; it only scales up sporadically.

As for autoscaling in Celery, I also just found out about it! I don't think I'll change my current setup, because it works. But had I found out about it earlier, I would probably have been tempted to use it. Even in our case, a fixed 2 replicas with Celery's vertical autoscaling would be a good option. The good news is that it would benefit everyone, whether on Docker Compose or Kubernetes installations: --concurrency {{ OPENEDX_CMS_CELERY_WORKERS }} --autoscale={{ OPENEDX_CMS_CELERY_MAX_WORKERS }},{{ OPENEDX_CMS_CELERY_MIN_WORKERS }}

This would require three settings:

OPENEDX_CMS_CELERY_WORKERS default 1
OPENEDX_CMS_CELERY_MAX_WORKERS default 1
OPENEDX_CMS_CELERY_MIN_WORKERS default 1

An upgrade note could be added to the docs recommending an appropriate value for OPENEDX_CMS_CELERY_MAX_WORKERS, such as the number of CPUs.
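A minimal sketch of how the three proposed settings could be rendered into worker arguments, using `string.Template` from the Python standard library as a stand-in for tutor's Jinja templating. The setting names come from the proposal above; the `render_worker_args` helper and its validation are assumptions for illustration only.

```python
from string import Template
from typing import Dict, Optional

# Proposed defaults: every value is 1 so an idle worker stays small.
DEFAULTS: Dict[str, int] = {
    "OPENEDX_CMS_CELERY_WORKERS": 1,
    "OPENEDX_CMS_CELERY_MAX_WORKERS": 1,
    "OPENEDX_CMS_CELERY_MIN_WORKERS": 1,
}

# Stand-in for the proposed Jinja expression (Celery's --autoscale takes MAX,MIN).
ARGS_TEMPLATE = Template(
    "--concurrency $OPENEDX_CMS_CELERY_WORKERS "
    "--autoscale=$OPENEDX_CMS_CELERY_MAX_WORKERS,$OPENEDX_CMS_CELERY_MIN_WORKERS"
)


def render_worker_args(overrides: Optional[Dict[str, int]] = None) -> str:
    """Merge operator overrides into the defaults and render the worker arguments."""
    config = {**DEFAULTS, **(overrides or {})}
    if config["OPENEDX_CMS_CELERY_MAX_WORKERS"] < config["OPENEDX_CMS_CELERY_MIN_WORKERS"]:
        raise ValueError("MAX_WORKERS must be >= MIN_WORKERS")
    return ARGS_TEMPLATE.substitute(config)


print(render_worker_args())
print(render_worker_args({"OPENEDX_CMS_CELERY_MAX_WORKERS": 16}))
```

Following the upgrade note above, an operator on a 16-CPU host would only override the maximum, leaving the idle footprint at one process.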

For example, this is a snippet of my custom tutor plugin that overrides the worker configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cms-worker
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 900
      containers:
        - name: cms-worker
          args:
            - celery
            - --app=cms.celery
            - worker
            - --loglevel=info
            - --concurrency=1
            - --hostname=edx.cms.core.default.%%h
            - --max-tasks-per-child=100
            - --prefetch-multiplier=1
            - --exclude-queues=edx.lms.core.default
            - --without-gossip
            - --without-mingle

The terminationGracePeriodSeconds setting prevents instructors' CSV exports from being terminated mid-export. Recently, I also had to add --without-gossip and --without-mingle so that the Celery workers recover from Redis downtime: https://github.com/celery/celery/discussions/7276

Oct 01 '24 11:10 igobranco

While I do think that administrators should be able to customise the default Celery concurrency, I disagree that --concurrency=1 is the better default, because it's much more expensive. According to my testing (see my comment here), running multiple replicas with a single process each has a bigger overhead in terms of memory usage (although I'm not sure exactly why).

I think that --autoscale=4,1 would be a good default for most people. We should be able to override this default setting using config patches or filters. Thus, I'm not inclined to merge #1010. I'll propose a different implementation.
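Celery's --autoscale option takes the maximum first and the minimum second, so --autoscale=4,1 means at most 4 and at least 1 process. A small, hypothetical validation helper that a plugin could use before emitting such an argument might look like this (the name `parse_autoscale` is an assumption, not part of tutor or Celery):

```python
from typing import Tuple


def parse_autoscale(value: str) -> Tuple[int, int]:
    """Parse a Celery-style "MAX,MIN" autoscale string into (max, min)."""
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected 'MAX,MIN', got {value!r}")
    max_workers, min_workers = (int(p.strip()) for p in parts)
    if max_workers < 1 or min_workers < 0 or max_workers < min_workers:
        raise ValueError(f"invalid bounds in {value!r}")
    return max_workers, min_workers


print(parse_autoscale("4,1"))  # proposed default: up to 4 processes, at least 1
```

Rejecting "1,4" catches the easy mistake of writing the bounds in MIN,MAX order.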

I haven't yet looked at the other options you suggested; I'll investigate them once I start working on this.

Oct 01 '24 12:10 regisb

@regisb I'm working on improving the Celery defaults. Here is the PR that adds the --without-gossip and --without-mingle parameters and increases the heartbeat interval from 2s to 60s. Are there any other defaults you want to check or research as part of this work?

https://github.com/eduNEXT/tutor/pull/3

Nov 21 '24 21:11 Ian2012

Do we need any further action on this, considering https://github.com/overhangio/tutor/pull/1165 has been merged? Thanks

Jan 06 '25 13:01 DawoudSheraz

Yes, I would love to have --concurrency=X configurable.

May 14 '25 14:05 FlorianO

I found a temporary fix by writing a plugin with the following content:

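from tutor import hooks


# Add --concurrency=1 to the cms-worker command: one Celery process per container.
hooks.Filters.CMS_WORKER_COMMAND.add_items(
    [
        "--concurrency=1",
    ]
)

# Same for the lms-worker command.
hooks.Filters.LMS_WORKER_COMMAND.add_items(
    [
        "--concurrency=1",
    ]
)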

May 15 '25 09:05 FlorianO

Yes, I have applied the same solution. We have also installed https://github.com/eduNEXT/tutor-contrib-celery on our NAU platform.

May 15 '25 15:05 igobranco

Closing this issue as the filters added in https://github.com/overhangio/tutor/pull/1134 allow customizing the worker arguments, as mentioned in https://github.com/overhangio/tutor/issues/1126#issuecomment-2883132632.

Sep 02 '25 16:09 DawoudSheraz
