
Possible memory leak in the inspect API

Open dejlek opened this issue 4 years ago • 16 comments

Checklist

  • [x] I have verified that the issue exists against the master branch of Celery.
  • [ ] This has already been asked to the discussion group first.
  • [x] I have read the relevant section in the contribution guide on reporting bugs.
  • [x] I have checked the issues list for similar or identical bug reports.
  • [x] I have checked the pull requests list for existing proposed fixes.
  • [x] I have checked the commit log to find out if the bug was already fixed in the master branch.
  • [x] I have included all related issues and possible duplicate issues in this issue (If there are none, check this box anyway).

Mandatory Debugging Information

  • [x] I have included the output of celery -A proj report in the issue. (if you are not able to do this, then at least specify the Celery version affected).
  • [x] I have verified that the issue exists against the master branch of Celery.
  • [x] I have included the contents of pip freeze in the issue.
  • [x] I have included all the versions of all the external dependencies required to reproduce this bug.

Optional Debugging Information

  • [x] I have tried reproducing the issue on more than one Python version and/or implementation.
  • [ ] I have tried reproducing the issue on more than one message broker and/or result backend.
  • [ ] I have tried reproducing the issue on more than one version of the message broker and/or result backend.
  • [ ] I have tried reproducing the issue on more than one operating system.
  • [ ] I have tried reproducing the issue on more than one workers pool.
  • [ ] I have tried reproducing the issue with autoscaling, retries, ETA/Countdown & rate limits disabled.
  • [x] I have tried reproducing the issue after downgrading and/or upgrading Celery and its dependencies.

Related Issues and Possible Duplicates

Related Issues

  • None

Possible Duplicates

  • None

Environment & Settings

Celery version:

celery report Output:


Steps to Reproduce

Required Dependencies

  • Minimal Python Version: 3.6
  • Minimal Celery Version: 4.3.0
  • Minimal Kombu Version: N/A or Unknown
  • Minimal Broker Version: Redis 5.x
  • Minimal Result Backend Version: N/A or Unknown
  • Minimal OS and/or Kernel Version: N/A or Unknown
  • Minimal Broker Client Version: N/A or Unknown
  • Minimal Result Backend Client Version: N/A or Unknown

Minimal example:

Python Packages

pip freeze Output:

alembic==0.9.10
amqp==2.5.2
ansible==2.8.4
arrow==0.15.5
asn1crypto==0.24.0
Authlib==0.14.1
awscli==1.16.289
backcall==0.1.0
backoff==1.10.0
bcrypt==3.1.7
beautifulsoup4==4.8.2
billiard==3.5.0.5
blist==1.3.6
boto3==1.12.14
botocore==1.15.14
cachetools==3.1.1
celery==4.2.2
certifi==2019.11.28
cffi==1.14.0
chardet==3.0.4
cloudpickle==1.2.1
colorama==0.3.9
crc16==0.1.1
cryptography==2.8
cyordereddict==1.0.0
dask==2.10.1
decorator==4.4.0
deepdiff==3.3.0
docopt==0.6.2
docutils==0.15.2
dpkt==1.9.2
fsspec==0.6.2
greenlet==0.4.15
idna==2.9
importlib-metadata==1.5.0
ipython==7.8.0
ipython-genutils==0.2.0
jedi==0.15.1
Jinja2==2.11.1
jmespath==0.9.5
jsonpickle==1.3
kombu==4.6.0
kvdr==1.0.4
locket==0.2.0
lxml==4.5.0
lz4==3.0.2
Mako==1.1.2
MarkupSafe==1.1.1
msgpack==1.0.0
msgpack-python==0.5.6
numexpr==2.7.1
numpy==1.18.1
pandas==0.22.0
pandas-datareader==0.8.1
paramiko==2.7.1
parso==0.5.1
partd==1.1.0
pexpect==4.7.0
pickleshare==0.7.5
prompt-toolkit==2.0.9
psutil==5.7.0
psycopg2-binary==2.8.4
ptyprocess==0.6.0
pudb==2019.1
pyasn1==0.4.7
pycparser==2.20
pycrypto==2.6.1
pydocstyle==1.1.1
Pygments==2.4.2
Pympler==0.8
PyNaCl==1.3.0
pyOpenSSL==19.1.0
pysftp==0.2.9
python-dateutil==2.8.1
python-editor==1.0.4
python-gnupg==0.4.5
python-redis==0.2.2
python-snappy==0.5.4
pytz==2019.3
PyYAML==5.3
rarfile==3.1
redis==3.4.1
requests==2.23.0
requests-oauth2==0.3.0
rsa==3.4.2
s3fs==0.4.0
s3transfer==0.3.3
six==1.14.0
sortedcontainers==2.1.0
soupsieve==2.0
SQLAlchemy==1.3.13
tables==3.6.1
toolz==0.10.0
traitlets==4.3.2
urllib3==1.25.8
urwid==2.0.1
vine==1.3.0
wcwidth==0.1.7
wrapt==1.11.2
xlrd==1.2.0
zipp==3.1.0

Other Dependencies

N/A

Minimally Reproducible Test Case

    from myapp.app import app
    from time import sleep
    
    
    def print_stats():
        # Each call broadcasts a remote-control request to all workers
        # and collects their replies.
        insp = app.control.inspect()
        active_lst = insp.active()
        cluster_stats = insp.stats()
        active_queues = insp.active_queues()
        all_stats = {
            "active": active_lst,
            "stats": cluster_stats,
            "queues": active_queues
        }
        print(all_stats)
    
    
    def main():
        # Poll the inspect API every 10 seconds, forever.
        while True:
            print_stats()
            sleep(10)
    
    
    if __name__ == '__main__':
        main()

Expected Behavior

No memory leaks

Actual Behavior

The memory consumption of this tiny example grows constantly. I left the script running overnight, and it always ends up trying to allocate more memory than the system has, so Linux automatically kills it.

dejlek avatar Mar 23 '20 19:03 dejlek

I have also tested the script with both CPython 3.6 and PyPy 7.3.0 (Python 3.6), and it leaks memory in both cases.

dejlek avatar Mar 23 '20 19:03 dejlek

Thanks for reporting.

auvipy avatar Mar 24 '20 07:03 auvipy

I have the same issue.

carantunes avatar Mar 25 '20 19:03 carantunes

Can you suggest any alternatives for a healthcheck?

carantunes avatar Mar 25 '20 19:03 carantunes

What I do now is avoid running it in an infinite loop. Instead, I wrapped the Python code in a tiny Bash script that runs a fresh Python process every N seconds...

dejlek avatar Mar 26 '20 12:03 dejlek
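For reference, a rough sketch of that wrapper approach, written here in Python rather than Bash to match the reproducer above (the stats_check.py filename is hypothetical): every poll runs in a freshly started interpreter, so whatever the inspect API leaks is discarded when the child process exits.

    import subprocess
    import sys
    from time import sleep
    
    POLL_INTERVAL = 10  # seconds between health checks
    
    
    def main():
        while True:
            # Run the one-shot stats script in a brand-new interpreter so any
            # memory it leaks is released when the child process exits.
            subprocess.run([sys.executable, "stats_check.py"], check=False)
            sleep(POLL_INTERVAL)
    
    
    if __name__ == '__main__':
        main()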

Just making sure you're not running the script in DEBUG mode or similar (if this is Django)?

jheld avatar Mar 27 '20 00:03 jheld

I'm trying to use Celery to run a PyTorch model on a GPU. I found that after a task completes, Celery does not release the resources occupied by the task (such as GPU memory and RAM). I therefore tried the CELERYD_MAX_TASKS_PER_CHILD setting to kill old worker processes and create new ones to release the resources. However, once the maximum number of task executions is reached I get the errors 'cuda runtime error: Initialization error' and 'Process ForkPoolWorker exited with exitcode 1', and Celery still doesn't release the memory. Can this be called a memory leak? It may be similar to your issue. celery 4.4.0, redis 3.4.1, python 3.6.6

stillwaterman avatar Mar 27 '20 08:03 stillwaterman
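As an aside, a minimal sketch of the process-recycling settings mentioned above, using the lowercase setting names from Celery 4.x (the broker URL and the limits are placeholders to adjust for your workload); the same limits can also be passed to the worker as --max-tasks-per-child and --max-memory-per-child:

    from celery import Celery
    
    app = Celery('myapp', broker='redis://localhost:6379/0')
    
    # Recycle a pool child process after it has executed 100 tasks
    # (old-style setting name: CELERYD_MAX_TASKS_PER_CHILD).
    app.conf.worker_max_tasks_per_child = 100
    
    # Also recycle a child once its resident memory exceeds ~200 MiB.
    app.conf.worker_max_memory_per_child = 200 * 1024  # value is in KiB

Note that the CUDA initialization error is likely unrelated to the recycling itself: CUDA generally cannot be re-initialized in a forked child process, so GPU tasks often need the solo pool or a spawn-based setup.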

No, no debug mode. No Django. Just plain Celery.

dejlek avatar Mar 31 '20 14:03 dejlek

I'm trying to use Celery to run a PyTorch model on a GPU. I found that after a task completes, Celery does not release the resources occupied by the task (such as GPU memory and RAM). I therefore tried the CELERYD_MAX_TASKS_PER_CHILD setting to kill old worker processes and create new ones to release the resources. However, once the maximum number of task executions is reached I get the errors 'cuda runtime error: Initialization error' and 'Process ForkPoolWorker exited with exitcode 1', and Celery still doesn't release the memory. Can this be called a memory leak? It may be similar to your issue. celery 4.4.0, redis 3.4.1, python 3.6.6

you should try celery 4.4.2+

auvipy avatar Apr 01 '20 01:04 auvipy

No, no debug mode. No Django. Just plain Celery.

Can you try some tool to find out the root cause of the memory leak?

auvipy avatar Apr 01 '20 01:04 auvipy

I'm trying to use Celery to run a PyTorch model on a GPU. I found that after a task completes, Celery does not release the resources occupied by the task (such as GPU memory and RAM). I therefore tried the CELERYD_MAX_TASKS_PER_CHILD setting to kill old worker processes and create new ones to release the resources. However, once the maximum number of task executions is reached I get the errors 'cuda runtime error: Initialization error' and 'Process ForkPoolWorker exited with exitcode 1', and Celery still doesn't release the memory. Can this be called a memory leak? It may be similar to your issue. celery 4.4.0, redis 3.4.1, python 3.6.6

you should try celery 4.4.2+

I tried Celery 4.4.2 and, sadly, it doesn't help. Why doesn't Celery automatically release a task's resources once it has completed? Is there a bug in the code? I'm not familiar with Celery.

stillwaterman avatar Apr 01 '20 07:04 stillwaterman

Can you try some tool to find out the root cause of the memory leak?

That is one of the first things I tried. I used a couple of memory profilers. The problem is that they only tell you how much memory is allocated per object type. In this particular case the leak is in some sort of list, but it is very hard to find which list...

dejlek avatar Apr 03 '20 12:04 dejlek
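One way to get per-line attribution instead of per-type totals is the standard library's tracemalloc module. A minimal sketch, assuming the same myapp.app layout as the reproducer above: it compares a snapshot taken after each poll against a baseline and prints the source lines whose allocations have grown the most.

    import tracemalloc
    from time import sleep
    
    from myapp.app import app
    
    tracemalloc.start(25)  # keep up to 25 frames per allocation
    baseline = tracemalloc.take_snapshot()
    
    while True:
        insp = app.control.inspect()
        insp.active()
        insp.stats()
        insp.active_queues()
    
        snapshot = tracemalloc.take_snapshot()
        # Source lines whose allocations grew the most since the baseline.
        for stat in snapshot.compare_to(baseline, 'lineno')[:10]:
            print(stat)
        print('---')
        sleep(10)

The per-line breakdown posted later in this thread appears to come from this kind of tooling.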

I was unable to reproduce this issue with this example using redis-py and this example using py-amqp. I'm using what's currently on the celery master branch.

Do I need to be actively running tasks to make the leak happen?

pawl avatar Dec 26 '21 00:12 pawl

I let the example in my previous comment run for a few hours, and I do see a minor leak now:

celery-stats_1  | Top 10 lines
celery-stats_1  | #1: <frozen importlib._bootstrap_external>:525: 1702.8 KiB
celery-stats_1  | #2: /celery_app/kombu/kombu/pidbox.py:234: 1254.8 KiB
celery-stats_1  |     f'{oid}.{self.reply_exchange.name}',
celery-stats_1  | #3: /usr/local/lib/python3.7/linecache.py:137: 1162.9 KiB
celery-stats_1  |     lines = fp.readlines()
celery-stats_1  | #4: /usr/local/lib/python3.7/uuid.py:269: 1015.8 KiB
celery-stats_1  |     hex[:8], hex[8:12], hex[12:16], hex[16:20], hex[20:])
celery-stats_1  | #5: /usr/local/lib/python3.7/site-packages/pympler/summary.py:132: 524.2 KiB
celery-stats_1  |     rows.append([otype, count[otype], total_size[otype]])
celery-stats_1  | #6: /usr/local/lib/python3.7/site-packages/cached_property.py:74: 330.9 KiB
celery-stats_1  |     return obj_dict.setdefault(name, self.func(obj))
celery-stats_1  | #7: /celery_app/redis-py/redis/connection.py:1294: 318.7 KiB
celery-stats_1  |     self._in_use_connections.add(connection)
celery-stats_1  | #8: /celery_app/redis-py/redis/commands/core.py:2362: 318.7 KiB
celery-stats_1  |     return self.execute_command("SADD", name, *values)
celery-stats_1  | #9: /usr/local/lib/python3.7/site-packages/pympler/summary.py:88: 305.3 KiB
celery-stats_1  |     lambda f: "function (%s)" % f.__name__,
celery-stats_1  | #10: /celery_app/kombu/kombu/transport/virtual/base.py:558: 96.2 KiB
celery-stats_1  |     table.append(meta)
celery-stats_1  | 3109 other: 1796.5 KiB
celery-stats_1  | Total allocated size: 8826.7 KiB

It seems to be caused by this line: https://github.com/celery/kombu/blob/507b3064004133d14b974d387200750f21309323/kombu/transport/virtual/base.py#L558

pawl avatar Dec 26 '21 06:12 pawl

Any suggested fixes?

thedrow avatar Dec 26 '21 19:12 thedrow

It seems like this issue is fixed in 5.2.x ... I no longer observe the problem with 5.2.7.

dejlek avatar Sep 07 '22 10:09 dejlek