
active_queues frequently returns None when executed on Celery 5

Open dejlek opened this issue 3 years ago • 12 comments

Checklist

  • [ ] I have verified that the issue exists against the master branch of Celery.
  • [ ] This has already been asked to the discussions forum first.
  • [x] I have read the relevant section in the contribution guide on reporting bugs.
  • [x] I have checked the issues list for similar or identical bug reports.
  • [ ] I have checked the pull requests list for existing proposed fixes.
  • [x] I have checked the commit log to find out if the bug was already fixed in the master branch.
  • [ ] I have included all related issues and possible duplicate issues in this issue (If there are none, check this box anyway).

Mandatory Debugging Information

  • [ ] I have included the output of celery -A proj report in the issue. (if you are not able to do this, then at least specify the Celery version affected).
  • [ ] I have verified that the issue exists against the master branch of Celery.
  • [ ] I have included the contents of pip freeze in the issue.
  • [ ] I have included all the versions of all the external dependencies required to reproduce this bug.

Optional Debugging Information

  • [ ] I have tried reproducing the issue on more than one Python version and/or implementation.
  • [ ] I have tried reproducing the issue on more than one message broker and/or result backend.
  • [ ] I have tried reproducing the issue on more than one version of the message broker and/or result backend.
  • [ ] I have tried reproducing the issue on more than one operating system.
  • [ ] I have tried reproducing the issue on more than one workers pool.
  • [ ] I have tried reproducing the issue with autoscaling, retries, ETA/Countdown & rate limits disabled.
  • [ ] I have tried reproducing the issue after downgrading and/or upgrading Celery and its dependencies.

Related Issues and Possible Duplicates

Related Issues

  • None

Possible Duplicates

  • None

Environment & Settings

Celery version: 5.1.2

celery report Output:

Steps to Reproduce

Required Dependencies

  • Minimal Python Version: N/A or Unknown
  • Minimal Celery Version: N/A or Unknown
  • Minimal Kombu Version: N/A or Unknown
  • Minimal Broker Version: N/A or Unknown
  • Minimal Result Backend Version: N/A or Unknown
  • Minimal OS and/or Kernel Version: N/A or Unknown
  • Minimal Broker Client Version: N/A or Unknown
  • Minimal Result Backend Client Version: N/A or Unknown

Python Packages

pip freeze Output:

Other Dependencies

N/A

Minimally Reproducible Test Case

Expected Behavior

The active_queues API call returns a valid result (not None) whenever one or more Celery workers are running. In my case there were always more than two workers alive when the exception was thrown.

Actual Behavior

An exception is thrown constantly. I verified that workers were running by executing status; more than two workers always replied with success:

Traceback (most recent call last):
  File "./jcm.py", line 144, in <module>
    main()
  File "./jcm.py", line 139, in main
    print_stats()
  File "./jcm.py", line 30, in print_stats
    res = gather_stats(insp)
  File "/home/dejan/work/myproj/myproj/util/celery.py", line 52, in gather_stats
    queue_map = _get_queue_map(insp)
  File "/home/dejan/work/myproj/myproj/util/celery.py", line 25, in _get_queue_map
    for node_name in aq.keys():
AttributeError: 'NoneType' object has no attribute 'keys'

We never encountered this with 4.4.7. Recently we had to move to Airflow 2.2.2, and we always run the same Celery version in our Celery cluster as the Airflow workers, so we effectively had to upgrade to 5.1.2. We then found several different issues (some of them already reported), including this one.

dejlek avatar Jul 24 '22 12:07 dejlek

Hey @dejlek :wave:, Thank you for opening an issue. We will get back to you as soon as we can. Also, check out our Open Collective and consider backing us - every little helps!

We also offer priority support for our sponsors. If you require immediate assistance please consider sponsoring us.

I forgot to mention - 5.2.7 has the same issue. I've tested it in an isolated environment to confirm.

dejlek avatar Jul 24 '22 14:07 dejlek

Update - we see the same behaviour when calling active() as well. Every now and then active() or active_queues() returns None... This is something that never happened with 4.4.7.

dejlek avatar Jul 26 '22 19:07 dejlek

I can confirm that I observed the same problem with Celery 5.3.0a1.

dejlek avatar Jul 26 '22 21:07 dejlek

The workaround is to increase the timeout to, say, 5 seconds - something like insp = myapp.control.inspect(timeout=5). I still think it is worth looking into why we suddenly have to do this: what is causing Celery 5 to be so slow at replying to inspect commands?
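A defensive variant of this workaround (a sketch; `inspect_with_retry` is a hypothetical helper, not a Celery API) is to retry the inspect call a few times when it returns None, instead of crashing on the first missed reply:

```python
import time

def inspect_with_retry(insp, method, retries=3, delay=0.5):
    """Call a Celery inspect method by name (e.g. 'active_queues' or
    'active'), retrying when it returns None - which is how Celery
    signals that no worker replied within the timeout."""
    for _ in range(retries):
        result = getattr(insp, method)()
        if result is not None:
            return result
        time.sleep(delay)
    return None  # caller still has to handle the no-reply case
```

Usage would look like `aq = inspect_with_retry(myapp.control.inspect(timeout=5), "active_queues")`, with the caller treating a final None as "no workers responded".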

dejlek avatar Jul 27 '22 13:07 dejlek

@dejlek The workaround doesn't work, probably when used with a pattern.

Script to check if a worker is alive:

import os
import time

from celery import Celery

def inspect(queue_name, method):
    app = Celery('ok', broker='amqp://****:{}@{}/****'.format(RABBITMQ_PASSWORD, os.getenv("REMOTE_RABBITMQ") or "localhost"))
    i = app.control.inspect(timeout=10)
    i.pattern = queue_name + "*"
    i.limit = 1
    return getattr(i, method)()

for _ in range(1000):
    n = time.perf_counter()
    ok = len(inspect('my_task', 'ping'))  # reply can be empty when no worker responds in time
    print(ok)
    print("Took:", time.perf_counter() - n, "s")
    assert ok > 0
    time.sleep(0.1)

Output:
1
Took: 0.019352189963683486 s
1
Took: 0.018780590035021305 s
1
Took: 0.019671502988785505 s
1
Took: 0.03580518299713731 s
1
Took: 0.021641012048348784 s
0
Took: 0.024159759981557727 s

When I use limit=10 it works, but it takes a whole second (around 1.05s). Any further ideas on how to handle this?
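For liveness checks specifically, one option (a sketch; `worker_alive` is a hypothetical helper, not a Celery API) is to treat an empty or None ping() reply as a soft failure and retry a few times before declaring the workers down:

```python
def worker_alive(insp, attempts=3):
    """Return True if at least one worker answers ping() within the
    inspect timeout, retrying to smooth over sporadic empty replies."""
    for _ in range(attempts):
        reply = insp.ping()  # {node_name: {'ok': 'pong'}} on success; None or {} on timeout
        if reply:
            return True
    return False
```

This keeps the low per-call timeout (so the common case stays fast) while only paying the retry cost on the occasional missed reply.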

dunstorm avatar Aug 30 '22 13:08 dunstorm

This also affects us on 5.2.7. With a busy queue, the None results are quite inconvenient for our health checks. I will try the increased timeouts.

aprams avatar Nov 26 '22 10:11 aprams

Yeah, at this point I just think this project is dead, zero maintenance or support in critical issues. Shame, but maybe a bit more modern approach with asyncio is needed anyway.

PiotrDabkowski avatar Mar 14 '23 03:03 PiotrDabkowski

Yeah, at this point I just think this project is dead, zero maintenance or support in critical issues. Shame, but maybe a bit more modern approach with asyncio is needed anyway.

The project is far from dead and is actively maintained. We do not have enough bandwidth to provide free support for all issues, so please either leave a useful comment or avoid creating noise. Thanks for understanding.

auvipy avatar Mar 14 '23 07:03 auvipy

@yousufctec Popularity has nothing to do with activity. You can't expect a project with 2-3 active developers to fix all the issues and at the same time do development and maintenance. Plus, they have their own lives and jobs. If you want something done, either roll up your sleeves and fix it, or wait for the developers to fix it, even if that takes years. Sure, an alternative is to find a replacement for Celery - good luck with that. Your attitude will not help anyone, including yourself.

Now let's talk about your issue: inspect() is not guaranteed to return an object. If the timeout is reached, it returns None. This is documented. The simplest solution is to increase the timeout, so change your code to something like:

insp = app.control.inspect(timeout=5.0)
active = insp.active()

Finally, I am not a Celery developer, just a happy user.
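Applied to the _get_queue_map helper from the traceback above, the guard might look like this (a sketch; `get_queue_map` is a hypothetical reconstruction, and the reply shape assumes the documented active_queues format of `{node_name: [queue_dict, ...]}` where each queue dict carries a 'name' key):

```python
def get_queue_map(insp):
    """Map each worker node to the names of the queues it consumes.
    Returns an empty dict instead of raising AttributeError when no
    worker replied before the timeout (active_queues() returns None)."""
    aq = insp.active_queues()
    if aq is None:
        return {}
    return {node: [q["name"] for q in queues] for node, queues in aq.items()}
```

The None check is the key change: the original code called aq.keys() unconditionally, which is exactly the AttributeError in the traceback.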

dejlek avatar Sep 25 '23 15:09 dejlek

Just wondering how a popular open source library can keep an issue opened for a year+.

Just to share some context: Celery is built on top of Python, which has open bugs/issues more than 20 years old. You can check them here: https://github.com/python/cpython/issues?page=278&q=is%3Aissue+is%3Aopen . And Celery is used far more widely than its maintainers could cope with without burnout. But we are trying, as time and the other things in life beside open source permit.

auvipy avatar Sep 26 '23 06:09 auvipy

This is happening to us on version 5.2.7. Wondering if this was fixed in newer Celery versions and just never reported back on this issue.

Thanks!

monobot avatar Jan 16 '24 12:01 monobot