                        active_queues frequently returns None when executed on Celery 5
Checklist
- [ ] I have verified that the issue exists against the master branch of Celery.
- [ ] This has already been asked to the discussions forum first.
- [x] I have read the relevant section in the contribution guide on reporting bugs.
- [x] I have checked the issues list for similar or identical bug reports.
- [ ] I have checked the pull requests list for existing proposed fixes.
- [x] I have checked the commit log to find out if the bug was already fixed in the master branch.
- [ ] I have included all related issues and possible duplicate issues in this issue (If there are none, check this box anyway).
Mandatory Debugging Information
- [ ] I have included the output of celery -A proj report in the issue. (If you are not able to do this, then at least specify the Celery version affected.)
- [ ] I have verified that the issue exists against the master branch of Celery.
- [ ] I have included the contents of pip freeze in the issue.
- [ ] I have included all the versions of all the external dependencies required to reproduce this bug.
Optional Debugging Information
- [ ] I have tried reproducing the issue on more than one Python version and/or implementation.
- [ ] I have tried reproducing the issue on more than one message broker and/or result backend.
- [ ] I have tried reproducing the issue on more than one version of the message broker and/or result backend.
- [ ] I have tried reproducing the issue on more than one operating system.
- [ ] I have tried reproducing the issue on more than one workers pool.
- [ ] I have tried reproducing the issue with autoscaling, retries, ETA/Countdown & rate limits disabled.
- [ ] I have tried reproducing the issue after downgrading and/or upgrading Celery and its dependencies.
Related Issues and Possible Duplicates
Related Issues
- None
Possible Duplicates
- None
Environment & Settings
Celery version: 5.1.2
celery report Output:
Steps to Reproduce
Required Dependencies
- Minimal Python Version: N/A or Unknown
- Minimal Celery Version: N/A or Unknown
- Minimal Kombu Version: N/A or Unknown
- Minimal Broker Version: N/A or Unknown
- Minimal Result Backend Version: N/A or Unknown
- Minimal OS and/or Kernel Version: N/A or Unknown
- Minimal Broker Client Version: N/A or Unknown
- Minimal Result Backend Client Version: N/A or Unknown
Python Packages
pip freeze Output:
Other Dependencies
N/A
Minimally Reproducible Test Case
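A minimal sketch of the pattern that triggers the failure; the app name and broker URL below are placeholders, not taken from the original report:

```python
from celery import Celery

# Placeholder app; point the broker at a cluster with 2+ running workers.
app = Celery('myproj', broker='amqp://guest:guest@localhost//')

insp = app.control.inspect()   # default timeout: 1 second
aq = insp.active_queues()

# On 4.4.7 this reliably returned a dict of {node_name: [queues]}; on 5.1.2
# it intermittently returns None even though the workers answer celery status.
for node_name in aq.keys():    # AttributeError: 'NoneType' ... when aq is None
    print(node_name)
```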
Expected Behavior
The active_queues API call returns a valid result (not None) whenever one or more Celery workers are running. In my case there were always more than 2 workers when the exception was thrown.
Actual Behavior
An exception is thrown constantly. I verified that workers are running by executing status; more than 2 workers always replied with success:
Traceback (most recent call last):
  File "./jcm.py", line 144, in <module>
    main()
  File "./jcm.py", line 139, in main
    print_stats()
  File "./jcm.py", line 30, in print_stats
    res = gather_stats(insp)
  File "/home/dejan/work/myproj/myproj/util/celery.py", line 52, in gather_stats
    queue_map = _get_queue_map(insp)
  File "/home/dejan/work/myproj/myproj/util/celery.py", line 25, in _get_queue_map
    for node_name in aq.keys():
AttributeError: 'NoneType' object has no attribute 'keys'
We never encountered this with 4.4.7. Recently we had to move to Airflow 2.2.2, and since we always run the same Celery version in our Celery cluster as the Airflow workers, we basically had to upgrade to 5.1.2. We then found several different issues (some of them already reported), including this one.
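For what it's worth, a None-guarded variant of the helper named in the traceback; the body of _get_queue_map below is a sketch inferred from the traceback, not the original code:

```python
def _get_queue_map(insp):
    """Map each worker node to the names of its active queues."""
    aq = insp.active_queues()
    if aq is None:
        # No worker replied within the inspect timeout; fail loudly
        # instead of crashing on aq.keys() with an AttributeError.
        raise RuntimeError('active_queues() returned None (inspect timeout?)')
    return {node: [q['name'] for q in queues] for node, queues in aq.items()}
```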
Hey @dejlek :wave:, Thank you for opening an issue. We will get back to you as soon as we can. Also, check out our Open Collective and consider backing us - every little helps!
We also offer priority support for our sponsors. If you require immediate assistance please consider sponsoring us.
I forgot to mention - 5.2.7 has the same issue. I've tested it in an isolated environment to confirm.
Update - we see the same behaviour when calling active() as well. So every now and then active() or active_queues() returns None... This is something that never happened with 4.4.7.
I can confirm that I observed the same problem with Celery 5.3.0a1.
A workaround is to increase the timeout to, say, 5 seconds: something like insp = myapp.control.inspect(timeout=5).
I still think it is worth looking into why we suddenly have to do this - what is causing Celery 5 to be so slow to reply to inspect commands?
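Spelled out, the workaround is just the following (myapp stands for your Celery app instance):

```python
# The default inspect timeout is 1 second; give slow workers up to 5.
insp = myapp.control.inspect(timeout=5)
aq = insp.active_queues()   # still None if no worker replies in time
```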
@dejlek The workaround doesn't work, probably when used with a pattern.
Script to check if a worker is alive:
import os
import time

from celery import Celery

# Assumed to come from the environment; the original snippet left it undefined.
RABBITMQ_PASSWORD = os.environ['RABBITMQ_PASSWORD']

def inspect(queue_name, method):
    app = Celery('ok', broker='amqp://****:{}@{}/****'.format(RABBITMQ_PASSWORD, os.getenv("REMOTE_RABBITMQ") or "localhost"))
    i = app.control.inspect(timeout=10)
    i.pattern = queue_name + "*"  # only match workers whose name starts with the queue name
    i.limit = 1                   # stop waiting after the first matching reply
    return getattr(i, method)()

for _ in range(1000):
    n = time.perf_counter()
    ok = len(inspect('my_task', 'ping'))  # 0 when no worker replied in time
    print(ok)
    print("Took:", time.perf_counter() - n, "s")
    assert ok > 0  # fails intermittently even though the worker is up
    time.sleep(0.1)
1
Took: 0.019352189963683486 s
1
Took: 0.018780590035021305 s
1
Took: 0.019671502988785505 s
1
Took: 0.03580518299713731 s
1
Took: 0.021641012048348784 s
0
Took: 0.024159759981557727 s
When I use limit=10, it works but it takes a whole second (around 1.05s).
Any further ideas on how to handle this?
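One way to soften the intermittent empty replies is to retry the ping a few times before declaring the worker dead; a sketch built on the inspect() helper above (the attempt count and delay are arbitrary):

```python
import time

def worker_alive(queue_name, attempts=3, delay=0.5):
    """True if at least one matching worker answers a ping within a few tries."""
    for _ in range(attempts):
        replies = inspect(queue_name, 'ping')  # helper defined earlier in the thread
        if replies:                            # non-empty reply set
            return True
        time.sleep(delay)
    return False
```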
This also affects me on 5.2.7. When a queue is busy, the exceptions caused by None replies are quite inconvenient for our health checking. I will try the increased timeouts.
Yeah, at this point I just think this project is dead, zero maintenance or support in critical issues. Shame, but maybe a bit more modern approach with asyncio is needed anyway.
> Yeah, at this point I just think this project is dead, zero maintenance or support in critical issues. Shame, but maybe a bit more modern approach with asyncio is needed anyway.
The project is far from dead and is actively maintained. We do not have enough bandwidth to provide free support for all issues, so please either leave a useful comment or don't create noise. Thanks for understanding.
@yousufctec Popularity has nothing to do with activity. You can't expect a project with 2-3 active developers to fix all the issues and at the same time do any development and maintenance. Plus, they have their own lives and jobs. If you want something done, either roll up your sleeves and fix it, or wait for the developers to fix it, even if that takes years. Sure, an alternative is to find a replacement for Celery - good luck with that. Your attitude will not help anyone, including yourself.
Now let's talk about your issue: inspect() is not guaranteed to return an object. If the timeout is reached, it will return None. This is documented. The simplest solution is to increase the timeout, so change your code to something like:
insp = app.control.inspect(timeout=5.0)
active = insp.active()
Finally, I am not a Celery developer, just a happy user.
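If you need to distinguish "no reply" from "nothing running", the None case can also be handled explicitly rather than only stretching the timeout (a sketch; app is your Celery instance):

```python
insp = app.control.inspect(timeout=5.0)
active = insp.active()
if active is None:
    # None means no worker replied in time, not that no tasks are active;
    # an empty dict would mean workers replied with nothing running.
    print('inspect timed out; cluster state unknown')
else:
    for node, tasks in active.items():
        print(node, 'has', len(tasks), 'active task(s)')
```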
Just wondering how a popular open source library can keep an issue open for more than a year.
Just to share some context: Celery is built on top of Python, which has open bugs/issues more than 20 years old. You can check them here: https://github.com/python/cpython/issues?page=278&q=is%3Aissue+is%3Aopen . And Celery is far more widely used than its maintainers could cope with without burnout, but we are trying, as time and the other demands of life besides open source permit.
This is happening to us on version 5.2.7. Wondering whether this was fixed in a newer Celery version without this issue being updated.
Thanks!