Inspect.active is returning finished tasks

Open iamtrk opened this issue 1 year ago • 6 comments

Checklist

  • [X] I have verified that the issue exists against the main branch of Celery.
  • [ ] This has already been asked to the discussions forum first.
  • [ ] I have read the relevant section in the contribution guide on reporting bugs.
  • [ ] I have checked the issues list for similar or identical bug reports.
  • [ ] I have checked the pull requests list for existing proposed fixes.
  • [ ] I have checked the commit log to find out if the bug was already fixed in the main branch.
  • [ ] I have included all related issues and possible duplicate issues in this issue (If there are none, check this box anyway).
  • [ ] I have tried to reproduce the issue with pytest-celery and added the reproduction script below.

Mandatory Debugging Information

  • [X] I have included the output of celery -A proj report in the issue. (if you are not able to do this, then at least specify the Celery version affected).
  • [ ] I have verified that the issue exists against the main branch of Celery.
  • [ ] I have included the contents of pip freeze in the issue.
  • [ ] I have included all the versions of all the external dependencies required to reproduce this bug.

Optional Debugging Information

  • [ ] I have tried reproducing the issue on more than one Python version and/or implementation.
  • [ ] I have tried reproducing the issue on more than one message broker and/or result backend.
  • [ ] I have tried reproducing the issue on more than one version of the message broker and/or result backend.
  • [ ] I have tried reproducing the issue on more than one operating system.
  • [ ] I have tried reproducing the issue on more than one workers pool.
  • [ ] I have tried reproducing the issue with autoscaling, retries, ETA/Countdown & rate limits disabled.
  • [ ] I have tried reproducing the issue after downgrading and/or upgrading Celery and its dependencies.

Related Issues and Possible Duplicates

Related Issues

  • None

Possible Duplicates

  • None

Environment & Settings

Celery version:

celery report Output:

software -> celery:5.4.0 (opalescent) kombu:5.4.0 py:3.9.16
            billiard:4.2.0 redis:5.0.8
platform -> system:Linux arch:64bit
            kernel version:6.1.30-4 imp:CPython
loader   -> celery.loaders.app.AppLoader
settings -> transport:redis results:<REDIS>

autoload: True
name: '<>.executors.celery_executor'
BROKER_URL: 'REDIS_URL'
CELERY_RESULT_BACKEND: 'BACKEND'
CELERY_DISABLE_RATE_LIMITS: True
CELERY_IGNORE_RESULT: False
CELERYD_PREFETCH_MULTIPLIER: 1
CELERY_ACKS_LATE: True
CELERYD_CONCURRENCY: 8
CELERY_SEND_TASK_ERROR_EMAILS: True
CELERY_ACCEPT_CONTENT: ['pickle', 'json']
CELERY_RESULT_SERIALIZER: 'json'
CELERY_TASK_SERIALIZER: 'json'
visibility_cooldown: 43300
visibility_timeout: 43500
flower_port: 8383
CELERY_IMPORTS: ['IMPORTS']
CELERY_DEFAULT_QUEUE: '<QUEUE>'
CELERY_DEFAULT_EXCHANGE: '<EXCHANGE>'
CELERY_ROUTES: None
<>.executors.celery_executor.execute_command: 
 'queue': '<QUEUE>'}
CELERY_SECURE_QUEUE: '<QUEUE>'
deprecated_settings: None

Steps to Reproduce

Required Dependencies

  • Minimal Python Version: N/A or Unknown
  • Minimal Celery Version: N/A or Unknown
  • Minimal Kombu Version: N/A or Unknown
  • Minimal Broker Version: N/A or Unknown
  • Minimal Result Backend Version: N/A or Unknown
  • Minimal OS and/or Kernel Version: N/A or Unknown
  • Minimal Broker Client Version: N/A or Unknown
  • Minimal Result Backend Client Version: N/A or Unknown

Python Packages

pip freeze Output:

Other Dependencies

N/A

Minimally Reproducible Test Case

Expected Behavior

/usr/local/bin/celery --app=APP_ID inspect active should not show finished tasks as active running tasks.

Actual Behavior

/usr/local/bin/celery --app=APP_ID inspect active is showing already finished tasks.

iamtrk avatar Sep 27 '24 17:09 iamtrk

Has anyone else faced this problem? I am trying to monitor the load on a Celery cluster, where load = (number of active workers) / (number of hosts in the cluster × Celery workers per node). But inspect active is returning lots of finished tasks as well.
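
The load formula above can be sketched as a small helper. This is only an illustration, not code from the reporter's setup: it assumes the input has the shape returned by `app.control.inspect().active()` (a mapping from worker node name to a list of task dicts, or `None` when no worker replies), and it reads "active workers" as the number of task entries reported. The function name `cluster_load` and its parameters are hypothetical.

```python
from typing import Any, Mapping, Optional, Sequence


def cluster_load(
    active_by_node: Optional[Mapping[str, Sequence[Mapping[str, Any]]]],
    hosts: int,
    workers_per_node: int,
) -> float:
    """Return active task entries divided by total worker slots.

    `active_by_node` is assumed to look like the result of
    `app.control.inspect().active()`; None means no worker replied.
    """
    capacity = hosts * workers_per_node
    if capacity <= 0:
        raise ValueError("hosts and workers_per_node must both be positive")
    # Count every task entry reported by every node.
    active = sum(len(tasks) for tasks in (active_by_node or {}).values())
    return active / capacity


# Hypothetical sample: two hosts with 8 worker processes each.
sample = {
    "celery1@host-a": [{"id": "a"}, {"id": "b"}],
    "celery1@host-b": [{"id": "c"}],
}
print(cluster_load(sample, hosts=2, workers_per_node=8))
```

If finished tasks are still listed as active, this ratio is overstated, which is why the behaviour reported here matters for monitoring.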

iamtrk avatar Sep 27 '24 17:09 iamtrk

If you broadcast the inspect command instead of using directed inspection (where you inspect a particular node only), it waits until all nodes have returned a result or the timeout expires. By the time you get the results, many tasks may have finished their work...

What I have done is the following: I've implemented a "node_stats" command that I either broadcast or direct at a particular node.

Example:

shell>> celery -A my.app inspect node_stats -d celery1@i-babadeda4db695263
->  celery1@i-babadeda4db695263: OK
    {
        "cpu_count": 4,
        "disk/": 22.4,
        "load1": 1.21,
        "load15": 1.53,
        "load5": 1.35,
        "mem": 13.1,
        "swap": 0.0
    }

There was a bug in Celery that got fixed recently, so for this to work you need Celery 5.4.0, or an older Celery release that does not use Click (I can't remember which version that was).
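
A custom inspect command like this could look roughly as follows. This is a minimal sketch, not the actual implementation behind the output above: it uses only the standard library, so it omits the "mem" and "swap" fields, and it assumes "disk/" means the percentage of the root filesystem in use. `collect_node_stats` and `register_node_stats` are hypothetical names; `inspect_command` is the decorator Celery provides in `celery.worker.control` for registering custom inspect commands.

```python
import os
import shutil


def collect_node_stats(path="/"):
    """Gather host metrics with the standard library only (Unix only)."""
    load1, load5, load15 = os.getloadavg()
    usage = shutil.disk_usage(path)
    return {
        "cpu_count": os.cpu_count(),
        # Assumption: "disk/" is the percentage of the filesystem in use.
        f"disk{path}": round(100.0 * usage.used / usage.total, 1),
        "load1": round(load1, 2),
        "load5": round(load5, 2),
        "load15": round(load15, 2),
    }


def register_node_stats():
    """Register `node_stats` as an inspect command.

    Call this from a module that the worker imports. Celery is imported
    lazily so that the metrics helper above works without it.
    """
    from celery.worker.control import inspect_command

    @inspect_command()
    def node_stats(state):
        # `state` is the worker state object Celery passes to every command.
        return collect_node_stats()

    return node_stats


print(collect_node_stats())
```

Once the worker has loaded the module and called `register_node_stats()`, the command can be invoked as in the example above, either broadcast to all nodes or directed at one with `-d`.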

dejlek avatar Sep 30 '24 09:09 dejlek

Hi @dejlek thanks for the reply,

It's not about latency; the command returns within a few seconds. inspect.active() returns jobs that finished 2-3 days earlier. We recently migrated from Celery 3.1.23 on Python 3.6 to Celery 5.4.0 on Python 3.9. inspect.active() used to work as expected on Celery 3.1.23.

We are seeing this bug in Celery 5.4.0

iamtrk avatar Sep 30 '24 16:09 iamtrk

> Hi @dejlek thanks for the reply,
>
> It's not about latency; the command returns within a few seconds. inspect.active() returns jobs that finished 2-3 days earlier. We recently migrated from Celery 3.1.23 on Python 3.6 to Celery 5.4.0 on Python 3.9. inspect.active() used to work as expected on Celery 3.1.23.
>
> We are seeing this bug in Celery 5.4.0

We're working on v5.5 at the moment, currently in beta pre-release. May I ask you to check if it still reproduces with the latest version?

Nusnus avatar Sep 30 '24 16:09 Nusnus

@Nusnus you probably meant to send that message to @iamtrk ??

dejlek avatar Oct 01 '24 18:10 dejlek

> @Nusnus you probably meant to send that message to @iamtrk ??

Yes, correct. Apologies for the wrong tagging 🙏 Thanks.

Nusnus avatar Oct 02 '24 03:10 Nusnus

> Has anyone else faced this problem? I am trying to monitor the load on a Celery cluster, where load = (number of active workers) / (number of hosts in the cluster × Celery workers per node). But inspect active is returning lots of finished tasks as well.

I encountered a similar issue recently. I also noticed that the start times of the completed tasks in the inspect active output almost coincide with a connection error in Redis/HAProxy, which occurs approximately once an hour.
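
To compare start times against the connection errors, the suspicious entries can be pulled out of the inspect active output with a small helper. This is a diagnostic sketch under stated assumptions, not a fix: it assumes each task dict carries a `time_start` field holding Unix epoch seconds, which should be checked against the actual output first. The name `long_running_entries` and the threshold are hypothetical.

```python
import time


def long_running_entries(active_by_node, max_age_seconds, now=None):
    """List (node, task_id, age_seconds) for entries older than the threshold.

    Assumption: `time_start` is a Unix epoch timestamp in seconds.
    Entries without `time_start` are skipped.
    """
    now = time.time() if now is None else now
    flagged = []
    for node, tasks in (active_by_node or {}).items():
        for task in tasks:
            started = task.get("time_start")
            if started is None:
                continue
            age = now - started
            if age > max_age_seconds:
                flagged.append((node, task.get("id"), age))
    return flagged


# Hypothetical sample with a fixed "now" so the result is reproducible.
NOW = 1_000_000
sample = {
    "celery1@host-a": [
        {"id": "recent", "time_start": NOW - 10},
        {"id": "old", "time_start": NOW - 200_000},
    ],
}
print(long_running_entries(sample, max_age_seconds=86_400, now=NOW))
```

The flagged start times can then be lined up with the timestamps of the Redis/HAProxy connection errors to see whether they really coincide.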

dzhamaldev avatar Oct 21 '24 12:10 dzhamaldev