celery-director

Bugfix: ignore internal celery tasks in director_prerun

Open CodePint opened this issue 3 years ago • 10 comments

Celery chord callbacks work differently with Redis vs Postgres/RabbitMQ result backends and require the use of polling with an internal Celery task. This results in an unhandled exception being thrown in the director_prerun handler. ref: http://blog.untrod.com/2015/03/how-celery-chord-synchronization-works.html

  • Added a check in director_prerun to ignore internal celery.* tasks
  • Improved exception handling for tasks which are not found by creating a TaskNotFound exception and a Task#get_or_raise model function.
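
The two changes above can be sketched roughly as follows. This is a minimal illustration, not the actual diff: the in-memory TASKS store, the Task dataclass, the helper is_internal_celery_task and the simplified director_prerun signature are all hypothetical stand-ins, and only the names TaskNotFound and get_or_raise come from the description above.

```python
from dataclasses import dataclass


class TaskNotFound(Exception):
    """Raised when no task matches the given id (name taken from the PR description)."""


@dataclass
class Task:
    # Hypothetical stand-in for Director's task model.
    id: str
    status: str = "pending"


# Hypothetical in-memory replacement for the task table.
TASKS = {}


def get_or_raise(task_id):
    """Return the stored task, or raise TaskNotFound instead of returning None."""
    task = TASKS.get(task_id)
    if task is None:
        raise TaskNotFound(f"Task {task_id} not found")
    return task


def is_internal_celery_task(task_name):
    # Celery's built-in tasks are registered under the "celery." prefix,
    # for example celery.chord_unlock.
    return task_name.startswith("celery.")


def director_prerun(task_id, task_name):
    """Simplified prerun handler: returns the updated task, or None if skipped."""
    if is_internal_celery_task(task_name):
        return None  # internal tasks have no row to update
    task = get_or_raise(task_id)
    task.status = "progress"
    return task
```

With this shape, a celery.chord_unlock task is skipped before any lookup happens, and a genuinely missing task raises a descriptive TaskNotFound rather than an AttributeError on None.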

Signed-off-by: George Eddie [email protected]

CodePint avatar Mar 08 '22 14:03 CodePint

Tests failing, blocked by PR: https://github.com/ovh/celery-director/pull/128

CodePint avatar Mar 10 '22 12:03 CodePint

Tests should now pass, rebased off of master and force pushed

CodePint avatar Mar 10 '22 20:03 CodePint

Hi @CodePint ,

We tried to reproduce the bug you describe, without success. For instance, the following workflow, which represents a chord, was launched using RabbitMQ and everything worked as expected:

image

Can you please give us an example to reproduce it locally? We'll be able to validate your PR.

Nicolas

ncrocfer avatar Mar 11 '22 17:03 ncrocfer

Hello, thanks for looking at this.

Whilst the workflow and tasks succeeded, the chord callback would fail in the prerun handler for the celery.chord_unlock task.

You should see something like AttributeError: 'NoneType' object has no attribute 'status' in the worker logs. Are you okay to try again and see if you can replicate it? If not, I'll see if I can put an MRP together.
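
A minimal sketch of how that error can arise, assuming the handler looks a task up by id and receives None when the id belongs to an internal task such as celery.chord_unlock. The function names and the dictionary-backed table are hypothetical, and this is not Director's code:

```python
def find_task(task_id, table):
    # Returns None when nothing matches, like a query that finds no row.
    return table.get(task_id)


def naive_prerun(task_id, table):
    task = find_task(task_id, table)
    # No None check here: an unknown id fails on the attribute access below.
    return task.status


try:
    naive_prerun("id-of-a-chord-unlock-task", {})
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```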

Screenshot from 2022-03-11 17-53-05

CodePint avatar Mar 11 '22 17:03 CodePint

Please give us a real, concrete example to test (so a workflow.yml and a tasks/foo.py file), because as I said: no, I wasn't able to reproduce the problem.

Indeed, we do see this kind of problem when using the vanilla Celery canvas, but it's not related to RabbitMQ (we have the same results with Redis).

So again, please give us an example.

ncrocfer avatar Mar 11 '22 18:03 ncrocfer

> Please give us a real, concrete example to test (so a workflow.yml and a tasks/foo.py file), because as I said: no, I wasn't able to reproduce the problem.
>
> Indeed, we do see this kind of problem when using the vanilla Celery canvas, but it's not related to RabbitMQ (we have the same results with Redis).
>
> So again, please give us an example.

Okay, not a problem. I'll put something together either over the weekend or on Monday. What I would say is that, regardless of this specific bug, internal Celery tasks should be ignored in the prerun handler.

CodePint avatar Mar 11 '22 18:03 CodePint

To be honest, Celery Director was not developed to launch vanilla Celery tasks; its main objective was to provide an easy framework for executing YAML workflows.

We know vanilla tasks fail, but it's not a problem because tasks in Director have to be described in YAML.

ncrocfer avatar Mar 11 '22 18:03 ncrocfer

> To be honest, Celery Director was not developed to launch vanilla Celery tasks; its main objective was to provide an easy framework for executing YAML workflows.
>
> We know vanilla tasks fail, but it's not a problem because tasks in Director have to be described in YAML.

I think there might be a bit of confusion in this area. I'll get you a reproducible example and some more docs. Hopefully that should clear things up. Thanks again for your time.

CodePint avatar Mar 11 '22 18:03 CodePint

@CodePint do you have a reproducible example and some more docs to give us?

ncrocfer avatar Apr 13 '22 17:04 ncrocfer

Did you have time to put together a reproducible example for us? If not, I will close this PR: we use a lot of chords in our code and we don't have this problem ;)

ncrocfer avatar Jul 15 '22 15:07 ncrocfer

Closing because of inactivity.

ncrocfer avatar Dec 08 '22 13:12 ncrocfer