girder_worker
Check if Celery Pinning is still necessary
@manthey I seem to recall we pinned celery to celery>=4,<4.2 because some of the things you were working on were having issues with the connection pool dropping. Do you think you could see if this is still an issue with 4.2.1?
It looks like celery master has this fixed, but it is listed as targeted for v4.3 (https://github.com/celery/celery/issues/4867).
I wouldn't expect a v4.3 release anytime soon, see https://github.com/celery/celery/issues/4957. In other projects I've just had to pick a sha from master and pin to it.
Well that aged well: https://github.com/celery/celery/releases/tag/v4.3.0rc1.
I've run into this problem again with celery 4.3. It does seem resolved in 4.4.0rc3. I think we should re-pin to celery <4.2 or >=4.4.
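For reference, the proposed range can be written as a small check. This is only a sketch under stated assumptions: `release_tuple` and `in_proposed_range` are hypothetical helper names (not part of girder_worker or celery), the lower bound of 4 is taken from the original `celery>=4,<4.2` pin, and the parsing deliberately ignores pre-release suffixes, so `4.4.0rc3` is treated like `4.4.0`.

```python
import re


def release_tuple(version):
    """Return the leading numeric release segment, e.g. '4.4.0rc3' -> (4, 4, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return tuple(int(part) for part in match.group(0).split("."))


def in_proposed_range(version):
    """True if the version is >=4,<4.2 or >=4.4 (pre-release suffixes ignored)."""
    release = release_tuple(version)
    return (4,) <= release < (4, 2) or release >= (4, 4)


for candidate in ("4.1.1", "4.2.1", "4.3.0", "4.4.0rc3"):
    print(candidate, in_proposed_range(candidate))
```

Note that commas in a pip requirement specifier mean "and", so the two ranges cannot be joined directly in one requirement. One possible spelling is an exclusion form such as `celery>=4,!=4.2.*,!=4.3.*`, though how it treats pre-releases would need to be checked.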
@manthey are there any open issues to indicate other people are seeing the issue with 4.3? I'd like to avoid pinning unless it's clear that 4.4 is going to fix a systemic problem with 4.3 related to connection pool dropping.
The celery issue https://github.com/celery/celery/issues/4867 has several reports of this still being a problem in 4.3.
In working with slicer_cli_web and girder_worker, on celery 4.3.0 I get the problem "often" -- seemingly more than half the time I initiate a second job. On celery 4.4.0rc3, I could not produce the problem.
We shouldn't have unpinned this without verifying that it was fixed -- the only reason I didn't notice is that the project I am using girder_worker on was using an older version where the pinned celery version was still in place.
Is this an issue that can be resolved downstream at build/deploy time? I would be hesitant to forcibly revert celery versions across all downstream projects.
celery 4.2 and 4.3 are broken using the rabbitmq broker. We recommend rabbitmq in the readthedocs. This would suggest either pinning celery or having a caveat about using rabbitmq in the docs. I don't know what projects are using this successfully. Are they using a different broker?
To be clear, Celery 4.2 and 4.3 - in conjunction with an unknown set of deployment configurations - are showing intermittent errors for some users. Until it is resolved in Celery, it should be mitigated in downstream projects that are experiencing the issue.
Right. Let's pin celery to mitigate the problem.
Projects that somehow work despite the broken version of celery could override this -- our default installation should work rather than be broken.
I don't know that I have followed this completely... if I understand correctly, the problem is that if you pin celery here, you can't override it downstream. When you try to import, Python will throw a VersionConflict exception.
I was reading the celery issue thread but not in super detail... is the issue in celery or kombu?
It's not clear what the issue is; there is no consistently reproducible minimal working example of the error.
https://python-rq.org/
🤷‍♂️
@manthey Is there a practical reason this can't be mitigated downstream in your project code or is your concern more ideological?
Of course it can be mitigated downstream.
I wasted quite a bit of time trying to figure out what was wrong with my code when the problem was that girder_worker recommends a broken set of packages -- rabbitmq with current celery. Unless we prevent it (by pinning celery), or advise against using rabbitmq (and have a working example of a different broker), anyone trying to use this project is doomed to sorrow and wasted time. In trying to use the latest release of girder_worker, it works the first time and then fails "randomly".
I'm glad to hear that in the short term this can be mitigated downstream. Unless everyone else is secretly having this problem with Girder Worker and just not speaking up, that's the fix until 4.4.0 is released.
Can I get some feedback from @girder/developers as to whether this is widely observed behavior?
I haven't seen it, though to be fair I'm not currently administering lots of G_W jobs in the wild.
I'm on board with not trying to fix this here given that we don't actually even know where the problem is.
girder_worker recommends a broken set of packages
Earlier in the thread it was mentioned that this couldn't be minimally reproduced. Is that not the case? If such a repro exists I could be persuaded that it's appropriate to try to fix this in g_worker. Barring that, my suggestion would be that we add a warning to the documentation along the lines of "what to do if you see this error", and recommend downstream pinning.
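As an illustration of what a downstream project could do alongside pinning, here is a minimal sketch of a startup check that warns when an affected celery release is installed. The function name `warn_if_celery_affected` and its `installed` parameter are hypothetical and exist only for this example; nothing like it ships with girder_worker. The affected range (4.2.x and 4.3.x) is taken from the reports in this thread, not from a confirmed root cause.

```python
import re
import warnings
from importlib import metadata


def warn_if_celery_affected(installed=None):
    """Warn when celery 4.2.x or 4.3.x is in use; return True if a warning was issued.

    `installed` is a hypothetical override for testing. By default the version
    is read from the installed distribution's metadata.
    """
    if installed is None:
        try:
            installed = metadata.version("celery")
        except metadata.PackageNotFoundError:
            return False  # celery is not installed, nothing to check
    match = re.match(r"(\d+)\.(\d+)", installed)
    if match is None:
        return False  # unrecognized version format; stay silent
    major_minor = (int(match.group(1)), int(match.group(2)))
    if major_minor in {(4, 2), (4, 3)}:
        warnings.warn(
            "celery %s has reports of intermittent failures with the RabbitMQ "
            "broker (celery issue #4867); consider pinning celery<4.2 or "
            "upgrading to 4.4 or later." % installed,
            RuntimeWarning,
        )
        return True
    return False
```

Calling `warn_if_celery_affected()` once at application startup would surface the problem early without forcing a pin on projects that are not affected.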
I'm not using girder-worker, but anecdotally I've been running Celery 4.3.0 with RabbitMQ on multiple projects for several months without running into this issue.
For what it's worth, overly specific pinning has caused my downstream projects a lot of pain in the past since it's more likely to lead to an unresolvable set of packages.
It is useful to know that Celery 4.3 (particularly via girder_worker) is being used successfully with RabbitMQ. I'll make a PR adding a note to the documentation to the effect of "if you see this problem, try this".
Great, thanks @manthey
This was resolved quite some time ago.