helm-charts

Error: No nodes replied within time constraint

Open rrrnld opened this issue 5 months ago • 1 comments

We're self-hosting Saleor and running into issues with our Celery deployment: the worker appears to get stuck after a while. We deploy to Kubernetes (k8s) and run the Celery workers like this:

celery -A saleor --app=saleor.celeryconf:app worker --loglevel=info --beat

This is taken from the config that was removed here: https://github.com/saleor/saleor/pull/13777

This repo uses the same command to deploy Saleor: https://github.com/trieb-work/helm-charts/blob/fbe6ce6748c449f4a8889fa653063cafad3a4303/charts/saleor/templates/celery_deployment.yaml#L26-L52. I can see that the worker processes are running.

Is this the correct way to run the workers? I'm asking because, for example, celery -A saleor --app=saleor.celeryconf:app is redundant: -A is the short form of --app, so the app is specified twice. Also, shelling into the container and inspecting the worker via celery -A saleor --app=saleor.celeryconf:app inspect active or celery -A saleor --app=saleor.celeryconf:app status fails in both cases, and the lifetime check in this repo does not seem to be working at all. Both commands print:

Error: No nodes replied within time constraint

Any idea what might be wrong with our healthchecks / lifetime checks?
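
For reference, a minimal sketch of a check that pings only the local worker and waits longer for a reply than the inspect default of one second. It is not the probe this chart ships. It assumes the worker runs under Celery's default node name (celery@<hostname>) and that the app path is saleor.celeryconf:app as in the command above; the function names build_ping_command and worker_is_alive are hypothetical.

```python
import socket
import subprocess
import sys


def build_ping_command(app="saleor.celeryconf:app", node=None, timeout=10.0):
    """Return the argv that pings a single Celery worker node."""
    if node is None:
        # Assumption: the worker uses Celery's default node name, celery@<hostname>.
        node = f"celery@{socket.gethostname()}"
    return [
        "celery",
        "--app", app,            # one app flag only; -A and --app are the same option
        "inspect", "ping",
        "--destination", node,   # ask only this node instead of broadcasting to all
        "--timeout", str(timeout),
    ]


def worker_is_alive(timeout=10.0):
    """Run the ping and report whether the worker answered in time."""
    cmd = build_ping_command(timeout=timeout)
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout + 5
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # The CLI hung or is not installed: treat both as "not alive".
        return False
    # The celery CLI exits non-zero when no node replies within the timeout.
    return result.returncode == 0


if __name__ == "__main__":
    sys.exit(0 if worker_is_alive() else 1)
```

If this exits with 0 inside the container while the broadcast commands above still fail, the one-second default timeout would be a likely suspect. If it also fails, the worker is probably not answering remote-control messages at all.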

rrrnld, Sep 04 '24 06:09