kombu
Task.delay() hangs forever when RabbitMQ is down
Checklist
- [x] I have verified that the issue exists against the master branch of Celery.
- [x] This has already been asked to the discussion group first.
- [x] I have read the relevant section in the contribution guide on reporting bugs.
- [x] I have checked the issues list for similar or identical bug reports.
- [x] I have checked the pull requests list for existing proposed fixes.
- [x] I have checked the commit log to find out if the bug was already fixed in the master branch.
- [x] I have included all related issues and possible duplicate issues in this issue (If there are none, check this box anyway).
Mandatory Debugging Information
- [x] I have included the output of celery -A proj report in the issue. (if you are not able to do this, then at least specify the Celery version affected).
- [x] I have verified that the issue exists against the master branch of Celery.
- [x] I have included the contents of pip freeze in the issue.
- [x] I have included all the versions of all the external dependencies required to reproduce this bug.
Optional Debugging Information
- [x] I have tried reproducing the issue on more than one Python version and/or implementation.
- [x] I have tried reproducing the issue on more than one message broker and/or result backend.
- [x] I have tried reproducing the issue on more than one version of the message broker and/or result backend.
- [x] I have tried reproducing the issue on more than one operating system.
- [x] I have tried reproducing the issue on more than one workers pool.
- [x] I have tried reproducing the issue with autoscaling, retries, ETA/Countdown & rate limits disabled.
- [x] I have tried reproducing the issue after downgrading and/or upgrading Celery and its dependencies.
Related Issues and Possible Duplicates
Related Issues
- None
Possible Duplicates
- None
Environment & Settings
Celery version:
celery report
Output:
Steps to Reproduce
Required Dependencies
- Minimal Python Version: 5.1.0
- Minimal Celery Version: 5.1.2
- Minimal Kombu Version: N/A or Unknown
- Minimal Broker Version: N/A or Unknown
- Minimal Result Backend Version: N/A or Unknown
- Minimal OS and/or Kernel Version: N/A or Unknown
- Minimal Broker Client Version: N/A or Unknown
- Minimal Result Backend Client Version: N/A or Unknown
Python Packages
pip freeze
Output:
celery==5.1.2
django-celery-beat==2.2.1
kombu==5.1.0
Other Dependencies
N/A
Minimally Reproducible Test Case
Start RabbitMQ, then execute task.delay() => ok
Stop RabbitMQ, then execute task.delay() again => hangs for 5 minutes, then raises an exception:
kombu.exceptions.OperationalError: failed to resolve broker hostname
I tried this config: CELERY_BROKER_TRANSPORT_OPTIONS = {"max_retries": 3, "interval_start": 0, "interval_step": 0.2, "interval_max": 0.5}, but it had no effect.
Expected Behavior
An exception is raised within a few seconds.
Actual Behavior
task.delay() hangs for several minutes.
Hey @thanhpd-teko :wave:, Thank you for opening an issue. We will get back to you as soon as we can. Also, check out our Open Collective and consider backing us - every little helps!
We also offer priority support for our sponsors. If you require immediate assistance please consider sponsoring us.
By default, we retry connecting to the broker 100 times before giving up. See broker_connection_max_retries in the documentation.
I'm aware this is unusually high for the producer side and this is definitely a design flaw.
However, on the consumer side, we don't want to quit until we're certain the broker is down and not going to recover.
I can look into introducing a new configuration setting, but in the meantime you should set broker_connection_max_retries to a lower value on the producer side.
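To see how a retry count and back-off intervals add up to a long wait, here is a minimal sketch. It is a simplified model of a linear back-off and not kombu's actual implementation: the helper names retry_delays and worst_case_wait are hypothetical, and the assumptions that the delay grows by interval_step per retry, is capped at interval_max, and that one initial attempt precedes the retries are mine.

```python
def retry_delays(max_retries, interval_start=0.0, interval_step=0.2, interval_max=0.2):
    """Return the sleep (in seconds) before each retry under a linear back-off.

    Assumption: the delay starts at interval_start, grows by interval_step
    on every retry, and never exceeds interval_max.
    """
    return [
        min(interval_start + attempt * interval_step, interval_max)
        for attempt in range(max_retries)
    ]


def worst_case_wait(max_retries, connect_timeout, **intervals):
    """Upper bound on the total wait if every connection attempt times out.

    Assumption: one initial attempt plus max_retries retries, each of which
    may block for up to connect_timeout seconds.
    """
    attempts = max_retries + 1
    return attempts * connect_timeout + sum(retry_delays(max_retries, **intervals))


# Values taken from the transport options quoted in this thread.
print(retry_delays(3, interval_start=0, interval_step=0.2, interval_max=0.5))
# [0.0, 0.2, 0.4]
```

In this model the sleeps are often the smaller part of the bill: with a high retry count, the time each individual attempt takes to fail dominates the total.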
Hi @thedrow, thanks for your response. I tried reducing the max retries, but it still hangs for a really long time. It's not working. :(
app.conf.broker_connection_timeout = 1
app.conf.broker_connection_max_retries = 1
@thedrow, one more piece of debugging information. If I change the broker to Redis with these options:
app.conf.broker_transport_options = {
    'max_retries': 1,
    'interval_start': 0,
    'interval_step': 0.2,
    'interval_max': 0.2,
}
app.conf.broker_url = 'redis://broker:6379'
app.conf.broker_connection_timeout = 1
app.conf.broker_connection_max_retries = 1
It quits very quickly (1-2 seconds).
But if the broker is RabbitMQ with the same configuration, it hangs for a very long time. Is there any difference between the two kinds of broker?
This is a bug with our implementation. I'd need to try this myself to reproduce the bug.
This bug is more about kombu than Celery. Kombu has two semantics when establishing a connection:
- directly failing:
>>> import kombu
>>> con = kombu.Connection('amqp://')
>>> con.connect() # This call immediately raises exception
Traceback (most recent call last):
File "/home/matus/dev/kombu39/lib/python3.9/site-packages/amqp/transport.py", line 172, in _connect
entries = socket.getaddrinfo(
File "/usr/lib/python3.9/socket.py", line 953, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -9] Address family for hostname not supported
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/matus/dev/kombu/kombu/connection.py", line 275, in connect
return self._ensure_connection(
...
File "/home/matus/dev/kombu39/lib/python3.9/site-packages/amqp/transport.py", line 197, in _connect
self.sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
- blocking until broker is back again:
>>> import kombu
>>> con = kombu.Connection('amqp://')
>>> con.default_channel  # Blocks until the broker is available again
To control this behaviour, the transport_options parameter of the Connection constructor needs to be used. The example you provided with transport_options should work; at least it works for me:
>>> import kombu
>>> con = kombu.Connection('amqp://', transport_options={'max_retries': 1, 'interval_start': 0, 'interval_step': 0.2, 'interval_max': 0.2})
>>> con.default_channel # Raises immediately
Traceback (most recent call last):
File "/home/matus/dev/kombu39/lib/python3.9/site-packages/amqp/transport.py", line 172, in _connect
entries = socket.getaddrinfo(
File "/usr/lib/python3.9/socket.py", line 953, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -9] Address family for hostname not supported
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/matus/dev/kombu/kombu/connection.py", line 447, in _reraise_as_library_errors
yield
...
ConnectionRefusedError: [Errno 111] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
...
kombu.exceptions.OperationalError: [Errno 111] Connection refused
A possible reason it takes so long for you is a network error that makes the client slow to detect that the connection is broken. The transport options only control how kombu retries after a network error; the network itself can still delay the error (for example, through a timeout in a network protocol), and that cannot be fixed in the kombu library.
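One way to keep a slow network failure from stalling the caller is to probe the broker's TCP port with an explicit timeout before publishing. This is only a sketch using the standard library: the helper name broker_reachable is hypothetical and the host and port in the comment are placeholders. Note that the timeout bounds the TCP connect only; hostname resolution can still block for longer.

```python
import socket


def broker_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures (socket.gaierror) and timeouts.
        return False


# Hypothetical usage before publishing a task:
# if not broker_reachable("broker", 5672, timeout=1.0):
#     raise RuntimeError("broker is not reachable, skipping task.delay()")
```

A successful probe does not guarantee that the later publish succeeds, since the broker can go down in between, so this complements the retry settings rather than replacing them.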
Hi @matusvalo, should I start the Celery task within a thread, so that it won't block the main thread? Is there any issue with that?
from threading import Thread
thread = Thread(target=sum.delay)
thread.start()
Technically you can offload this to a different thread, but you also need to understand what that involves, e.g.:
- you need to handle a blocked thread; there is no easy way to kill one.
- you need to mark the thread as a daemon so that it does not block the main process while it terminates, etc.
As mentioned before, if you don't like the blocking behaviour, you can set retries and timeouts accordingly.
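The two caveats above can be sketched with the standard library as follows. This is a sketch under stated assumptions, not a recommended pattern: the helper name call_with_deadline is hypothetical, and a call that stays blocked is abandoned in its daemon thread rather than cancelled.

```python
import threading


def call_with_deadline(func, timeout, *args, **kwargs):
    """Run func(*args, **kwargs) in a daemon thread and wait at most `timeout` seconds."""
    outcome = {}

    def runner():
        try:
            outcome["value"] = func(*args, **kwargs)
        except Exception as exc:
            # Hand the error back so the caller can re-raise it.
            outcome["error"] = exc

    # daemon=True so a thread that is still blocked does not keep the process alive at exit.
    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        # The worker cannot be killed, only abandoned.
        raise TimeoutError(f"call did not finish within {timeout} seconds")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

With the task from the snippet above, a call such as call_with_deadline(sum.delay, 5) would give up after five seconds, while the abandoned thread may still publish the message later if the broker comes back.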
A Kombu exception is raised with Django signals and pytest. I need to know a possible fix for the errors so I can stop running into them.