rabbitmq-c
amqp_channel_close (0.9.0) hangs sometimes
We use rabbitmq-c 0.9.0, connecting to a RabbitMQ 3.7.5 server. We have found that amqp_channel_close sometimes hangs forever.
The reason we call amqp_channel_close is that we sometimes find a specific channel is dead, i.e. it no longer receives messages. We have not figured out why it dies, but we designed a recovery algorithm for it, as follows:
- When we suspect a channel is dead, we send one more "ping" message.
- If we do not receive that ping message within a set period of time, we mark the channel dead, then close it and open another channel.
- We call amqp_channel_close to close that channel. Unfortunately, calling amqp_channel_close sometimes hangs forever, which makes our recovery algorithm fail.
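To make the steps above concrete, here is a minimal sketch of the ping-and-deadline bookkeeping as a small state machine. It makes no rabbitmq-c calls, and every name in it (ch_watchdog, wd_init, wd_ping_sent, wd_ping_received, wd_check) is hypothetical, invented for illustration rather than taken from our code or from the library. Time is passed in as milliseconds from whatever monotonic clock the caller uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical health states for a single channel. */
typedef enum { CH_HEALTHY, CH_PING_SENT, CH_DEAD } ch_state;

/* Hypothetical per-channel watchdog; not part of rabbitmq-c. */
typedef struct {
    ch_state state;
    uint64_t ping_timeout_ms;   /* how long to wait for the ping to come back */
    uint64_t ping_deadline_ms;  /* meaningful only while state == CH_PING_SENT */
} ch_watchdog;

/* Start (or restart, after reopening the channel) in the healthy state. */
void wd_init(ch_watchdog *w, uint64_t ping_timeout_ms) {
    assert(w != NULL);
    w->state = CH_HEALTHY;
    w->ping_timeout_ms = ping_timeout_ms;
    w->ping_deadline_ms = 0;
}

/* Step 1: the channel looks suspicious, a ping was just published. */
void wd_ping_sent(ch_watchdog *w, uint64_t now_ms) {
    assert(w != NULL);
    if (w->state == CH_HEALTHY) {
        w->state = CH_PING_SENT;
        w->ping_deadline_ms = now_ms + w->ping_timeout_ms;
    }
}

/* The ping came back on the channel, so it is alive after all. */
void wd_ping_received(ch_watchdog *w) {
    assert(w != NULL);
    if (w->state == CH_PING_SENT) {
        w->state = CH_HEALTHY;
    }
}

/* Step 2: returns true once the channel should be closed and reopened. */
bool wd_check(ch_watchdog *w, uint64_t now_ms) {
    assert(w != NULL);
    if (w->state == CH_PING_SENT && now_ms >= w->ping_deadline_ms) {
        w->state = CH_DEAD;
    }
    return w->state == CH_DEAD;
}
```

In this sketch the caller would invoke wd_check periodically and, when it returns true, perform the close-and-reopen step and then call wd_init again. The sketch only covers the detection side; it does nothing about the close call itself blocking.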
So, any suggestions as to why amqp_channel_close hangs, or how to improve our recovery algorithm?
Below is a backtrace from one of the hangs we experienced:
Thread 32 (Thread 0x7fbdd77fe700 (LWP 453)):
#0 0x00007fbdebc2a913 in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00000000004762a7 in amqp_poll (deadline=..., event=<optimized out>, fd=<optimized out>) at /var/jenkins/workspace/Newcc-Modules-Patch-Build-Ubuntu/source/lib/third_party/librabbitmq-0.9.0/librabbitmq/amqp_socket.c:192
#2 recv_with_timeout (state=0x7fbdbc00b310, timeout=...) at /var/jenkins/workspace/Newcc-Modules-Patch-Build-Ubuntu/source/lib/third_party/librabbitmq-0.9.0/librabbitmq/amqp_socket.c:699
#3 0x0000000000476419 in wait_frame_inner (state=0x7fbdbc00b310, decoded_frame=0x7fbdd77fdb20, timeout_deadline=...) at /var/jenkins/workspace/Newcc-Modules-Patch-Build-Ubuntu/source/lib/third_party/librabbitmq-0.9.0/librabbitmq/amqp_socket.c:813
#4 0x000000000047658a in simple_rpc_inner (state=0x7fbdbc00b310, channel=1, request_id=<optimized out>, expected_reply_ids=0x7fbdd77fdc60, decoded_request_method=<optimized out>, deadline=...) at /var/jenkins/workspace/Newcc-Modules-Patch-Build-Ubuntu/source/lib/third_party/librabbitmq-0.9.0/librabbitmq/amqp_socket.c:1055
#5 0x0000000000477b41 in amqp_simple_rpc (state=0x7fbdbc00b310, channel=1, request_id=1310760, expected_reply_ids=0x7fbdd77fdc60, decoded_request_method=0x7fbdd77fdc40) at /var/jenkins/workspace/Newcc-Modules-Patch-Build-Ubuntu/source/lib/third_party/librabbitmq-0.9.0/librabbitmq/amqp_socket.c:1132
#6 0x0000000000474b68 in amqp_channel_close (state=0x7fbdbc00b310, channel=1, code=<optimized out>) at /var/jenkins/workspace/Newcc-Modules-Patch-Build-Ubuntu/source/lib/third_party/librabbitmq-0.9.0/librabbitmq/amqp_api.c:295
#7 0x0000000000429bd3 in mq_close (mq_handler=0x7fbdbc0008c0) at ../src/mq_interface.c:699
#8 0x0000000000429d77 in mq_destroy (mq_handler=0x7fbdbc0008c0) at ../src/mq_interface.c:724
#9 0x000000000040ef18 in ccp_alispe_thread_proc (arg=0x2376f68) at ../src/ccp_core.c:1094
#10 0x000000000045a58a in thread_main (param=0x23777f8) at ../src/pj/os_core_unix.c:541
#11 0x00007fbdec40ce9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
I have seen this too. The same hang inside the poll() call.
I created an abomination using a timer signal in a vain attempt to work around it.
There is also what I think is a separate issue: this is known to block following an attempt to bind to an exchange that doesn't exist. See https://groups.google.com/forum/#!topic/rabbitmq-c-users/JET2DGQan3g