Reconnect Never Uses Failover?

Open micah-williamson opened this issue 3 years ago • 6 comments

I am using this client to interface with an AmazonMQ (ActiveMQ) active/standby setup. I have provided the connection host_and_ports for both the active and standby connections.

self.connection = Connection(host_and_ports=hosts_and_ports, use_ssl=True, reconnect_attempts_max=1000)
self.connection.connect(username, password, wait=True)

To test the failover, I put the connection in an infinite loop where I send a message to the queue, then read from the queue, with a 1s sleep in between.

While that's running, I reboot the AmazonMQ active/standby, which first reboots the active, then the standby, sequentially so at any given point at least 1 ActiveMQ instance is available.

My expectation is that soon after the active instance disconnects, the connection will swap over to the standby. At the very least I would expect (with such a high reconnect_attempts_max) that the connection would be reestablished once the active comes back online. But this is not the case: the connection does not fail over and never comes back once it is disconnected.

Am I using this wrong?

Do I need to manually reconnect in the on_disconnect of the ConnectionListener? If so, what is the purpose of providing multiple connections in the host_and_ports in the first place?
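For readers wondering what a manual failover loop might look like, here is a minimal, library-agnostic sketch. It is not stomp.py's internal behavior: `connect_with_failover` and the `connect` callable are hypothetical names, and `connect` stands in for whatever call actually opens a connection to a single broker.

```python
import time


def connect_with_failover(hosts, connect, max_attempts=10, delay=1.0, sleep=time.sleep):
    """Try the hosts round robin until `connect` succeeds.

    `connect` is a hypothetical callable taking (host, port) and raising
    ConnectionError on failure. Returns the (host, port) that worked.
    """
    if not hosts:
        raise ValueError("hosts must not be empty")
    last_error = None
    for attempt in range(max_attempts):
        # Alternate between the configured brokers: active, standby, active, ...
        host, port = hosts[attempt % len(hosts)]
        try:
            connect(host, port)
            return (host, port)
        except ConnectionError as exc:
            last_error = exc
            sleep(delay)  # fixed pause between attempts; backoff is left out for brevity
    raise ConnectionError(
        f"no broker reachable after {max_attempts} attempts"
    ) from last_error
```

A loop like this could in principle be triggered from a listener's disconnect callback, with `connect` wrapping the real client call; that wiring is an assumption and is not tested here.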

micah-williamson avatar Jun 08 '21 20:06 micah-williamson

@micah-williamson - Did you ever come up with the solution to this? Seems like I'm walking in your footsteps right now and came to the same dead end. 😞

For additional color, configuring the connection for n hosts_and_ports (where n is greater than 1) is not enough to configure failover. On initial connect it round robins through the connections until it finds your active broker but will not auto re-connect on failover. I wrote my on_disconnected method to retry the same initial connection and it would give up after two attempts. Adding the following to Connection was...better...:

reconnect_sleep_initial=5, reconnect_sleep_increase=0.5, reconnect_sleep_jitter=0.1, reconnect_sleep_max=120.0, reconnect_attempts_max=10
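As a rough illustration of how these settings shape the wait between attempts, here is a sketch based on how the stomp.py documentation describes the parameters (an initial delay, a growth factor applied after each attempt, random jitter as a fraction of the delay, and a cap). The `backoff_delays` helper is hypothetical and is not stomp.py's actual code; the library's exact formula may differ.

```python
import random


def backoff_delays(initial=5.0, increase=0.5, jitter=0.1, maximum=120.0,
                   attempts=10, rng=None):
    """Approximate the sleep (in seconds) before each reconnect attempt."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        # Grow the delay by `increase` (0.5 = 50% longer) per attempt, up to `maximum`.
        base = min(maximum, initial * (1.0 + increase) ** attempt)
        # Add between 0% and `jitter` (0.1 = 10%) of extra random wait.
        delays.append(base * (1.0 + rng.random() * jitter))
    return delays


if __name__ == "__main__":
    for n, delay in enumerate(backoff_delays(jitter=0.0), start=1):
        print(f"attempt {n}: wait {delay:.2f}s")
```

Under these assumptions, with jitter set to 0 the first delays come out as 5, 7.5 and 11.25 seconds, and the 120 second cap is reached by the ninth attempt, so ten attempts can span several minutes.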

But still not great. During an AMQ reboot I would expect it to disconnect and reconnect twice (with minimal downtime) to wind up back on the initial broker, but it kept dropping and reconnecting even with the reconnect better configured. It did eventually reconnect to the initial broker though. In a couple of tests, however, it would still fail to reconnect and would eventually throw an exception.

Code below for reference if it helps or shows someone else where I've gone wrong. 😂


import time
import stomp

def connect_and_subscribe(conn):
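    # reconnect_attempts_max is a stomp.Connection(...) option (set below);
    # passed to connect() it would be treated as an extra CONNECT header.
    conn.connect('test-user', 'test-password', wait=True)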
    conn.subscribe(destination='/queue/test', id=1, ack='auto')

class MyListener(stomp.ConnectionListener):
    def __init__(self, conn):
        self.conn = conn

    def on_error(self, frame):
        print('received an error "%s"' % frame.body)

    def on_message(self, frame):
        print('Message: ' + frame.body)

    def on_connected(self, frame):
        print('Connected...')

    def on_connecting(self, host_and_port):
        print('Connecting to: ' + host_and_port[0] + '...')

    def on_disconnected(self):
        print('Disconnected...')
        connect_and_subscribe(self.conn)

conn = stomp.Connection(
    [('ENDPOINT-1.mq.us-east-2.amazonaws.com', 61614),
     ('ENDPOINT-2.mq.us-east-2.amazonaws.com', 61614)],
    use_ssl=True,
    heartbeats=(4000, 4000),
    reconnect_sleep_initial=5,
    reconnect_sleep_increase=0.5,
    reconnect_sleep_jitter=0.1,
    reconnect_sleep_max=120.0,
    reconnect_attempts_max=10,
)
conn.set_listener('', MyListener(conn))

connect_and_subscribe(conn)
time.sleep(600)

print('Disconnecting...')
conn.disconnect()

JohnKeippel avatar Jan 10 '22 19:01 JohnKeippel

@JohnKeippel I checked my implementation and it doesn't look like I came up with anything. We abandoned MQ shortly afterwards, after getting on a call with AWS and finding there is a hard limit of 5 Lambda workers per MQ ESM. AWS treats MQ as no more than a checklist feature. Wish I could offer something actually useful.

micah-williamson avatar Jan 10 '22 19:01 micah-williamson

Hey, no problem at all. Thanks for the quick reply! Not excited about supporting it either but it is what it is.

> AWS treats MQ as no more than a checklist feature.

It really could not be more obvious.

JohnKeippel avatar Jan 10 '22 22:01 JohnKeippel

@JohnKeippel Is there any update? I have the same problem: I started a failover ActiveMQ setup on localhost, but the client does not work.

conn = stomp.Connection([('localhost',61613), ('localhost',61614)], reconnect_sleep_initial=5, reconnect_sleep_increase=0.5, reconnect_sleep_jitter=0.1, reconnect_sleep_max=120.0, reconnect_attempts_max=10)

cocowool avatar Sep 08 '22 00:09 cocowool

@JohnKeippel @cocowool @micah-williamson Were you able to figure out a solution? I am facing the same issue. I have a broker network and failover configured on Amazon MQ (ActiveMQ) in the XML, but the producer/subscriber do not seem to reconnect to another broker when a broker restarts. Also, were you able to figure out how to make the producer retry or switch brokers if a broker connection fails?

adityashanbhog avatar Feb 28 '23 10:02 adityashanbhog

@cocowool @a-n-s - Sorry, I didn't have any luck and then was laid off from that role and instantly lost interest in AMQ. 😂

For what it's worth the plan was to eventually retire AMQ entirely for this and other reasons. Good luck!

JohnKeippel avatar Mar 01 '23 13:03 JohnKeippel