
Timeout while handshaking has occurred

Open Unlocker opened this issue 6 years ago • 6 comments

I have installed the chart with 2 replicas and a static cluster configuration. First I fixed a domain naming issue so that a bridge was established.

Bridge ClusterConnectionBridge@3c966cfd [name=$.artemis.internal.sf.habroker-jms.477161f3-fdde-11e8-999f-0242c0022412, queue=QueueImpl[name=$.artemis.internal.sf.habroker-jms.477161f3-fdde-11e8-999f-0242c0022412, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=3504f3fc-fd53-11e8-9c53-0242c0022412], temp=false]@7994e50b targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@3c966cfd [name=$.artemis.internal.sf.habroker-jms.477161f3-fdde-11e8-999f-0242c0022412, queue=QueueImpl[name=$.artemis.internal.sf.habroker-jms.477161f3-fdde-11e8-999f-0242c0022412, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=3504f3fc-fd53-11e8-9c53-0242c0022412], temp=false]@7994e50b targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=habroker-jms-master-1, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=habroker-jms-master-1-habroker-jms-master-pmh-depl-svc-kube-local], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@1788364006[nodeUUID=3504f3fc-fd53-11e8-9c53-0242c0022412, connector=TransportConfiguration(name=habroker-jms-slave-1, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=habroker-jms-slave-1-habroker-jms-slave-pmh-depl-svc-kube-local, address=jms, server=ActiveMQServerImpl::serverUUID=3504f3fc-fd53-11e8-9c53-0242c0022412])) [initialConnectors=[TransportConfiguration(name=habroker-jms-master-1, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=habroker-jms-master-1-habroker-jms-master-pmh-depl-svc-kube-local], discoveryGroupConfiguration=null]] is connected

But the logs from the master nodes contain errors:

2018-12-12 07:51:53,968 ERROR [org.apache.activemq.artemis.core.server] AMQ224088: Timeout (10 seconds) while handshaking has occurred.

  • Is there any way to debug and troubleshoot this?
  • What broker settings control the handshaking process?

Unlocker avatar Dec 12 '18 07:12 Unlocker

Hi @Unlocker. Actually that message is benign. It's the Kubernetes readiness probe hitting the JMS transport connection every 10 seconds to test that the connection is up. It's annoying that it shows up as an error, but there doesn't seem to be a better way to test readiness at the moment.
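For reference, the probe in question is most likely a plain TCP connect against the broker's acceptor port, something like the following in the Pod spec (the port and timing values here are assumptions; check the chart's templates for the actual definition):

```yaml
# Hypothetical readiness probe as the chart might define it: a raw
# TCP connect to the Artemis acceptor every 10 seconds. Artemis logs
# AMQ224088 because the probe opens the socket but never completes a
# protocol handshake before closing it.
readinessProbe:
  tcpSocket:
    port: 61616        # JMS/Core acceptor port
  initialDelaySeconds: 10
  periodSeconds: 10    # matches the 10-second cadence in the log
```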

DanSalt avatar Dec 12 '18 20:12 DanSalt

@DanSalt, thank you for the answer. When I run two or more master nodes, I expect message and address replication to take place. I connected to one node and sent a message, but the other node received nothing and the topic was not created there. I think this behavior is related to the TCP connection handshake. Maybe I am wrong about that?

Unlocker avatar Dec 12 '18 20:12 Unlocker

Hi @Unlocker. The cluster configuration is set up to include all masters (and slaves) as part of the known cluster. You can verify this in the ActiveMQ console by checking that both masters appear in the cluster map.

However, by default, message load balancing is configured to work on demand, which means a message is only forwarded to the other broker if a consumer for that destination is connected to it.
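A minimal sketch of the relevant broker.xml fragment (the connector and address names here are illustrative, not taken from the chart):

```xml
<cluster-connections>
   <cluster-connection name="my-cluster">
      <address>jms</address>
      <connector-ref>netty-connector</connector-ref>
      <!-- ON_DEMAND (the default): forward messages to another broker
           only when that broker has a consumer on the destination -->
      <message-load-balancing>ON_DEMAND</message-load-balancing>
      <max-hops>1</max-hops>
      <static-connectors>
         <connector-ref>other-broker-connector</connector-ref>
      </static-connectors>
   </cluster-connection>
</cluster-connections>
```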

Hope this helps.

DanSalt avatar Dec 13 '18 20:12 DanSalt

Hi @DanSalt. Your comments are very useful. I have read the "Clusters" chapter in the documentation. To get a symmetric distribution behind a load balancer (ClusterIP service) I need to:

  1. Define all the queues and topics in the broker.xml distributed to all Pods.
  2. Set "message-load-balancing" to STRICT mode.

I expect that persistent messages can then be delivered to any live broker instance, regardless of where the publishers and subscribers are connected. If that is right, the issue can be closed.
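The two steps above would look roughly like this in broker.xml (address and queue names are placeholders):

```xml
<!-- 1. Pre-define the destinations in every Pod's broker.xml -->
<addresses>
   <address name="orders">
      <anycast>
         <queue name="orders"/>
      </anycast>
   </address>
</addresses>

<!-- 2. Switch the cluster connection to STRICT load balancing,
        so messages are distributed round-robin across the cluster
        whether or not the remote broker has consumers -->
<cluster-connection name="my-cluster">
   <address>jms</address>
   <connector-ref>netty-connector</connector-ref>
   <message-load-balancing>STRICT</message-load-balancing>
</cluster-connection>
```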

Unlocker avatar Dec 14 '18 20:12 Unlocker

Hi @Unlocker. Glad it was useful.

To scale horizontally, I use the defaults as given in the charts here. I generally find that I do not need to pre-define any destinations in broker.xml, unless I have specific configuration I need to apply to them (QoS, etc.). Artemis (like AMQ) will auto-create destinations on-the-fly.

I generally stick to 'ON_DEMAND' for best performance. STRICT mode means it will always distribute messages round-robin, even if the other brokers don't have any consumers. Bear in mind that even in STRICT mode, Artemis won't distribute messages for queues that have no consumers at all.

Glad I could help, and good luck!

DanSalt avatar Dec 15 '18 01:12 DanSalt

> Hi @Unlocker. Actually that message is benign. It's the Kubernetes readiness probe hitting the JMS transport connection every 10 seconds to test that the connection is up. It's annoying that it shows up as an error, but there doesn't seem to be a better way to test readiness at the moment.

@DanSalt

Is there anything we can do about this? We're exporting stdout to Kibana, and that's a lot of log volume to ship just for a readiness check.

Is there an actual health check on Artemis that we can hit?

Could we do something like this: https://github.com/apache/activemq-artemis/blob/master/docs/user-manual/en/rest.md
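One possible alternative (an untested sketch; the Jolokia path and MBean/attribute names below are assumptions based on the Artemis web console, which serves Jolokia on the embedded web server's port) would be an HTTP probe against the management endpoint instead of a raw TCP check on the acceptor:

```yaml
# Hypothetical HTTP readiness probe against the embedded web console's
# Jolokia endpoint. This avoids opening half-finished connections on
# the messaging acceptor, so the AMQ224088 handshake-timeout errors
# would no longer be logged. The ObjectName (broker name) must match
# the actual broker configuration.
readinessProbe:
  httpGet:
    path: /console/jolokia/read/org.apache.activemq.artemis:broker=%22mybroker%22/Started
    port: 8161
  periodSeconds: 10
```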

tomhobson avatar Nov 13 '19 15:11 tomhobson