activemq-artemis-helm
Timeout while handshaking has occurred
I have installed the chart with 2 replicas and a static cluster configuration. First I fixed a domain naming issue, so the bridge is now established:
Bridge ClusterConnectionBridge@3c966cfd [name=$.artemis.internal.sf.habroker-jms.477161f3-fdde-11e8-999f-0242c0022412, queue=QueueImpl[name=$.artemis.internal.sf.habroker-jms.477161f3-fdde-11e8-999f-0242c0022412, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=3504f3fc-fd53-11e8-9c53-0242c0022412], temp=false]@7994e50b targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@3c966cfd [name=$.artemis.internal.sf.habroker-jms.477161f3-fdde-11e8-999f-0242c0022412, queue=QueueImpl[name=$.artemis.internal.sf.habroker-jms.477161f3-fdde-11e8-999f-0242c0022412, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=3504f3fc-fd53-11e8-9c53-0242c0022412], temp=false]@7994e50b targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=habroker-jms-master-1, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=habroker-jms-master-1-habroker-jms-master-pmh-depl-svc-kube-local], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@1788364006[nodeUUID=3504f3fc-fd53-11e8-9c53-0242c0022412, connector=TransportConfiguration(name=habroker-jms-slave-1, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=habroker-jms-slave-1-habroker-jms-slave-pmh-depl-svc-kube-local, address=jms, server=ActiveMQServerImpl::serverUUID=3504f3fc-fd53-11e8-9c53-0242c0022412])) [initialConnectors=[TransportConfiguration(name=habroker-jms-master-1, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=habroker-jms-master-1-habroker-jms-master-pmh-depl-svc-kube-local], discoveryGroupConfiguration=null]] is connected
But the logs from the master nodes contain errors:
2018-12-12 07:51:53,968 ERROR [org.apache.activemq.artemis.core.server] AMQ224088: Timeout (10 seconds) while handshaking has occurred.
- Is there any way to debug and troubleshoot this issue?
- What broker settings are responsible for the handshaking process?
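For reference, here is roughly what the static cluster configuration looks like inside the `<core>` element of broker.xml (a trimmed sketch; the connector names and hostnames are illustrative, not the exact config the chart generates):

```xml
<!-- Illustrative static two-master cluster; hosts are the pods' service DNS names -->
<connectors>
  <!-- connector this broker advertises to the cluster -->
  <connector name="habroker-jms-master-0">tcp://habroker-jms-master-0.habroker-jms-master:61616</connector>
  <!-- connector pointing at the other master -->
  <connector name="habroker-jms-master-1">tcp://habroker-jms-master-1.habroker-jms-master:61616</connector>
</connectors>

<cluster-connections>
  <cluster-connection name="habroker-jms">
    <address>jms</address>
    <connector-ref>habroker-jms-master-0</connector-ref>
    <retry-interval>500</retry-interval>
    <max-hops>1</max-hops>
    <static-connectors>
      <connector-ref>habroker-jms-master-1</connector-ref>
    </static-connectors>
  </cluster-connection>
</cluster-connections>
```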
Hi @Unlocker. Actually that message is benign. It's the Kubernetes readiness probe hitting the JMS transport connection every 10 seconds to test that the connection is up. It's annoying that it shows up as an error, but there doesn't seem to be a better way to test readiness at the moment.
@DanSalt, thank you for the answer. When I run two or more master nodes, I expect message and address replication to happen between them. I connected to one node and sent a message, but the other node received nothing and the topic was not created there. I think this behaviour is related to the TCP connection handshake. Maybe I am wrong about that?
Hi @Unlocker. The cluster configuration is set up to include all masters (and slaves) as part of the known cluster. You can check this in the ActiveMQ console by verifying that both masters appear in the cluster map.
However, by default, Message Replication is configured to work on demand, which means that it will only replicate the message if there is a consumer connected to the other broker for that destination.
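That on-demand behaviour maps to the `message-load-balancing` element on the cluster connection in broker.xml; a minimal sketch (the cluster-connection name is a placeholder, other settings omitted):

```xml
<cluster-connection name="habroker-jms">
  <!-- ON_DEMAND is the default: messages are only forwarded to brokers
       that currently have a consumer on that destination -->
  <message-load-balancing>ON_DEMAND</message-load-balancing>
  <!-- ... connector-ref, max-hops, static-connectors, etc. ... -->
</cluster-connection>
```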
Hope this helps.
Hi @DanSalt. Your comments are very useful. I have read the "Clusters" chapter in the documentation. To get a symmetric distribution behind a load balancer (ClusterIP service) I need to:
- Pre-define all queues and topics in the broker.xml distributed with every Pod.
- Set "message-load-balancing" to STRICT mode (see the sketch below).
I expect that persistent messages can then be delivered by any live broker instance, regardless of which node the publishers and subscribers are connected to. If that is right, the issue can be closed.
Hi @Unlocker. Glad it was useful.
To scale horizontally, I use the defaults as given in the charts here. I generally find that I do not need to pre-define any destinations in broker.xml, unless I have specific configuration I need to apply to them (QoS, etc.). Artemis (like AMQ) will auto-create destinations on-the-fly.
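The auto-create behaviour is governed by the address-settings; a minimal sketch of the relevant defaults, assuming the chart's broker.xml keeps the Artemis 2.x out-of-the-box values (the "#" match is a wildcard covering all addresses):

```xml
<address-settings>
  <address-setting match="#">
    <!-- both default to true in Artemis 2.x, so destinations are created on first use -->
    <auto-create-addresses>true</auto-create-addresses>
    <auto-create-queues>true</auto-create-queues>
  </address-setting>
</address-settings>
```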
I generally stick to 'ON_DEMAND' for best performance. STRICT mode means that it will always distribute the messages, even if the other brokers don't have any consumers. Bear in mind that Artemis won't distribute messages for queues that have no consumers, even in STRICT mode.
Glad I could help, and good luck!
> Hi @Unlocker. Actually that message is benign. It's the Kubernetes readiness probe hitting the JMS transport connection every 10 seconds to test that the connection is up. It's annoying that it shows up as an error, but there doesn't seem to be a better way to test readiness at the moment.
@DanSalt
Is there anything we can do about this? We're exporting stdout to Kibana and it's a lot of log volume to be shipping just for a readiness check.
Is there an actual health check on Artemis that we can hit?
Could we do something like this: https://github.com/apache/activemq-artemis/blob/master/docs/user-manual/en/rest.md