hazelcast-kubernetes
Weird behavior of TcpServerConnection
Hello guys!
I need a clarification rather than help. Currently I'm trying to run an application in K8s and everything is good so far, but I'm seeing some slightly weird logs:
2021-04-08 13:51:59.890 INFO 1 --- [.IO.thread-in-0] c.h.i.server.tcp.TcpServerConnection : [10.154.24.57]:5701 [WORKFLOW] [4.2] Connection[id=10, /10.154.24.57:5701->/10.154.1.219:15106, qualifier=null, endpoint=null, alive=false, connectionType=NONE, planeIndex=-1] closed. Reason: Connection closed by the other side
2021-04-08 13:51:59.890 INFO 1 --- [.IO.thread-in-1] c.h.i.server.tcp.TcpServerConnection : [10.154.24.57]:5701 [WORKFLOW] [4.2] Connection[id=11, /10.154.24.57:5701->/10.154.47.197:32686, qualifier=null, endpoint=null, alive=false, connectionType=NONE, planeIndex=-1] closed. Reason: Connection closed by the other side
2021-04-08 13:51:59.890 INFO 1 --- [.IO.thread-in-2] c.h.i.server.tcp.TcpServerConnection : [10.154.24.57]:5701 [WORKFLOW] [4.2] Connection[id=12, /10.154.24.57:5701->/10.154.5.157:10454, qualifier=null, endpoint=null, alive=false, connectionType=NONE, planeIndex=-1] closed. Reason: Connection closed by the other side
2021-04-08 13:52:00.911 INFO 1 --- [ration.thread-0] c.h.internal.cluster.ClusterService : [10.154.24.57]:5701 [WORKFLOW] [4.2]
Members {size:2, ver:2} [
Member [10.154.24.57]:5701 - 94f6d56b-a72d-4d0a-82ee-a33e47053f78 this
Member [10.154.92.185]:5701 - 1f66b061-f995-46c0-b886-dd0ee1a49dc3
]
2021-04-08 13:52:04.335 INFO 1 --- [.IO.thread-in-0] c.h.i.server.tcp.TcpServerConnection : [10.154.24.57]:5701 [WORKFLOW] [4.2] Connection[id=13, /10.154.24.57:5701->/10.154.6.102:16924, qualifier=null, endpoint=null, alive=false, connectionType=NONE, planeIndex=-1] closed. Reason: Connection closed by the other side
2021-04-08 13:52:04.335 INFO 1 --- [.IO.thread-in-1] c.h.i.server.tcp.TcpServerConnection : [10.154.24.57]:5701 [WORKFLOW] [4.2] Connection[id=14, /10.154.24.57:5701->/10.154.19.176:5548, qualifier=null, endpoint=null, alive=false, connectionType=NONE, planeIndex=-1] closed. Reason: Connection closed by the other side
2021-04-08 13:52:04.336 INFO 1 --- [.IO.thread-in-2] c.h.i.server.tcp.TcpServerConnection : [10.154.24.57]:5701 [WORKFLOW] [4.2] Connection[id=15, /10.154.24.57:5701->/10.154.5.124:47016, qualifier=null, endpoint=null, alive=false, connectionType=NONE, planeIndex=-1] closed. Reason: Connection closed by the other side
2021-04-08 13:52:04.336 INFO 1 --- [.IO.thread-in-0] c.h.i.server.tcp.TcpServerConnection : [10.154.24.57]:5701 [WORKFLOW] [4.2] Connection[id=16, /10.154.24.57:5701->/10.154.1.219:52814, qualifier=null, endpoint=null, alive=false, connectionType=NONE, planeIndex=-1] closed. Reason: Connection closed by the other side
2021-04-08 13:52:09.889 INFO 1 --- [.IO.thread-in-2] c.h.i.server.tcp.TcpServerConnection : [10.154.24.57]:5701 [WORKFLOW] [4.2] Connection[id=18, /10.154.24.57:5701->/10.154.19.176:32638, qualifier=null, endpoint=null, alive=false, connectionType=NONE, planeIndex=-1] closed. Reason: Connection closed by the other side
2021-04-08 13:52:09.889 INFO 1 --- [.IO.thread-in-1] c.h.i.server.tcp.TcpServerConnection : [10.154.24.57]:5701 [WORKFLOW] [4.2] Connection[id=17, /10.154.24.57:5701->/10.154.29.156:1518, qualifier=null, endpoint=null, alive=false, connectionType=NONE, planeIndex=-1] closed. Reason: Connection closed by the other side
2021-04-08 13:52:09.889 INFO 1 --- [.IO.thread-in-0] c.h.i.server.tcp.TcpServerConnection : [10.154.24.57]:5701 [WORKFLOW] [4.2] Connection[id=19, /10.154.24.57:5701->/10.154.47.197:32712, qualifier=null, endpoint=null, alive=false, connectionType=NONE, planeIndex=-1] closed. Reason: Connection closed by the other side
It seems like the current node is constantly trying to establish connections with other pods using a variety of seemingly irrelevant ports. Is this normal behavior or not? Is there a way to restrict it somehow?
My configs:
@Bean
public Config hazelcastConfig() {
    Config cfg = new Config();
    cfg.setClusterName("WORKFLOW");
    cfg.getNetworkConfig().setPort(5701).setPortAutoIncrement(false);
    cfg.setSerializationConfig(serializationConfig());
    cfg.setMapConfigs(getMapConfigs());
    cfg.setProperty("hazelcast.logging.type", "slf4j");
    if (isApplicationInsideKube()) {
        JoinConfig joinConfig = cfg.getNetworkConfig().getJoin();
        joinConfig.getAutoDetectionConfig().setEnabled(false);
        joinConfig.getTcpIpConfig().setEnabled(false);
        joinConfig.getMulticastConfig().setEnabled(false);
        joinConfig.getKubernetesConfig().setEnabled(true)
                .setProperty("service-name", "workflow-service")
                .setProperty("service-port", "5701");
        log.info(joinConfig.toString());
    }
    log.info(cfg.toString());
    return cfg;
}
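For what it's worth, here is a minimal, self-contained sketch of how I check what the cluster service actually reports as members, to separate real members from these stray connections. The stand-in config below is illustrative only; in the real application the full bean above is used:

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class ClusterMembershipCheck {

    public static void main(String[] args) {
        // Minimal stand-in for the hazelcastConfig() bean above; the real
        // application would also apply serialization, map configs and the
        // Kubernetes join settings.
        Config cfg = new Config();
        cfg.setClusterName("WORKFLOW");

        HazelcastInstance hz = Hazelcast.newHazelcastInstance(cfg);

        // Members known to the cluster service; stray TCP connections from
        // other pods (probes, proxies, etc.) never show up in this view.
        hz.getCluster().getMembers()
          .forEach(m -> System.out.println("Cluster member: " + m.getAddress()));

        hz.shutdown();
    }
}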
Best regards & Thank you in advance,
@tessob thanks for submitting the issue. Could you please tell us more about your k8s environment? Is there any proxy/service mesh setup?
@alparslanavci no, we have no proxy/service mesh. Our Service manifest is similar to this one. There are just two pods in a separate namespace now. However, the cluster itself is bigger.
Our JoinConfig on application start looks like this:
JoinConfig{multicastConfig=MulticastConfig [enabled=false, multicastGroup=224.2.2.3, multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2, trustedInterfaces=[], loopbackModeEnabled=true], tcpIpConfig=TcpIpConfig [enabled=false, connectionTimeoutSeconds=5, members=[], requiredMember=null], awsConfig=AliasedDiscoveryConfig{tag='aws', enabled=false, usePublicIp=false, properties={}}, gcpConfig=AliasedDiscoveryConfig{tag='gcp', enabled=false, usePublicIp=false, properties={}}, azureConfig=AliasedDiscoveryConfig{tag='azure', enabled=false, usePublicIp=false, properties={}}, kubernetesConfig=AliasedDiscoveryConfig{tag='kubernetes', enabled=true, usePublicIp=false, properties={service-name=workflow-service, service-port=5701}}, eurekaConfig=AliasedDiscoveryConfig{tag='eureka', enabled=false, usePublicIp=false, properties={}}, discoveryConfig=DiscoveryConfig{discoveryStrategyConfigs=[], discoveryServiceProvider=null, nodeFilter=null, nodeFilterClass='null'}, autoDetectionConfig=AutoDetectionConfig{enabled=false}}
and Discovery properties like this:
Kubernetes Discovery properties: { service-dns: null, service-dns-timeout: 5, service-name: workflow-service, service-port: 5701, service-label: null, service-label-value: true, namespace: XXXXX, pod-label: null, pod-label-value: null, resolve-not-ready-addresses: true, use-node-name-as-external-address: false, kubernetes-api-retries: 3, kubernetes-master: https://kubernetes.default.svc}
and there are a few additional messages that I don't really know how to interpret:
2021-04-08 13:51:46.207 INFO 1 --- [ main] c.h.s.d.integration.DiscoveryService : [10.154.24.57]:5701 [WORKFLOW] [4.2] Kubernetes Discovery activated with mode: KUBERNETES_API
2021-04-08 13:51:46.499 INFO 1 --- [ main] com.hazelcast.instance.impl.Node : [10.154.24.57]:5701 [WORKFLOW] [4.2] Using Discovery SPI
2021-04-08 13:51:47.783 INFO 1 --- [ main] com.hazelcast.core.LifecycleService : [10.154.24.57]:5701 [WORKFLOW] [4.2] [10.154.24.57]:5701 is STARTING
2021-04-08 13:51:48.603 INFO 1 --- [ main] c.h.s.d.integration.DiscoveryService : [10.154.24.57]:5701 [WORKFLOW] [4.2] Kubernetes plugin discovered availability zone: eu-XXXXXX
2021-04-08 13:51:48.771 INFO 1 --- [ main] c.h.s.d.integration.DiscoveryService : [10.154.24.57]:5701 [WORKFLOW] [4.2] Kubernetes plugin discovered node name: ip-XXXXXXX.compute.internal
2021-04-08 13:51:48.990 WARN 1 --- [ main] c.hazelcast.kubernetes.KubernetesClient : Cannot fetch public IPs of Hazelcast Member PODs, you won't be able to use Hazelcast Smart Client from outside of the Kubernetes network
I hope this is helpful. BR,
Short update,
I have checked this morning's logs, which show ~20K attempts to find other nodes, and then looked at the source code to understand where the client got the list of irrelevant endpoints.
I reproduced this API call manually with the following curl query:
kubectl proxy --port=0 &
curl http://localhost:36891/api/v1/namespaces/XXXXXXXXX/endpoints/workflow-service
and got this output:
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "workflow-service",
    "namespace": "XXXXXXXXX",
    "selfLink": "/api/v1/namespaces/XXXXXXXXX/endpoints/workflow-service",
    "uid": "64ffb585-e68a-41bc-97bf-fb54e8f3fdc1",
    "resourceVersion": "113176905",
    "creationTimestamp": "2021-04-08T11:01:06Z",
    "labels": {
      "fluxcd.io/sync-gc-mark": "sha256.VJun3ud720pxL2mM3fjAbo3OoiEfk8bQDeZA8wZQx7c"
    }
  },
  "subsets": [
    {
      "addresses": [
        {
          "ip": "10.154.24.57",
          "nodeName": "XXXXXXXXX.compute.internal",
          "targetRef": {
            "kind": "Pod",
            "namespace": "XXXXXXXXX",
            "name": "workflow-tracker-service-5769dc77cf-9rnhp",
            "uid": "7c84a0c5-7d9b-4cb9-b1d8-d6bb0d08979e",
            "resourceVersion": "113176843"
          }
        },
        {
          "ip": "10.154.92.185",
          "nodeName": "XXXXXXXXX.compute.internal",
          "targetRef": {
            "kind": "Pod",
            "namespace": "XXXXXXXXX",
            "name": "workflow-tracker-service-5769dc77cf-zf9pd",
            "uid": "29737169-ff5e-49ba-902f-1756dc4e33db",
            "resourceVersion": "113176895"
          }
        }
      ],
      "ports": [
        {
          "name": "workflow-s",
          "port": 8080,
          "protocol": "TCP"
        },
        {
          "name": "hazelcast",
          "port": 5701,
          "protocol": "TCP"
        }
      ]
    }
  ]
}
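The same check can also be scripted. A small sketch (assuming `kubectl proxy` is running locally on port 8001; the namespace and service name are placeholders) that fetches the Endpoints object and prints the pod IPs that discovery would see:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EndpointsCheck {

    public static void main(String[] args) throws Exception {
        // Assumes `kubectl proxy --port=8001` is running on the local machine;
        // the namespace and service name below are placeholders.
        String url = "http://localhost:8001/api/v1/namespaces/my-namespace/endpoints/workflow-service";

        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(url)).GET().build(),
                HttpResponse.BodyHandlers.ofString());

        // Crude extraction of the "ip" fields, just to list what discovery sees.
        Matcher m = Pattern.compile("\"ip\"\\s*:\\s*\"([^\"]+)\"").matcher(response.body());
        while (m.find()) {
            System.out.println("Endpoint IP: " + m.group(1));
        }
    }
}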
So, as I said, I have just 2 pods relevant to the Hazelcast cluster. Hence I don't really understand where the source IPs that TcpServerConnection is trying to connect to come from. Looks like a bug.
Best regards,
It's hard to tell where Hazelcast is trying to connect. Some more questions:
- If you follow exactly the code in this guide, do you experience the same?
- If you deploy your application on some widely-used Kubernetes cluster (like Docker Desktop, or GKE, or EKS), do you experience the same?
Hi @leszko,
- We have tested exactly as in the guide, in the same K8s cluster and namespace, and everything is OK. Logs:
########################################
# JAVA_OPTS=-Djava.net.preferIPv4Stack=true -Dhazelcast.logging.type=log4j2 -Dlog4j.configurationFile=/opt/hazelcast/log4j2.properties -XX:MaxRAMPercentage=80.0 -XX:MaxGCPauseMillis=5 --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED -Dhazelcast.config=/data/hazelcast/hazelcast.yaml -DserviceName=hazelcast -Dnamespace=XXXXXXXXX -Dhazelcast.shutdownhook.policy=GRACEFUL -Dhazelcast.shutdownhook.enabled=true -Dhazelcast.graceful.shutdown.max.wait=600
# CLASSPATH=/opt/hazelcast/*:/opt/hazelcast/lib/*
# starting now....
########################################
+ exec java -server -Djava.net.preferIPv4Stack=true -Dhazelcast.logging.type=log4j2 -Dlog4j.configurationFile=/opt/hazelcast/log4j2.properties -XX:MaxRAMPercentage=80.0 -XX:MaxGCPauseMillis=5 --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED -Dhazelcast.config=/data/hazelcast/hazelcast.yaml -DserviceName=hazelcast -Dnamespace=XXXXXXXXX -Dhazelcast.shutdownhook.policy=GRACEFUL -Dhazelcast.shutdownhook.enabled=true -Dhazelcast.graceful.shutdown.max.wait=600 com.hazelcast.core.server.HazelcastMemberStarter
2021-04-09 10:39:31,214 [ INFO] [main] [c.h.i.c.AbstractConfigLocator]: Loading configuration '/data/hazelcast/hazelcast.yaml' from System property 'hazelcast.config'
2021-04-09 10:39:31,219 [ INFO] [main] [c.h.i.c.AbstractConfigLocator]: Using configuration file at /data/hazelcast/hazelcast.yaml
2021-04-09 10:39:31,360 [ INFO] [main] [c.h.system]: [10.154.60.11]:5701 [dev] [4.2] Hazelcast 4.2 (20210324 - 405cfd1) starting at [10.154.60.11]:5701
2021-04-09 10:39:31,749 [ INFO] [main] [c.h.s.d.i.DiscoveryService]: [10.154.60.11]:5701 [dev] [4.2] Kubernetes Discovery properties: { service-dns: null, service-dns-timeout: 5, service-name: hazelcast, service-port: 0, service-label: null, service-label-value: true, namespace: XXXXXXXXX, pod-label: null, pod-label-value: null, resolve-not-ready-addresses: true, use-node-name-as-external-address: false, kubernetes-api-retries: 3, kubernetes-master: https://kubernetes.default.svc}
2021-04-09 10:39:31,752 [ INFO] [main] [c.h.s.d.i.DiscoveryService]: [10.154.60.11]:5701 [dev] [4.2] Kubernetes Discovery activated with mode: KUBERNETES_API
2021-04-09 10:39:31,807 [ INFO] [main] [c.h.i.i.Node]: [10.154.60.11]:5701 [dev] [4.2] Using Discovery SPI
2021-04-09 10:39:31,810 [ WARN] [main] [c.h.c.CPSubsystem]: [10.154.60.11]:5701 [dev] [4.2] CP Subsystem is not enabled. CP data structures will operate in UNSAFE mode! Please note that UNSAFE mode will not provide strong consistency guarantees.
2021-04-09 10:39:32,024 [ INFO] [main] [c.h.i.d.Diagnostics]: [10.154.60.11]:5701 [dev] [4.2] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
2021-04-09 10:39:32,035 [ INFO] [main] [c.h.c.LifecycleService]: [10.154.60.11]:5701 [dev] [4.2] [10.154.60.11]:5701 is STARTING
2021-04-09 10:39:32,448 [ INFO] [main] [c.h.s.d.i.DiscoveryService]: [10.154.60.11]:5701 [dev] [4.2] Kubernetes plugin discovered availability zone: XXXXXXXXX
2021-04-09 10:39:32,500 [ INFO] [main] [c.h.s.d.i.DiscoveryService]: [10.154.60.11]:5701 [dev] [4.2] Kubernetes plugin discovered node name: XXXXXXXXX.compute.internal
2021-04-09 10:39:32,616 [ WARN] [main] [c.h.k.KubernetesClient]: Cannot fetch public IPs of Hazelcast Member PODs, you won't be able to use Hazelcast Smart Client from outside of the Kubernetes network
2021-04-09 10:39:37,600 [ INFO] [main] [c.h.i.c.ClusterService]: [10.154.60.11]:5701 [dev] [4.2]
Members {size:1, ver:1} [
Member [10.154.60.11]:5701 - 6e43ac32-e6bf-4cfb-81e0-a865d6f986b1 this
]
2021-04-09 10:39:37,611 [ INFO] [main] [c.h.c.LifecycleService]: [10.154.60.11]:5701 [dev] [4.2] [10.154.60.11]:5701 is STARTED
2021-04-09 10:40:22,454 [ INFO] [hz.boring_ganguly.IO.thread-in-1] [c.h.i.s.t.TcpServerConnection]: [10.154.60.11]:5701 [dev] [4.2] Initialized new cluster connection between /10.154.60.11:5701 and /10.154.107.219:57097
2021-04-09 10:40:28,455 [ INFO] [hz.boring_ganguly.priority-generic-operation.thread-0] [c.h.i.c.ClusterService]: [10.154.60.11]:5701 [dev] [4.2]
Members {size:2, ver:2} [
Member [10.154.60.11]:5701 - 6e43ac32-e6bf-4cfb-81e0-a865d6f986b1 this
Member [10.154.107.219]:5701 - 3140ed16-8832-4e69-a875-d8da79b091eb
]
2021-04-09 10:40:56,850 [ INFO] [hz.boring_ganguly.IO.thread-in-0] [c.h.i.s.t.TcpServerConnection]: [10.154.60.11]:5701 [dev] [4.2] Initialized new cluster connection between /10.154.60.11:5701 and /10.154.101.83:44681
2021-04-09 10:41:02,834 [ INFO] [hz.boring_ganguly.priority-generic-operation.thread-0] [c.h.i.c.ClusterService]: [10.154.60.11]:5701 [dev] [4.2]
Members {size:3, ver:3} [
Member [10.154.60.11]:5701 - 6e43ac32-e6bf-4cfb-81e0-a865d6f986b1 this
Member [10.154.107.219]:5701 - 3140ed16-8832-4e69-a875-d8da79b091eb
Member [10.154.101.83]:5701 - 21669cbd-71fc-4a70-8727-b6538ddd3d23
]
- We are on AWS EKS.
In my dependencies I have only these two that can be related to Hazelcast:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-cache</artifactId>
</dependency>
<!-- https://mvnrepository.com/artifact/com.hazelcast/hazelcast-all -->
<dependency>
    <groupId>com.hazelcast</groupId>
    <artifactId>hazelcast-all</artifactId>
    <version>4.2</version>
</dependency>
My config is in the very first message.
Thanks,
Short update,
I have tested my application in minikube locally and everything is OK. So the root cause of this behavior is most likely in the K8s cluster and/or the deployment manifests. However, it is still unclear to me why Hazelcast is trying to establish connections with nodes that are irrelevant to the service-name.
It may be the phone home call. Could you try disabling phone home and check if it helps?
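For reference, in an embedded setup phone home is typically disabled with a single property. A minimal sketch, assuming the standard Hazelcast 4.x property name:

import com.hazelcast.config.Config;

public class PhoneHomeDisabled {

    public static void main(String[] args) {
        Config cfg = new Config();
        // Disables the periodic phone-home (version/usage statistics) call.
        cfg.setProperty("hazelcast.phone.home.enabled", "false");
        // ... rest of the configuration as in the original bean ...
    }
}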
After I disabled it, I still have the same behavior in the logs, so there must be another cause.
Hi @tessob , were you able to find out the cause? I'm currently experiencing the same issue. I'm also in AWS EKS
I'm also experiencing the same problem and it is also in AWS EKS.
Sorry for a late reply )) I have migrated to Infinispan.
Hey folks,
I am facing the same issue. We have a Java Spring app with embedded Hazelcast and run it in Kubernetes. For member lookup we use DNS Lookup Discovery mode; member creation looks good, but Hazelcast is trying to connect to some random IPs and gives the warning below.
java.lang.IllegalStateException: Unknown protocol: OPT
java.lang.IllegalStateException: TLS handshake header detected, but plain protocol header was expected.
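For context, our DNS Lookup setup essentially means setting the service-dns property on the Kubernetes join config instead of service-name. A minimal sketch (the headless service name below is a placeholder, not our real one):

import com.hazelcast.config.Config;
import com.hazelcast.config.JoinConfig;

public class DnsLookupDiscoveryConfig {

    public static void main(String[] args) {
        Config cfg = new Config();
        JoinConfig join = cfg.getNetworkConfig().getJoin();
        join.getMulticastConfig().setEnabled(false);
        join.getTcpIpConfig().setEnabled(false);
        // DNS Lookup mode resolves member addresses from the headless service's
        // DNS record rather than from the Kubernetes API.
        join.getKubernetesConfig().setEnabled(true)
            .setProperty("service-dns", "my-headless-service.my-namespace.svc.cluster.local");
    }
}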
@tessob Did you find the root cause of this? My issue is similar to what you have posted.
@AshishWat No, I just gave up on Hazelcast after looking at the code and moved to Infinispan. So far so good.