Reconnect to Hazelcast cluster when network connection was interrupted
Hazelcast 3.11.1 and C# client 3.11.

We have a .NET HazelcastClient which establishes a connection to a Hazelcast cluster. At some stage during the application lifetime we switch off the network connection. After approximately 20 seconds we get the exception "Hazelcast Instance is not active". We then restore the network connection, but that does not solve the problem: every attempt to obtain any data from Hazelcast results in the same exception message.

We tried setting the NetworkConfig parameters ConnectionTimeout, ConnectionAttemptLimit and ConnectionAttemptPeriod, but that does not change Hazelcast's behaviour. We also performed the same test with the Java HazelcastClient using ClientConnectionStrategyConfig, and with those settings the HazelcastClient was able to reconnect after the loss of the network connection.

Is there any way to solve this problem?
The issue has been identified: when the network connection is down, the .NET client is unable to resolve hostnames, which generates an exception.
A workaround is to use IP addresses rather than hostnames in the .NET client config, but we will be issuing an updated .NET client that resolves this issue.
Thank you for reporting it.
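For anyone applying the workaround, here is a minimal sketch of the idea: resolve the member hostnames once while the network is still up, then put the resulting IP literals in the client config. It is written in Java for illustration only. The class `MemberAddresses` and its methods are hypothetical helpers, not part of any Hazelcast client, and the sketch assumes hostnames or IPv4 literals with an optional `:port` suffix.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Hazelcast API.
class MemberAddresses {

    // Resolves the host part of "host:port" (or a bare host) to an IP literal.
    // Assumption: the host is a hostname or an IPv4 literal; IPv6 literals
    // contain ':' themselves and are not handled by this sketch.
    static String toIpAddress(String hostAndPort) throws UnknownHostException {
        int colon = hostAndPort.lastIndexOf(':');
        String host = colon < 0 ? hostAndPort : hostAndPort.substring(0, colon);
        String portSuffix = colon < 0 ? "" : hostAndPort.substring(colon);
        // For an IP literal this only validates the format; for a hostname
        // it performs a lookup, so call it while the network is available.
        return InetAddress.getByName(host).getHostAddress() + portSuffix;
    }

    // Resolves every configured member address, preserving order.
    static List<String> toIpAddresses(List<String> members) throws UnknownHostException {
        List<String> resolved = new ArrayList<>();
        for (String member : members) {
            resolved.add(toIpAddress(member));
        }
        return resolved;
    }
}
```

Calling `MemberAddresses.toIpAddresses(...)` at startup and passing the result to the client's address list means no hostname lookup is needed later. The trade-off is that pinned IPs go stale if the members are rescheduled onto different addresses.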
@jgardiner68 could you please provide an estimate of when the new version with this fix will be available? Thanks
@asergeev95 the fix will be included in the next update of the .NET client, which is expected in the 2nd half of July
This has been causing some issues on our side as well, with exactly the same symptoms as @arkadiuszdr. Further to this, I noticed that for the Java client there is now a way to configure these settings: https://docs.hazelcast.org/docs/latest/manual/html-single/#configuring-client-connection-retry and I was wondering if we could use the .xml alternative (as this feature is not listed at all in the list of supported ones).
@jgardiner68 do you think your 2nd half of July is still valid? Would you require some help from my side?
@rjso This is on schedule for a release by the end of July. I will update this ticket when it is available, and it will be great to get your feedback.
Will be shipped in 3.12.1
Thanks @jgardiner68 and @Scooletz I will test it as soon as 3.12.1 becomes available. thanks for your help :)
Hi @jgardiner68, @Scooletz we have tried to confirm this issue has been resolved, and I can confirm it isn't, at least from my perspective.
I have a Docker container that was started and was receiving messages in its queue. I then stopped the container and tried to add a new message to the queue. I can see the client trying to send the message for 30s before timing out with the following error: "Queue is full!". I then started the Docker container and created a new message. Almost immediately the error "Queue is full!" is thrown, implying that no retry is made. Only restarting the application that uses the client solves the problem. (I have also noticed in my container logs that no connection was established.)
From my limited understanding something like what I mentioned in my first comment https://docs.hazelcast.org/docs/latest/manual/html-single/#configuring-client-connection-retry would help set retry policies.
Do you think it would be possible to investigate this matter further and potentially add the retry policy feature?
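To make the request concrete, here is a rough sketch of what an application-level retry policy with exponential backoff could look like while the client has no built-in one. It is plain Java for illustration, not the Hazelcast API. `RetryWithBackoff` and its parameters are hypothetical names, and the wrapped operation stands in for whatever call currently fails (for example, offering a message to the queue).

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of any Hazelcast client.
class RetryWithBackoff {

    // Runs the operation up to maxAttempts times, sleeping between failed
    // attempts. The delay starts at initialDelayMillis and is multiplied by
    // 'multiplier' after every failure. Rethrows the last failure if all
    // attempts fail.
    static <T> T call(Callable<T> operation, int maxAttempts,
                      long initialDelayMillis, double multiplier) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // no point sleeping after the final attempt
                }
                Thread.sleep(delay);
                delay = (long) (delay * multiplier);
            }
        }
        throw last;
    }
}
```

A wrapper like this only helps if the underlying client can actually re-establish its connection. In the scenario described above, where the client never reconnects, every retry would fail the same way, which is why the policy would need to live inside the client itself.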
Yes, the issue as described by rjso is still there. If the client loses its connection to all the members and the Hazelcast DNS points to other IPs, this still seems to fail :(
@Scooletz Sorry, this was a configuration problem (in my case). The Services in Kubernetes in this case must be without a headless IP (clusterIP: "None"). Sorry for that, my bad. I am using DNS discovery.
Thank you for raising this, @rjso & @tamademicheli. Let me take a look at it. I'll share my findings and outcomes soon.
Hi. I discovered the same problem today. Server 3.12.9, client 3.12.3. @arkadiuszdr were you able to work around it somehow?
Closed due to architectural change between 3.x and 4.x.