ConnectionReset exception

Open HolyPrapor opened this issue 5 years ago • 5 comments

This client was used for a long time on Windows without any issues. A couple of months ago we started using it on .NET Core and tested it on both Linux and Windows.

In our project we use ZooKeeperClient a lot to read nodes and set watchers.

The Windows version works flawlessly. However, the Linux version throws a Connection reset by peer exception. I investigated the problem and read the ZooKeeper logs, and found that ZooKeeper did not reset its connection. I didn't capture any TCP dumps, but I'm pretty sure there are no TCP RST packets.

Upgrading to .NET 5 makes the situation even worse (ConnectionLossExceptions appear more often).

I decided to dig deeper into the ZooKeeperClient code and found a check that causes a falsely detected connection loss.

Unfortunately, I was not able to determine what causes this effect or how to reproduce the problem. It looks like a problem with sockets on Linux.

Removing this check solves the problem.

Also, this client sends KeepAlive pings anyway, so if there is a real connection loss, we will find out about it soon (either the next time we try to send something or on the next ping).

HolyPrapor avatar Dec 29 '20 08:12 HolyPrapor

According to SO, the proper way to check whether a socket is connected is to check whether there are any bytes available to read and to call the socket's Poll method. This PR resolves the issue with sockets.
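
For readers who want to see the idea, here is a minimal sketch of a Poll-plus-Available check. It is not the code from the PR or from the ZooKeeper client; the class and method names (`SocketProbe`, `IsLikelyConnected`) are hypothetical, and it assumes a connected TCP `Socket`. It only detects a close or reset that the local TCP stack has already seen, so it complements the KeepAlive pings rather than replacing them.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical helper for illustration; not part of the ZooKeeper client.
static class SocketProbe
{
    public static bool IsLikelyConnected(Socket socket)
    {
        try
        {
            // Poll(SelectRead) with a zero timeout returns true immediately when
            // data is readable OR when the connection was closed, reset, or terminated.
            bool readable = socket.Poll(0, SelectMode.SelectRead);

            // Readable with zero bytes pending means the peer closed the connection.
            bool closedByPeer = readable && socket.Available == 0;
            return !closedByPeer;
        }
        catch (SocketException)
        {
            // The socket is in an error state; treat it as disconnected.
            return false;
        }
        catch (ObjectDisposedException)
        {
            // The socket was already closed locally.
            return false;
        }
    }
}
```

A caller would run `SocketProbe.IsLikelyConnected(socket)` before a send and reconnect when it returns false. An idle but healthy connection reports true, because Poll returns false when there is nothing to read.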

HolyPrapor avatar Dec 30 '20 04:12 HolyPrapor

I have been having the same problem. org.apache.zookeeper.KeeperException.ConnectionLossException started to appear more frequently on .NET Core 3, and after trying to upgrade to .NET 5 I get it all the time.

I am using macOS Big Sur.

MatsKarlsson avatar Jan 11 '21 12:01 MatsKarlsson

I am afraid we have also started facing the same issue. It would be nice if the PR could be reviewed and released, if it solves the issue.

kuskmen avatar Aug 23 '21 08:08 kuskmen

We upgraded from .NET Core 3.1 to .NET 6 and now see this very frequently when running in a Linux Docker container.

douggish avatar Mar 21 '22 21:03 douggish

Can the code be changed to follow the guidance from the MSFT docs for checking whether a socket is connected?

// .Connect throws an exception if unsuccessful
client.Connect(anEndPoint);

// This is how you can determine whether a socket is still connected.
bool blockingState = client.Blocking;
try
{
    byte [] tmp = new byte[1];

    client.Blocking = false;
    client.Send(tmp, 0, 0);
    Console.WriteLine("Connected!");
}
catch (SocketException e)
{
    // SocketError.WouldBlock is 10035 (WSAEWOULDBLOCK) on Windows; the native code differs on Linux
    if (e.SocketErrorCode == SocketError.WouldBlock)
    {
        Console.WriteLine("Still Connected, but the Send would block");
    }
    else
    {
        Console.WriteLine("Disconnected: error code {0}!", e.NativeErrorCode);
    }
}
finally
{
    client.Blocking = blockingState;
}

Console.WriteLine("Connected: {0}", client.Connected);

madelson avatar Dec 01 '22 12:12 madelson