
Windows Container TCP connection timeout after 10 minutes of idle

Open ntrappe-msft opened this issue 1 year ago • 31 comments

[!IMPORTANT] Migrating Discussions to Issues. All customer inquiries should be in Issues.

Discussed in https://github.com/microsoft/Windows-Containers/discussions/384

Originally posted by codeground123 June 15, 2023

We have a client running in a Windows Container in OpenShift. This client makes TCP Connections to another server.

After about 10 minutes of idle time, the connection times out.

I am not sure how to debug this, and I am not able to run tcpdump inside the container.

I used "netstat" from the container to check the connection status.

The connection from the source IP is initially in the ESTABLISHED state; after a certain amount of time it moves to TIME_WAIT.

ntrappe-msft avatar Dec 10 '24 21:12 ntrappe-msft

Thank you for creating an Issue. Please note that GitHub is not an official channel for Microsoft support requests. To create an official support request, please open a ticket here. Microsoft and the GitHub Community strive to provide a best effort in answering questions and supporting Issues on GitHub.

github-actions[bot] avatar Dec 10 '24 21:12 github-actions[bot]

I am seeing this as well. I can reproduce this after ~5 minutes of the TCP connection being kept idle. In my use case it's an LDAP connection to a Domain Controller.

On the wire I see a bunch of retransmission failures before a new connection is established, probably because from the container's perspective there is no RST and the connection is still valid.

Any insights would be helpful.

avin3sh avatar Dec 12 '24 03:12 avin3sh

@avin3sh, I'm assuming you're the client? Are you using l2bridge, or another network driver? Also, what is your expected behavior for the connection?

adrianm-msft avatar Dec 18 '24 19:12 adrianm-msft

@adrianm-msft Yes I am the client and using l2bridge. Basically Calico and Kubernetes.

The problem here is that the client never sees a TCP RST, so for a couple of seconds it tries to retransmit the packet over the same connection, and only after a bunch of retransmission failures is a new connection established. This whole process adds a lot of unnecessary delay to our workflow.

avin3sh avatar Dec 21 '24 02:12 avin3sh

Also, since we are invoking winldap's LDAP calls (via System.DirectoryServices.Protocols in C#), we do not have access to the underlying TCP connection, so there isn't much we can do at the app level to work around this. If there are any known workarounds that can be shared, that would be extremely helpful.

avin3sh avatar Dec 21 '24 17:12 avin3sh

Yes I am the client and using l2bridge. Basically Calico and Kubernetes.

@avin3sh, are you using AKS?

adrianm-msft avatar Jan 07 '25 20:01 adrianm-msft

@avin3sh if your TCP connections drop after 4 minutes of idle time, the issue could be this one (https://github.com/microsoft/Windows-Containers/issues/269). The SNAT done by HNS in Windows has an idle timeout of 240 seconds that cannot be changed. In EKS, we solved this by avoiding SNAT entirely for target CIDRs with a dedicated parameter.

orsosamuele avatar Jan 10 '25 13:01 orsosamuele

The datapath for Windows pods in AKS allows an idle TCP connection/flow for a max of 4 minutes. After this period, the connection times out, and hence, the endpoints cannot communicate on that TCP connection. This is expected behavior.

The 4-minute timeout does not apply to all flows and depends on the scenario:

  • Scenario 1: When source NAT (PAT) occurs, the pod IP is replaced with the Node IP and a different port when the packet leaves the Node. This requires a flow state in the Node's datapath for reverse-NATting (reverse-PAT) the response packet. However, this flow times out in 4 minutes. After 4 minutes, the response packet (from server) is not recognizable (since flow state is no longer present) and hence is dropped.

  • Scenario 2: Both client and server are in the same subnet. There is an outbound NAT exception for packets destined to IPs within the same subnet. Hence, no flow state is needed (because no NAT/PAT is done), and the TCP connection will survive beyond 4 minutes of idle time.
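
The two scenarios can be modeled with a small sketch. This is an illustration only, not the HNS implementation: `struct nat_flow`, `flow_accepts_packet`, and the exact boundary handling at 240 seconds are hypothetical and chosen just to mirror the description above.

```c
#include <stdbool.h>

#define NAT_IDLE_TIMEOUT_SECS 240   /* the 4-minute limit described above */

/* Hypothetical, simplified flow-state entry kept by the node's datapath. */
struct nat_flow {
    bool needs_nat;        /* false when the outbound NAT exception applies */
    long last_packet_secs; /* time the last packet was seen on this flow */
};

/* Returns true if a packet arriving at now_secs can still be delivered.
 * A delivered packet refreshes the idle timer, as any traffic would. */
static bool flow_accepts_packet(struct nat_flow *f, long now_secs)
{
    if (!f->needs_nat)
        return true;   /* Scenario 2: no flow state, so nothing can expire */
    if (now_secs - f->last_packet_secs > NAT_IDLE_TIMEOUT_SECS)
        return false;  /* Scenario 1: state timed out, packet is dropped */
    f->last_packet_secs = now_secs;
    return true;
}
```

In this model a NATed flow survives as long as some packet crosses it at least every 240 seconds, while a same-subnet flow is never subject to the timer.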

adrianm-msft avatar Jan 10 '25 17:01 adrianm-msft

@adrianm-msft to answer your earlier question, this is an on-prem Kubernetes setup with Calico networking.

This is expected behavior.

Even if this 4-minute cutoff is expected, the issue here is that the application inside the container still thinks the connection is alive. The moral equivalent of a TCP RST does not happen, which manifests as a bunch of retransmission failures visible in a packet capture. Isn't this specific bit unexpected, or at least undesired?

avin3sh avatar Jan 11 '25 17:01 avin3sh

@adrianm-msft let me know what you think about my last comment. Does that behavior not classify as a bug? I am also keen on getting a workaround. If going through Support would expedite it, I am willing to try that.

avin3sh avatar Jan 24 '25 18:01 avin3sh

@ntrappe-msft @adrianm-msft curious to hear your thoughts on the above. While the 4-minute timeout is by design, the part where the client does not get notified about the underlying connection being closed (through TCP RST or another mechanism) should be treated as a bug.

avin3sh avatar Feb 11 '25 14:02 avin3sh

@avin3sh, currently looking into this - will keep you posted!

adrianm-msft avatar Feb 11 '25 17:02 adrianm-msft

@avin3sh, looks like two socket options could be relevant here:

  1. SO_KEEPALIVE configures TCP to keep a pulse of traffic flowing continuously, which keeps sessions alive on middleboxes. Also see TCP_KEEPIDLE and TCP_KEEPINTVL.
  2. TCP_MAXRT/TCP_MAXRTMS configures how many retransmits before TCP considers a connection closed.
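
For an application that owns its socket, the first option could look roughly like the sketch below. It assumes a platform where `TCP_KEEPIDLE` and `TCP_KEEPINTVL` are available as per-socket options taking a value in seconds; `enable_keepalive` and `sock_t` are hypothetical names, and on Windows the caller is assumed to have already called `WSAStartup`.

```c
#ifdef _WIN32
#include <winsock2.h>
#include <ws2tcpip.h>    /* TCP_KEEPIDLE / TCP_KEEPINTVL on recent Windows SDKs */
typedef SOCKET sock_t;
#else
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
typedef int sock_t;
#endif

/* Hypothetical helper: enable keep-alive on one socket, asking for the first
 * probe after idle_secs of silence and further probes every interval_secs.
 * Returns 0 on success, -1 if any setsockopt call fails. */
static int enable_keepalive(sock_t s, int idle_secs, int interval_secs)
{
    int on = 1;

    if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE,
                   (const char *)&on, sizeof(on)) != 0)
        return -1;
    if (setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE,
                   (const char *)&idle_secs, sizeof(idle_secs)) != 0)
        return -1;
    if (setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL,
                   (const char *)&interval_secs, sizeof(interval_secs)) != 0)
        return -1;
    return 0;
}
```

A call such as `enable_keepalive(s, 120, 30)` would request probes well inside the 240-second window. This only helps when the application can reach the socket handle.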

Otherwise, while this isn't my area of expertise, from what I can tell, TCP is working as expected by default.

adrianm-msft avatar Feb 11 '25 22:02 adrianm-msft

The problem is that if I set SO_KEEPALIVE to true (false by default), I still do not see keep-alive packets while the connection is idle. I think that's because in Windows (Server 2022, at least) the global keep-alive timeout is 2 hours (see here). The doc says it's configurable via HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, but I had no success.

Since HNS closes the connection after 4 minutes, the problem persists even with SO_KEEPALIVE specified. Is there a way to set the global TCP keep-alive timeout to a value other than the default 7,200,000 milliseconds (2 hours)?

I do not have a lot of wiggle room here. I am using the winldap Win32 API to make LDAP calls. It lets me configure SO_KEEPALIVE through LDAP_OPT_TCP_KEEPALIVE. I guess TCP_KEEPINTVL et al. are relatively new, which is why winldap doesn't have the equivalent.

Using winldap means I do not have direct access to the underlying socket, so unfortunately the only option I see here is SO_KEEPALIVE plus some way to bring the global keep-alive timeout down to under 240 seconds.
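
To make the mismatch concrete, here is a tiny arithmetic sketch using only the two figures quoted in this thread. `probe_beats_nat_timeout` and both macro names are hypothetical and purely illustrative.

```c
#include <stdbool.h>

/* Both values are taken from the thread. */
#define NAT_IDLE_TIMEOUT_MS   (240L * 1000L)  /* HNS SNAT idle timeout */
#define DEFAULT_KEEPALIVE_MS  7200000L        /* Windows default: 2 hours */

/* True if the first keep-alive probe would be sent before the NAT flow
 * state for an idle connection expires. */
static bool probe_beats_nat_timeout(long keepalive_idle_ms)
{
    return keepalive_idle_ms < NAT_IDLE_TIMEOUT_MS;
}
```

With the default, the first probe is due after 30 NAT timeout periods (7,200,000 / 240,000), so the flow state is long gone before any probe is sent.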

avin3sh avatar Feb 12 '25 15:02 avin3sh

Looks like HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters is ignored in newer versions, so SO_KEEPALIVE alone won't be helpful:

https://learn.microsoft.com/en-us/windows-server/networking/technologies/network-subsystem/net-sub-performance-tuning-nics#deprecated-tcp-parameters

avin3sh avatar Feb 12 '25 15:02 avin3sh

@avin3sh, have you had the chance to try out any of the other socket options?

adrianm-msft avatar Feb 12 '25 18:02 adrianm-msft

I am limited by what winldap exposes - LDAP Session Options - it looks like, of all the options you mentioned, the only thing it exposes is LDAP_OPT_TCP_KEEPALIVE which maps to SO_KEEPALIVE.

avin3sh avatar Feb 12 '25 18:02 avin3sh

Actually, it looks like the SO_KEEPALIVE socket option is still supported and doesn't depend on registry keys. The socket option (in C) would be set like this:

if (g_keepalives) {
    /* Turn on TCP keep-alive probes for this socket. */
    int opt = 1;
    int optlen = sizeof(opt);
    setsockopt(g_listen_socket, SOL_SOCKET, SO_KEEPALIVE, (char*)&opt, optlen);
}

and most socket libraries have similar knobs, e.g. StreamSocketControl.KeepAlive in C#, that ultimately set this same option.

adrianm-msft avatar Feb 12 '25 19:02 adrianm-msft

Agreed. But after enabling this option, you will not see keep-alive probe packets until 2 hours have elapsed, because that's the Windows default keep-alive interval.

avin3sh avatar Feb 12 '25 19:02 avin3sh

It doesn't look like we support any global configuration knobs for keepalives any more - everything is configured per socket by the application.

adrianm-msft avatar Feb 26 '25 19:02 adrianm-msft

The only reason this particular app needs Windows containers is that it relies on WinLdap / Active Directory, which do not expose socket keepalive. It's a bit of a bummer. We are not on AWS, so we don't have the option of an SNAT exception like VPC CNI. Is there any other workaround that we could explore?

avin3sh avatar Mar 10 '25 14:03 avin3sh

If I'm not wrong, since the end of last year it has been possible in AKS to disable outbound NAT entirely using the "--disable-windows-outbound-nat" flag. Maybe you can give it a try and, if SNAT is needed for routing towards some endpoints, use a NAT gateway, where you can extend the idle timeout from 4 to 120 minutes.

orsosamuele avatar Mar 12 '25 13:03 orsosamuele

This issue has been open for 30 days with no updates. @adrianm-msft, please provide an update or close this issue.

Let's please keep this issue open. I would like this to be supported. I believe support for LDAP/Active Directory is a critical use case for Windows Containers, and it is impacted by the current behaviour.

I also think the application inside the container not receiving a TCP RST is a bug or at least an unexpected flow from the application's point of view.

Most of the workarounds seem to rely on specific cloud CNI implementations, which is not feasible for on-prem use cases.

avin3sh avatar Apr 16 '25 15:04 avin3sh


This issue has been open for 30 days with no updates. @grcusanz, please provide an update or close this issue.
