Windows Container TCP connection timeout after 10 minutes of idle
> [!IMPORTANT]
> Migrating Discussions to Issues. All customer inquiries should be in Issues.
Discussed in https://github.com/microsoft/Windows-Containers/discussions/384
Originally posted by codeground123 June 15, 2023
We have a client running in a Windows Container in OpenShift. This client makes TCP connections to another server.
After about 10 minutes of idle time, the connection times out.
I am not sure how to debug this, and I am not able to run tcpdump inside the container.
I used "netstat" from inside the container to check the connection status.
The connection is initially in the ESTABLISHED state; after a certain amount of time the source goes into TIME_WAIT.
Thank you for creating an Issue. Please note that GitHub is not an official channel for Microsoft support requests. To create an official support request, please open a ticket here. Microsoft and the GitHub Community strive to provide a best effort in answering questions and supporting Issues on GitHub.
I am seeing this as well. I can reproduce it after ~5 minutes of the TCP connection being kept idle. In my use case it's an LDAP connection to a Domain Controller.
On the wire I see a bunch of retransmission failures before a new connection is established - probably because, from the container's perspective, there is no RST and the connection still looks valid.
Any insights would be helpful
@avin3sh, I'm assuming you're the client? Are you using l2bridge, or another network driver? Also, what is your expected behavior for the connection?
@adrianm-msft Yes I am the client and using l2bridge. Basically Calico and Kubernetes.
The problem here is that the client never sees a TCP RST, so for a couple of seconds it tries to retransmit the packet over the same connection - and finally, after a bunch of retransmission failures, a new connection is established. This whole process adds a lot of unnecessary delay/wait to our workflow.
Also, since we are invoking winldap's LDAP calls (via System.DirectoryServices.Protocols in C#), we do not actually have access to the underlying TCP connection - so there isn't much we can do at the app level to work around this. So, if there are any known workarounds that can be shared, that would be extremely helpful.
> Yes I am the client and using l2bridge. Basically Calico and Kubernetes.
@avin3sh, are you using AKS?
@avin3sh if you experience TCP connections dropping after 4 minutes in the idle state, the issue could be this one (https://github.com/microsoft/Windows-Containers/issues/269). The SNAT done by HNS in Windows has an idle timeout of 240 seconds that cannot be changed. In EKS, we solved this by avoiding SNAT altogether for target CIDRs via a dedicated parameter.
The datapath for Windows pods in AKS allows an idle TCP connection/flow for a max of 4 minutes. After this period, the connection times out, and hence, the endpoints cannot communicate on that TCP connection. This is expected behavior.
The 4-minute timeout does not apply to all flows and depends on the scenario:
- Scenario 1: When source NAT (PAT) occurs, the pod IP is replaced with the Node IP and a different port when the packet leaves the Node. This requires a flow state in the Node's datapath for reverse-NATting (reverse-PAT) the response packet. However, this flow times out in 4 minutes. After 4 minutes, the response packet (from the server) is not recognizable (since the flow state is no longer present) and hence is dropped. (A small repro sketch of this behavior follows the list.)
- Scenario 2: Both client and server are in the same subnet. There is an outbound NAT exception for packets destined to IPs within the same subnet. Hence, no flow state is needed (because no NAT/PAT is done). So, the TCP connection will survive beyond 4 minutes of idle time.
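To make Scenario 1 concrete, below is a rough, untested repro sketch in C/Winsock; the hostname and port are placeholders and error handling is omitted. The client connects, idles past the 240-second flow timeout, then tries to send: from inside the container, netstat still shows the connection as ESTABLISHED, but the outgoing segment only gets retransmitted until TCP eventually gives up.

```c
/* Untested repro sketch: connect, idle past the ~240 s SNAT flow timeout,
 * then try to send. "server.example.com" and port 389 are placeholders.
 * Build (MSVC): cl repro.c ws2_32.lib
 */
#include <winsock2.h>
#include <ws2tcpip.h>
#include <stdio.h>

int main(void)
{
    WSADATA wsa;
    struct addrinfo hints = {0}, *res = NULL;
    SOCKET s;

    WSAStartup(MAKEWORD(2, 2), &wsa);
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    getaddrinfo("server.example.com", "389", &hints, &res);  /* placeholder target */

    s = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    connect(s, res->ai_addr, (int)res->ai_addrlen);
    printf("connected, idling for 5 minutes...\n");

    Sleep(5 * 60 * 1000);  /* idle longer than the 4-minute flow timeout */

    /* With the SNAT flow state gone, this segment never gets an ACK back:
     * netstat still reports ESTABLISHED while TCP retransmits until it
     * finally gives up and the connection errors out. */
    if (send(s, "ping", 4, 0) == SOCKET_ERROR)
        printf("send failed: %d\n", WSAGetLastError());
    else
        printf("send queued; watch the packet capture for retransmissions\n");

    closesocket(s);
    WSACleanup();
    return 0;
}
```

Running the same sketch with the sleep shortened to under 4 minutes, or against a server in the same subnet (Scenario 2), should not show the failure.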
@adrianm-msft to answer your earlier question, this is on-prem Kubernetes setup with Calico networking.
> This is expected behavior.
Even if this 4-minute cutoff is expected, the issue here is that the application inside the container still thinks the connection is alive - the moral equivalent of a TCP RST never happens, and this manifests as a series of retransmission failures, which is visible in a packet capture. That specific bit is likely unexpected/undesired?
@adrianm-msft let me know what you think about my last comment - does that behavior not classify as a bug? Also, I am keen on getting a workaround. If going through Support would expedite it, I am willing to try that.
@ntrappe-msft @adrianm-msft curious to hear your thoughts on the above. While the 4-minute timeout is by design, the part where the client does not get notified about the underlying connection being closed (through a TCP RST or another mechanism) should be treated as a bug.
@avin3sh, currently looking into this - will keep you posted!
@avin3sh, looks like two socket options could be relevant here:
- SO_KEEPALIVE configures TCP to keep a pulse of traffic flowing continuously, which keeps sessions alive on middleboxes. Also see TCP_KEEPIDLE and TCP_KEEPINTVL.
- TCP_MAXRT/TCP_MAXRTMS configure how long TCP keeps retransmitting before it considers a connection closed.
Otherwise, while this isn't my area of expertise, from what I can tell, TCP is working as expected by default.
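For reference, here is a minimal, untested Winsock sketch of how those options would be applied on a socket the application owns (which, as discussed above, is not the case with winldap). TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_MAXRT are assumed to be available in the SDK headers (ws2ipdef.h on recent Windows versions), and the values below are only examples.

```c
/* Untested sketch: applying the keepalive-related options above on a socket
 * the application controls. TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_MAXRT come from
 * ws2ipdef.h and need a recent Windows SDK; values below are examples only. */
#include <winsock2.h>
#include <ws2tcpip.h>

void configure_keepalive(SOCKET s)
{
    DWORD on      = 1;
    DWORD idle_s  = 60;   /* idle seconds before the first keep-alive probe */
    DWORD intvl_s = 10;   /* seconds between probes                         */
    DWORD maxrt_s = 30;   /* stop retransmitting after 30 seconds           */

    /* Enable per-socket keepalives (off by default). */
    setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, (const char *)&on, sizeof(on));

    /* Probe well before the 240 s SNAT flow timeout removes the flow state. */
    setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE,  (const char *)&idle_s,  sizeof(idle_s));
    setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL, (const char *)&intvl_s, sizeof(intvl_s));

    /* Give up on an unreachable peer quickly instead of hanging for minutes. */
    setsockopt(s, IPPROTO_TCP, TCP_MAXRT, (const char *)&maxrt_s, sizeof(maxrt_s));
}
```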
The problem is that if I set SO_KEEPALIVE to true (false by default), I still do not see keep-alive packets while the connection is idle.
I think that's because in Windows (Server 2022, at least) the global keep-alive timeout is 2 hrs (see here). The doc says it's configurable via HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters - but I had no success.
Since HNS closes the connection after 4 minutes, the problem persists even with SO_KEEPALIVE specified. Is there a way to set the global TCP keep-alive timeout to a value other than the default 7,200,000 milliseconds (2 hrs)?
I do not have a lot of wiggle room here. I am using the winldap Win32 API to make LDAP calls. It lets me configure SO_KEEPALIVE through LDAP_OPT_TCP_KEEPALIVE. I guess TCP_KEEPINTVL et al. are relatively new, which is why winldap doesn't have an equivalent.
Using winldap means I do not have direct access to the underlying socket, so unfortunately the only option I see here is SO_KEEPALIVE plus some way to bring the global keep-alive timeout down to under 240 seconds.
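For completeness, setting that one exposed knob through winldap looks roughly like this (untested sketch; it assumes an already-initialized LDAP session handle and linking against wldap32.lib). As noted, it only maps to SO_KEEPALIVE, so probes still follow the system default of 2 hours.

```c
/* Untested sketch: the only keepalive knob winldap exposes, set before
 * ldap_connect/ldap_bind. It maps to SO_KEEPALIVE only, so the probe timing
 * still follows the system default (2 hours). Link with wldap32.lib. */
#include <windows.h>
#include <winldap.h>

void enable_ldap_keepalive(LDAP *ld)
{
    ldap_set_option(ld, LDAP_OPT_TCP_KEEPALIVE, LDAP_OPT_ON);
}
```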
Looks like HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters is ignored in newer versions: https://learn.microsoft.com/en-us/windows-server/networking/technologies/network-subsystem/net-sub-performance-tuning-nics#deprecated-tcp-parameters
So SO_KEEPALIVE alone won't be helpful.
@avin3sh, have you had the chance to try out any of the other socket options?
I am limited by what winldap exposes (LDAP Session Options) - it looks like, of all the options you mentioned, the only one it exposes is LDAP_OPT_TCP_KEEPALIVE, which maps to SO_KEEPALIVE.
Actually, it looks like the SO_KEEPALIVE socket option is still supported and doesn't depend on registry keys. The socket option (in C) would be set like this:
```c
if (g_keepalives) {
    /* Enable TCP keepalive probes on this socket. */
    int opt = 1;
    int optlen = (int)sizeof(opt);
    setsockopt(g_listen_socket, SOL_SOCKET, SO_KEEPALIVE, (char *)&opt, optlen);
}
```
and most socket libraries have similar knobs, e.g. StreamSocketControl.KeepAlive in C#, that ultimately set this same option.
Agreed. But after enabling this option, you will not see keep-alive probe packets until 2 hrs have elapsed, because that's what the Windows default keep-alive time is.
It doesn't look like we support any global configuration knobs for keepalives any more - everything is configured per socket by the application.
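For anyone hitting this with a socket they do control, the per-socket override that does not rely on the deprecated registry keys is SIO_KEEPALIVE_VALS. An untested sketch (both values are in milliseconds and are only example choices, kept well under the 240-second flow timeout):

```c
/* Untested sketch: per-socket keepalive override via SIO_KEEPALIVE_VALS
 * (mstcpip.h). This does not depend on the deprecated Tcpip\Parameters
 * registry keys; both values are in milliseconds and are examples only. */
#include <winsock2.h>
#include <mstcpip.h>

int keepalive_under_240s(SOCKET s)
{
    struct tcp_keepalive ka;
    DWORD bytes_returned = 0;

    ka.onoff             = 1;          /* enable keepalive probes           */
    ka.keepalivetime     = 60 * 1000;  /* first probe after 60 s of idle    */
    ka.keepaliveinterval = 10 * 1000;  /* reprobe every 10 s if unanswered  */

    return WSAIoctl(s, SIO_KEEPALIVE_VALS, &ka, sizeof(ka),
                    NULL, 0, &bytes_returned, NULL, NULL);
}
```

Unfortunately this still requires access to the socket handle, which winldap does not hand out.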
The only reason this particular app needs Windows containers is that it relies on WinLdap / Active Directory, which do not expose socket keepalive tuning. It's a bit of a bummer. We are not on AWS, so we don't have the option of a SNAT exception like the VPC CNI offers. Is there any other workaround that we could explore?
If I'm not wrong, since the end of last year it has been possible in AKS to disable outbound NAT entirely using the "--disable-windows-outbound-nat" flag. Maybe you can give it a try and, if SNAT is needed for routing towards some endpoints, you can use a NAT gateway where the idle timeout can be extended from 4 to 120 minutes.
This issue has been open for 30 days with no updates. @adrianm-msft, please provide an update or close this issue.
Let's please keep this issue open. I would like this to be supported. I believe support for LDAP/Active Directory is a critical use case for Windows Containers, and it is impacted by the current behaviour.
I also think the application inside the container not receiving a TCP RST is a bug/unexpected flow, from the application's point of view.
Most of the workarounds seem to rely on specific cloud CNI implementations, which is not feasible for on-prem use cases.
This issue has been open for 30 days with no updates. @grcusanz, please provide an update or close this issue.