TCP collapse in full duplex connections with MPTCP enabled
TL;DR: I found that MPTCP behaves differently from legacy TCP on full-duplex connections: under some circumstances the connection stalls completely when data is sent in both directions at once.
Hi all,
I have a specific use case where I transfer a large amount of data in both directions over the same TCP connection. I am experiencing a full connection collapse once both directions reach high throughput; the connection never recovers.
A consistent reduction in throughput is expected when both directions carry full payload data (ACKs get piggybacked more slowly), and raising the rmem and wmem maximums above the Linux defaults makes the TCP flow more likely to collapse. Specifically, on both endpoints I raised the rmem maximum to 8MB and the wmem maximum to 6MB (from the default 6MB and 4MB respectively).
So far, so good: this behaviour is expected with plain TCP. However, when enabling MPTCP under identical conditions (single subflow, same window sizes and congestion control algorithm), the symptoms are different: throughput drops to zero and the connection stalls.
My impression is that MPTCP is not handling this use case (i.e., full-duplex TCP) correctly.
The test environment consists of two Debian Jessie VMs, only one of which is multi-homed with two access links. The connection goes over the Internet, and the RTT is quite high, ~130ms. VM1 is deployed on a KVM host; VM2 is an Amazon EC2 instance. The aggregated capacity at VM1 is 1.2Gbps (1x200Mbps link + 1x1Gbps link), while the Amazon instance can reach up to 2Gbps over a single link. Both interfaces on VM1 have public IPv4 addresses configured. VM2 in Amazon is behind a NAT.
I can reproduce the misbehaviour with both MPTCP kernels v0.94 and v0.93, installed directly from the apt repo. The collapse occurs with both the fullmesh path manager (which opens two subflows) and the default path manager. I am using CUBIC as the congestion control algorithm.
uname -a:
Linux mptcp-box 4.14.24.mptcp #9 SMP Fri Mar 9 19:13:05 UTC 2018 x86_64 GNU/Linux
With mptcp_debug enabled, the kernel log shows the connection establishment and nothing more until I force-close the connection by interrupting the client/server (public IPs mangled on purpose; the following log was taken with the default path manager):
[13188.687390] mptcp_alloc_mpcb: created mpcb with token 0x4bf4afb9
[13188.687448] mptcp_add_sock: token 0x4bf4afb9 pi 1, src_addr:1.2.3.4:53676 dst_addr:3.4.5.6:1234, cnt_subflows now 1
On force close:
[13529.223071] mptcp_close: Close of meta_sk with tok 0x4bf4afb9
My sysctl configuration:
net.ipv4.tcp_rmem=4096 87380 8388608
net.ipv4.tcp_wmem=4096 16384 6291456
net.mptcp.mptcp_path_manager=default
net.ipv4.tcp_congestion_control=cubic
I am attaching a simple C program which reproduces the kind of traffic causing the error. It sends 2GB of random data in both directions over the same connection.
To compile:
gcc -Wall -O2 -pthread -o duplextcp duplextcp.c
Usage server:
./duplextcp -s localIP localPort
Usage client:
./duplextcp destIP destPort
Any help would be much appreciated. Thank you very much!
Hi @fciaccia,
Thank you for this detailed bug report! I will try to find time to reproduce it, but it doesn't look easy to fix as there is not much info from the kernel :)
Matt
Writing down some notes on this issue here as I am looking into it:
There is packet loss happening, and from that moment on the connection stalls. What happens is that the server retransmits a lot of out-of-order data until ultimately only one segment is missing. However, that last segment never gets acknowledged, although it is being retransmitted.
This means that somewhere in the TCP input path we are dropping it. The nstat counters should indicate where, but I can't find which one it is. tcp_try_rmem_schedule should not be the cause, as otherwise the PRUNED counters would have increased. The segment could have been added to the backlog queue without the backlog queue ever being scheduled; that is a possibility, as the client uses two threads for sending/receiving.
Besides that, maybe some other drop is taking place.
I think the main problem here is that there is no (known? documented?) possibility to debug what is happening and what MPTCP's internal states are.
I played with MPTCP for two days and saw only strange behaviour, looking the same as all the bugs reported here, but with no way to tie it to internal states or anything like that.
Without that, no debugging is possible, and bug reports can't really help, don't you think?