ACK is not sent on second subflow
Hi. I'm running an iperf3 session with 1 parallel stream over MPTCP with 2 paths that have different RTTs. The iperf3 server side uses "olia" as the congestion control, the default scheduler and the fullmesh path manager. Traffic distribution is about 2 MB/s on the first path and 100 MB/s on the second.
Sometimes the second subflow hangs because of an ACK (from the client) with a very small window size. The client doesn't send another ACK and the server can't send a suitable packet. All traffic then goes through the first (slow) interface and never gets back to normal.
I've made a fix for the ACK sending that works for me.
diff --git a/net/mptcp/mptcp_ctrl.c b/net/mptcp/mptcp_ctrl.c
index 26989fc2f..426f430fa 100644
--- a/net/mptcp/mptcp_ctrl.c
+++ b/net/mptcp/mptcp_ctrl.c
@@ -1538,17 +1538,8 @@ void mptcp_cleanup_rbuf(struct sock *meta_sk, int copied)
 {
 	struct tcp_sock *meta_tp = tcp_sk(meta_sk);
 	struct sock *sk;
-	bool recheck_rcv_window = false;
 	__u32 rcv_window_now = 0;
 
-	if (copied > 0 && !(meta_sk->sk_shutdown & RCV_SHUTDOWN)) {
-		rcv_window_now = tcp_receive_window(meta_tp);
-
-		/* Optimize, __mptcp_select_window() is not cheap. */
-		if (2 * rcv_window_now <= meta_tp->window_clamp)
-			recheck_rcv_window = true;
-	}
-
 	mptcp_for_each_sk(meta_tp->mpcb, sk) {
 		struct tcp_sock *tp = tcp_sk(sk);
 		const struct inet_connection_sock *icsk = inet_csk(sk);
@@ -1579,8 +1570,15 @@ void mptcp_cleanup_rbuf(struct sock *meta_sk, int copied)
 	}
 
 second_part:
+
+	rcv_window_now = 0;
+
+	if (copied > 0 && !(sk->sk_shutdown & RCV_SHUTDOWN)) {
+		rcv_window_now = tcp_receive_window(tp);
+	}
+
 	/* This here is the second part of tcp_cleanup_rbuf */
-	if (recheck_rcv_window) {
+	if (2 * rcv_window_now <= tp->window_clamp) {
 		__u32 new_window = tp->ops->__select_window(sk);
 		/* Send ACK now, if this read freed lots of space
Does tp->ops->__select_window(sk) have an impact on performance in this case?
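For clarity, this is roughly what the per-subflow part of mptcp_cleanup_rbuf() ends up doing with the change applied (a reconstruction for illustration only; how the truncated hunk continues is my assumption, based on the corresponding code in plain tcp_cleanup_rbuf()):

second_part:
	/* Recompute the receive window per subflow instead of once for the
	 * meta socket, so a subflow stuck behind a tiny advertised window
	 * still gets a window-updating ACK.
	 */
	rcv_window_now = 0;
	if (copied > 0 && !(sk->sk_shutdown & RCV_SHUTDOWN))
		rcv_window_now = tcp_receive_window(tp);

	/* This here is the second part of tcp_cleanup_rbuf */
	if (2 * rcv_window_now <= tp->window_clamp) {
		__u32 new_window = tp->ops->__select_window(sk);

		/* If the read freed "lots" of space (the newly selected window
		 * is at least twice the advertised one), an ACK is sent on this
		 * subflow; that part of the function is unchanged and omitted
		 * here.
		 */
	}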
Hello!
Thanks for this report! I think your patch makes sense.
Just one thing: I believe you should rather check against meta_tp->window_clamp than the subflow's window_clamp, because the subflow's window_clamp is not getting updated; only the meta's clamp matters.
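Concretely, on top of the patch above that would be (my reading of the suggestion, untested):

-	if (2 * rcv_window_now <= tp->window_clamp) {
+	if (2 * rcv_window_now <= meta_tp->window_clamp) {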
Do you want to submit a formal patch to mptcp-dev?
Thank you for the reply! I will test with meta_tp->window_clamp and send a patch to mptcp-dev.
I've tested with meta_tp->window_clamp and the problem still exists.
Is it possible to always send an ACK if rcv_window_now is small? For example:
if (2 * rcv_window_now <= meta_tp->window_clamp || rcv_window_now < tcp_current_mss(sk)) {
Does that make sense? tcp_current_mss() probably returns the TX-side value, but I have no idea what to use for RX.
This kind of optimization is pretty weird anyway, because window_clamp is a scaled value, as I understand it.
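For illustration, the extended check could look roughly like this (a sketch only; tcp_current_mss() is the send-side MSS, and whether the receive-side estimate icsk->icsk_ack.rcv_mss would be the more appropriate bound is exactly the open question):

	/* Also force a window update when the advertised window has shrunk
	 * below one MSS, so the peer is not left stalled on a tiny window.
	 * Which MSS to use on the receive side is unclear;
	 * icsk->icsk_ack.rcv_mss is the receive-side estimate.
	 */
	if (2 * rcv_window_now <= meta_tp->window_clamp ||
	    rcv_window_now < tcp_current_mss(sk)) {
		__u32 new_window = tp->ops->__select_window(sk);
		/* ... same ACK logic as before ... */
	}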
The question rather is: why would meta_tp->window_clamp be so small that 2 * rcv_window_now is bigger than it? What values of window_clamp are you seeing?
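(A throwaway debug print in mptcp_cleanup_rbuf() would show them, e.g. something along these lines; just a debugging sketch, not code from the actual tree:)

	/* e.g. inside the per-subflow loop, next to the window check */
	pr_info("mptcp_cleanup_rbuf: copied=%d rcv_window_now=%u window_clamp=%u\n",
		copied, rcv_window_now, meta_tp->window_clamp);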
window_clamp is about 600-1100. I'm using VDSL as the primary interface and LTE as the secondary, and the primary interface has a 10 times bigger RTT.
window-clamp on meta-tp is 600 to 1100 ??? Wow... That's weird... Which branch exactly are you running on?
0.93
Do you have a pcap of this test, together with the logs that you added to display the window_clamp?
Here is pcap: https://drive.google.com/open?id=15Vrgwr_A2id8_2IxxaPaFuLHV19s5ghs
Here is a log that prints window_clamp and other data: log.txt
I've run two iperf3 instances in parallel (it is easier to reproduce with two):
iperf3 -c 192.168.100.157 -t 300 -p 5202 -P 1 -R
iperf3 -c 192.168.100.157 -t 300 -p 5203 -P 1 -R
192.168.150.20 - PTM
192.168.4.20 - LTE
192.168.100.157 - TARGET
192.168.101.157 - TARGET
The hang is on 192.168.4.20:60881 - 192.168.101.157:5203, packet No. 99499.
Thank you for the help.
Hi @cpaasch, I'm interested in this issue. Have you had time to look at the pcap?
Coming back on this - I have been doing some code review and some tests, and I don't see how window_clamp could ever become that low. window_clamp is actually strictly increasing (except if the app forces it to a low value), so I don't think there is a way to get into this state. Did you make any other changes to the kernel that you were running?
Let me know if you still see this on newer kernels.
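(For reference, the one way I know of for an application to force a low clamp is the TCP_WINDOW_CLAMP socket option; a minimal user-space sketch with made-up values, not related to how iperf3 was invoked above:)

/* Illustration only: an application forcing a small window clamp. */
#include <stdio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	int clamp = 1000;	/* bytes; deliberately tiny (hypothetical value) */

	if (fd < 0) {
		perror("socket");
		return 1;
	}
	/* The kernel may round very small values up to an internal minimum. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_WINDOW_CLAMP,
		       &clamp, sizeof(clamp)) < 0)
		perror("setsockopt(TCP_WINDOW_CLAMP)");

	/* ... connect()/recv() as usual; the advertised window stays clamped. */
	return 0;
}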