ACK is not sent on second subflow
Hi. I'm running an iperf3 session with 1 parallel stream over MPTCP with 2 paths that have different RTTs. The iperf3 server side uses "olia" as the congestion control, the default scheduler and the fullmesh path manager. Traffic distribution is about 2 MB/s on the first path and 100 MB/s on the second.
Sometimes the second subflow hangs because of an ACK (from the client) with a very small window size. The client doesn't send another ACK and the server can't send a suitable packet. All traffic then goes through the first (slow) interface and never gets back to normal.
I've made a fix for the ACK sending that works for me.
diff --git a/net/mptcp/mptcp_ctrl.c b/net/mptcp/mptcp_ctrl.c
index 26989fc2f..426f430fa 100644
--- a/net/mptcp/mptcp_ctrl.c
+++ b/net/mptcp/mptcp_ctrl.c
@@ -1538,17 +1538,8 @@ void mptcp_cleanup_rbuf(struct sock *meta_sk, int copied)
 {
 	struct tcp_sock *meta_tp = tcp_sk(meta_sk);
 	struct sock *sk;
-	bool recheck_rcv_window = false;
 	__u32 rcv_window_now = 0;
 
-	if (copied > 0 && !(meta_sk->sk_shutdown & RCV_SHUTDOWN)) {
-		rcv_window_now = tcp_receive_window(meta_tp);
-
-		/* Optimize, __mptcp_select_window() is not cheap. */
-		if (2 * rcv_window_now <= meta_tp->window_clamp)
-			recheck_rcv_window = true;
-	}
-
 	mptcp_for_each_sk(meta_tp->mpcb, sk) {
 		struct tcp_sock *tp = tcp_sk(sk);
 		const struct inet_connection_sock *icsk = inet_csk(sk);
@@ -1579,8 +1570,15 @@ void mptcp_cleanup_rbuf(struct sock *meta_sk, int copied)
 	}
 
 second_part:
+
+	rcv_window_now = 0;
+
+	if (copied > 0 && !(sk->sk_shutdown & RCV_SHUTDOWN)) {
+		rcv_window_now = tcp_receive_window(tp);
+	}
+
 	/* This here is the second part of tcp_cleanup_rbuf */
-	if (recheck_rcv_window) {
+	if (2 * rcv_window_now <= tp->window_clamp) {
 		__u32 new_window = tp->ops->__select_window(sk);
 		/* Send ACK now, if this read freed lots of space
Does tp->ops->__select_window(sk) have an impact on performance in this case?
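For clarity, this is roughly what the per-subflow part of mptcp_cleanup_rbuf() ends up doing with the change applied (a reconstruction for illustration only; how the truncated hunk continues is my assumption, based on the corresponding code in plain tcp_cleanup_rbuf()):

second_part:
	/* Recompute the receive window per subflow instead of once for the
	 * meta socket, so a subflow stuck behind a tiny advertised window
	 * still gets a window-updating ACK.
	 */
	rcv_window_now = 0;
	if (copied > 0 && !(sk->sk_shutdown & RCV_SHUTDOWN))
		rcv_window_now = tcp_receive_window(tp);

	/* This here is the second part of tcp_cleanup_rbuf */
	if (2 * rcv_window_now <= tp->window_clamp) {
		__u32 new_window = tp->ops->__select_window(sk);

		/* If the read freed "lots" of space (the newly selected window
		 * is at least twice the advertised one), an ACK is sent on this
		 * subflow; that part of the function is unchanged and omitted
		 * here.
		 */
	}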
Hello!
Thanks for this report! I think your patch makes sense.
Just one thing: I believe you should rather check against meta_tp->window_clamp than the subflow's window_clamp, because the subflow's window_clamp is not getting updated; only the meta's clamp matters.
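Concretely, on top of the patch above that would be (my reading of the suggestion, untested):

-	if (2 * rcv_window_now <= tp->window_clamp) {
+	if (2 * rcv_window_now <= meta_tp->window_clamp) {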
Do you want to submit a formal patch to mptcp-dev?
Thank you for the reply! I will test with meta_tp->window_clamp and send a patch to mptcp-dev.
I've tested with meta_tp->window_clamp and the problem still exists.
Is it possible to always send an ACK if rcv_window_now is small? For example:
if (2 * rcv_window_now <= meta_tp->window_clamp || rcv_window_now < tcp_current_mss(sk)) {
Does that make sense? tcp_current_mss() probably returns the TX-side value, but I have no idea what to use for RX.
This kind of optimization is pretty weird anyway, because window_clamp is a scaled value, as I understand it.
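For illustration, the extended check could look roughly like this (a sketch only; tcp_current_mss() is the send-side MSS, and whether the receive-side estimate icsk->icsk_ack.rcv_mss would be the more appropriate bound is exactly the open question):

	/* Also force a window update when the advertised window has shrunk
	 * below one MSS, so the peer is not left stalled on a tiny window.
	 * Which MSS to use on the receive side is unclear;
	 * icsk->icsk_ack.rcv_mss is the receive-side estimate.
	 */
	if (2 * rcv_window_now <= meta_tp->window_clamp ||
	    rcv_window_now < tcp_current_mss(sk)) {
		__u32 new_window = tp->ops->__select_window(sk);
		/* ... same ACK logic as before ... */
	}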
The question rather is: why would meta_tp->window_clamp be so small that 2 * rcv_window_now is bigger than it? What values of window_clamp are you seeing?
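(A throwaway debug print in mptcp_cleanup_rbuf() would show them, e.g. something along these lines; just a debugging sketch, not code from the actual tree:)

	/* e.g. inside the per-subflow loop, next to the window check */
	pr_info("mptcp_cleanup_rbuf: copied=%d rcv_window_now=%u window_clamp=%u\n",
		copied, rcv_window_now, meta_tp->window_clamp);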
window_clamp is about 600-1100. I'm using VDSL as the primary interface and LTE as the secondary, and the primary interface has a 10 times bigger RTT.
window-clamp on meta-tp is 600 to 1100 ??? Wow... That's weird... Which branch exactly are you running on?
0.93
Do you have a pcap of this test, together with the logs that you added to display the window_clamp?
Here is pcap: https://drive.google.com/open?id=15Vrgwr_A2id8_2IxxaPaFuLHV19s5ghs
Here is a log that prints window_clamp and other data: log.txt
I've run two iperf3 instances in parallel (it is easier to reproduce with two):
iperf3 -c 192.168.100.157 -t 300 -p 5202 -P 1 -R
iperf3 -c 192.168.100.157 -t 300 -p 5203 -P 1 -R
192.168.150.20 - PTM
192.168.4.20 - LTE
192.168.100.157 - TARGET
192.168.101.157 - TARGET
The hang is on 192.168.4.20:60881 - 192.168.101.157:5203, packet No. 99499.
Thank you for the help.
Hi @cpaasch, I'm interested in this issue. Have you had time to look at the pcap?
Coming back on this - I have been doing some code review and some tests, and I don't see how window_clamp could ever become that low. window_clamp is actually strictly increasing (except if the app forces it to a low value), so I don't think there is a way to get into this state. Did you make any other changes to the kernel that you were running?
Let me know if you still see this on newer kernels.
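(For reference, the one way I know of for an application to force a low clamp is the TCP_WINDOW_CLAMP socket option; a minimal user-space sketch with made-up values, not related to how iperf3 was invoked above:)

/* Illustration only: an application forcing a small window clamp. */
#include <stdio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	int clamp = 1000;	/* bytes; deliberately tiny (hypothetical value) */

	if (fd < 0) {
		perror("socket");
		return 1;
	}
	/* The kernel may round very small values up to an internal minimum. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_WINDOW_CLAMP,
		       &clamp, sizeof(clamp)) < 0)
		perror("setsockopt(TCP_WINDOW_CLAMP)");

	/* ... connect()/recv() as usual; the advertised window stays clamped. */
	return 0;
}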