lsquic client occasionally stops sending ACKs on larger downloads
Downloading larger files (100 MB) with the lsquic client occasionally fails because the client goes too long without sending ACKs.
Here is what we observe:
- large portion of the download goes smoothly
- sometimes there are a number of incoming packets in a row without any client ACKs (no issue so far, but let's call this a "smaller hole")
- but eventually this "hole" without ACKs becomes too large (>2000 packets, >6 seconds), so the server closes the connection with "Network black hole detected"
- the client responds to the CC (CONNECTION_CLOSE) immediately with an ACK covering the full range of packets
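For anyone who wants to measure such holes in their own capture, here is a minimal sketch in C. It is not part of lsquic; the names `pkt_event`, `ack_hole` and `longest_ack_hole` are hypothetical, and it assumes the trace has already been reduced to one record per packet with a timestamp and a direction. Because QUIC packets are encrypted, the sketch also assumes that any packet sent by the client carries an ACK, which may overcount ACKs.

```c
#include <stddef.h>

/* One record per captured packet (hypothetical, simplified trace format). */
struct pkt_event {
    double ts;           /* capture timestamp in seconds */
    int    from_client;  /* 1 = sent by the client, 0 = received from the server */
};

/* Description of a run of incoming packets with no client packet in between. */
struct ack_hole {
    size_t n_packets;    /* number of incoming packets in the run */
    double duration;     /* seconds between first and last packet of the run */
};

/*
 * Return the longest run (by packet count) of incoming packets that is not
 * interrupted by a client packet. Assumption: every client packet is treated
 * as carrying an ACK, since the payload cannot be inspected.
 */
struct ack_hole
longest_ack_hole(const struct pkt_event *ev, size_t n)
{
    struct ack_hole best = { 0, 0.0 };
    size_t run = 0;
    double start = 0.0;

    for (size_t i = 0; i < n; ++i) {
        if (ev[i].from_client) {
            run = 0;                 /* client sent something: hole ends */
            continue;
        }
        if (run == 0)
            start = ev[i].ts;        /* first unacknowledged incoming packet */
        ++run;
        if (run > best.n_packets) {
            best.n_packets = run;
            best.duration = ev[i].ts - start;
        }
    }
    return best;
}
```

Feeding it the events from the capture and comparing the result with the hole sizes seen in the tcpdump is one way to check whether the gap grows over the course of the download.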
Notes:
- using client built from latest master, commit b0bd690
- server is Akamai quic server
- there is GSO on the server side (should be unrelated; not sure if lsquic uses recvmmsg but there are individual incoming packets seen on the network device so no GRO on receiver side)
- we have full tcpdump with the failed download
- client ran w/ -o delayed_acks=0 on the command line
See screenshots for pieces of tcpdump on client side showing:
- start and end for random "smaller hole" in the middle of the download (2.6 seconds, 2442 packets)
- start and end for final "lethal hole" which ended w/ CC from the server (6.6 seconds, 2445 packets)
Full client stderr during failed download: client.stderr.log
You can try disabling the delayed ACK feature with settings->es_delayed_acks = 0.
Also, 2.4.1 is too old. Maybe you should try the latest 4.0.x?
Will try to disable delayed ACK, thank you.
Regarding the version number: 2.4.1 is misleading; sorry. I was tricked by git describe giving me "v2.4.1-338-gb0bd690". We regularly build from master, and the commit above is less than a month old.
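To make the git describe output easier to read: it has the form tag-N-gHASH, meaning N commits on top of the named tag, ending at the abbreviated commit HASH. A minimal sketch in C that splits such a string is below; `describe_info` and `parse_git_describe` are hypothetical names, and the sketch only handles this one form (not a bare tag or a "-dirty" suffix).

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Pieces of a "<tag>-<n>-g<hash>" string (hypothetical helper type). */
struct describe_info {
    char          tag[64];        /* most recent reachable tag, e.g. "v2.4.1" */
    unsigned long commits_since;  /* commits on top of that tag */
    char          commit[41];     /* abbreviated commit hash, without the 'g' */
};

/* Parse "<tag>-<n>-g<hash>". Returns 0 on success, -1 on any mismatch. */
int
parse_git_describe(const char *s, struct describe_info *out)
{
    const char *g = strrchr(s, '-');          /* dash before "g<hash>" */
    if (!g || g == s || g[1] != 'g' || g[2] == '\0')
        return -1;

    const char *p = g - 1;                    /* search back for dash before <n> */
    while (p > s && *p != '-')
        --p;
    if (p == s || *p != '-' || !isdigit((unsigned char) p[1]))
        return -1;

    char *end;
    unsigned long n = strtoul(p + 1, &end, 10);
    if (end != g)                             /* <n> must be all digits */
        return -1;

    size_t tag_len = (size_t) (p - s);
    size_t hash_len = strlen(g + 2);
    if (tag_len >= sizeof(out->tag) || hash_len >= sizeof(out->commit))
        return -1;

    memcpy(out->tag, s, tag_len);
    out->tag[tag_len] = '\0';
    out->commits_since = n;
    memcpy(out->commit, g + 2, hash_len + 1); /* includes the terminating NUL */
    return 0;
}
```

Applied to the string above, this yields tag "v2.4.1", 338 commits since the tag, and commit "b0bd690", which is why the tag alone says little about how recent the build is.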
I see that the client in the failed case above was already called w/ -o delayed_acks=0 on the command line.
So it is already disabled.
Hi @koujaz -- thank you for the bug report.
Can you recommend an Akamai endpoint to fetch some large files from?
@dtikhonov https://dlm.akamai.com/test/1GBfile.bin
I cannot reproduce this -- with or without the delayed_acks setting. The largest interval between sent ACKs I observe is 200 - 300 ms.
I can confirm that we haven't seen the issue in nightly testing during the last few months. Feel free to close this. We will reopen it if the issue happens again.