
Asset server is extremely slow on servers with high ping.

Open ksuprynowicz opened this issue 2 years ago • 3 comments

It doesn't seem to happen on servers with low ping; it's reasonably fast there. The ping that causes issues is about 340 ms.

ksuprynowicz avatar Aug 16 '23 15:08 ksuprynowicz

A 340 ms ping sounds to me like the connection is just really bad and probably saturated. In this day and age, even if someone is on the other side of the planet, one can generally expect 130 ms.

JulianGro avatar Aug 16 '23 16:08 JulianGro

I found something really interesting in the asset server code: static const uint8_t MIN_CORES_FOR_MULTICORE = 4; Maybe that is throttling it on our servers?

ksuprynowicz avatar Nov 14 '23 21:11 ksuprynowicz

TLDR: I believe the congestion control constantly gets reset and never reaches more than 26 packets in transit, which makes the bandwidth inversely proportional to the latency.

Using an Indonesian VPN, I can get the asset ping to ~430ms. Looking at the asset server statistics:

  • It sets a congestion window of 2 packets. My understanding is that this means only 2 packets may be in transit at once; the next one is sent only after one of them gets acknowledged. If it takes 430 ms to acknowledge a packet, we can send about 4.6 packets per second (2 / 0.43 s), which would be less than 10 kilobytes per second.
  • It reports sending 5–6 packets per second, which roughly matches my assumptions from the congestion window above.
  • It reports retransmitting 220–260 packets per second and receiving and processing their acknowledgements.
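
The window arithmetic above can be sketched as a small calculation. This is only an illustration of the reasoning, not code from Overte: the function names are hypothetical, and the 1500-byte packet size is an assumption used to get a rough byte rate.

```cpp
#include <cassert>

// Upper bound on the send rate when at most `windowPackets` packets may be
// unacknowledged at once and an acknowledgement takes `rttSeconds` to arrive.
// (Hypothetical helper for illustration; not part of the Overte code base.)
double maxPacketsPerSecond(double windowPackets, double rttSeconds) {
    assert(windowPackets > 0.0 && rttSeconds > 0.0);
    return windowPackets / rttSeconds;
}

// Same bound expressed in bytes per second.
// `packetBytes` is an assumed payload size, e.g. 1500 for a typical MTU.
double maxBytesPerSecond(double windowPackets, double rttSeconds, double packetBytes) {
    assert(packetBytes > 0.0);
    return maxPacketsPerSecond(windowPackets, rttSeconds) * packetBytes;
}
```

With a window of 2 and a 0.43 s round trip, maxPacketsPerSecond(2, 0.43) is about 4.65, and with an assumed 1500-byte packet that is roughly 7 kB/s. With the default window of 16 the same bound would be about 37 packets per second, so the rate scales with the window and inversely with the latency.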

These observations make me believe that:

  1. We don't wait long enough before retransmitting on high latency connections. Based on the number of retransmitted packets, I would guess that we only wait ~20–21 ms for an acknowledgement before retransmitting.
  2. Our congestion control is broken on high latency connections.
  3. Our retransmissions ignore the congestion window. If our congestion window allows us to send ~4.6 packets per second, then why are we retransmitting >220 times per second? If the congestion control is right, then we are completely overwhelming the connection with our retransmissions.
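
For comparison with point 1, here is a minimal sketch of how standard TCP derives its retransmission timeout from measured round trip times (the smoothed RTT and variance scheme from RFC 6298). This is not how Overte's udt code works, which I have not confirmed; the struct name and the `minTimeoutMs` floor parameter are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// RFC 6298 style retransmission timeout estimator (illustrative sketch only).
struct RtoEstimator {
    double srtt = 0.0;      // smoothed round trip time, in ms
    double rttvar = 0.0;    // round trip time variation, in ms
    bool hasSample = false;

    // Feed one measured round trip time, in milliseconds.
    void addSample(double rttMs) {
        assert(rttMs > 0.0);
        if (!hasSample) {
            // First measurement initialises both estimates.
            srtt = rttMs;
            rttvar = rttMs / 2.0;
            hasSample = true;
        } else {
            // beta = 1/4 for the variation, alpha = 1/8 for the smoothed RTT.
            // The variation is updated first, using the old smoothed RTT.
            rttvar = 0.75 * rttvar + 0.25 * std::fabs(srtt - rttMs);
            srtt = 0.875 * srtt + 0.125 * rttMs;
        }
    }

    // Timeout before a packet is considered lost; never below `minTimeoutMs`
    // (the floor value is an assumption chosen by the caller).
    double timeoutMs(double minTimeoutMs) const {
        assert(hasSample);
        return std::max(minTimeoutMs, srtt + 4.0 * rttvar);
    }
};
```

With a scheme like this, the timeout never drops below the smoothed round trip time, so on a steady 430 ms link it would stay at or above 430 ms rather than the ~20 ms guessed above.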

Looking at the code:

  • The minimum congestion window size is 2. (https://github.com/overte-org/overte/blob/4e6575a9170438e618c680b52c6830b41d815ff9/libraries/networking/src/udt/TCPVegasCC.cpp#L234)
  • The default congestion window size is 16. (https://github.com/overte-org/overte/blob/4e6575a9170438e618c680b52c6830b41d815ff9/libraries/networking/src/udt/CongestionControl.h#L59)

One thing I found weird in the code is the Vegas congestion control resetting the counter of measured round trip times. (https://github.com/overte-org/overte/blob/4e6575a9170438e618c680b52c6830b41d815ff9/libraries/networking/src/udt/TCPVegasCC.cpp#L249) If I understand this right, then the congestion control constantly resets, which seems true based on the statistics.

Unfortunately, I cannot wrap my head around this code with my current very limited understanding of networking. I tried reading up on it for two hours, but I hit too much stuff I don't know. For example, I think we are supposed to send multiple acknowledgements to signify that we received a packet we cannot acknowledge yet, because we haven't received the previous packet yet. But this would cause a flood of acknowledgements, since we also retransmit tons of packets. I mean, who knows, maybe this is exactly the problem. Maybe if a packet arrives out of order, it ends up retransmitting every previous packet, which would also throw off the congestion control. Maybe packets just don't arrive out of order very often in low latency situations, while they do arrive out of order in high latency situations.

JulianGro avatar Apr 13 '25 11:04 JulianGro