okio icon indicating copy to clipboard operation
okio copied to clipboard

Optimize `AsyncTimeout` operations.

Open yyaroshevich opened this issue 1 month ago • 0 comments

This task is an invitation to discussion initially started here.

As a software engineer I have an access to monitoring of serveral backend services. Those services heavily rely on OkHttp to perform API calls to other services. I've noticed that on those services operations on AsyncTimeout like scheduling can take significant amount of time - it is top 1 CPU intensive function. выява According to DataDog AsyncTimeout also sometimes become a bottleneck due to application wide lock held to perform long CPU intensive operation. выява выява

This is presumably due to implementation detail of AsyncTimeout which is implemented as sorted linked list. An insertion to such list is a linear scan which takes more and more time when service has many requests in parallel and probably many timeouts scheduled.

I'll try to propose several options to overcome this issue. They are all different in complexity of implementation, but worth discussion. As on option it is possibly to use ScheduledThreadPoolExecutor to delegate efficient timeout scheduling to JDK. Under the hood ScheduledThreadPoolExecutor implemented as binary heap which will provide O(log(n)) operations on list. The drawback is that each task scheduled to ScheduledThreadPoolExecutor is wrapped into internal object and stored into array - which means additional allocations. I personally don't think it will be a big problem, but it worth noting this drawback.

Alternatively it is possible to adjust current implementation of AsyncTimeout which seems to be intrusive singly linked list to be intrusive liked binary heap which will also provide O(log(n)) operations without any extra allocations. The drawback of this approach is that it is much more complex to implement (those manipulations over binary tree are tricky), but this approach to timeouts is known and used by one of the most widespread native library for IO.

As suggested by @swankjesse it also might worth just to change direction of lookup which might reduce amount of elements of list traversed. Presumably due to fact that most of timeouts will be the same. Here in my opinion it will need to be validated that hypothesis about same timeouts is actually true. There might be something like timeout groups coming from different requests with different timeouts. Also different timeout can come from other timers like timers assumed by H2 protocol.

yyaroshevich avatar Jun 04 '24 09:06 yyaroshevich