kotlinx-io
Better segment pools
Currently, segment pools exist only on JVM (on other platforms, implementations are effectively no-op) and behave more like caches than pools.
There are a few directions in which we may/should develop segment pools:
- [ ] support pools configuration (both in terms of the pool capacity and the size of allocated segments)
- [x] make sure every segment is returned to a pool once there are no users remaining (https://github.com/Kotlin/kotlinx-io/pull/347)
- [ ] support unlinking a segment from a pool (for example, when a byte-buffer-backed segment is sent via Netty: once we "consume" the segment by wrapping it into a ByteBuf, we no longer use it, but it cannot be released to the pool because Netty owns it now)
- [x] support adding already allocated data into a buffer (related to https://github.com/Kotlin/kotlinx-io/issues/166)
- [ ] support pools on other platforms
- [ ] support pool-level isolation (for instance, if there are multiple threads make sure that each of them uses a separate pool, so that data used in one thread would never leak into another thread)
- [ ] support leak tracing (https://github.com/Kotlin/kotlinx-io/issues/144)
This is an epic describing what could be done and tracking progress, rather than a prescription of what should be implemented.
The first two points (not returning segments back to the pool and small pool size) are blockers for integration with Ktor (https://youtrack.jetbrains.com/issue/KTOR-6030, https://github.com/ktorio/ktor/pull/4032), so they need to be fixed.
It would be great to have a system property to adjust the pool size as well.
As @bjhham pointed out, Buffer::close should release segments (currently, it's a no-op).
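A minimal sketch of what reading such a property could look like. The property name `example.segment.pool.size.bytes` and the 64 KiB default are assumptions made for illustration only; they are not taken from kotlinx-io.

```kotlin
// Hypothetical property name and default; neither comes from kotlinx-io.
const val POOL_SIZE_PROPERTY = "example.segment.pool.size.bytes"
const val DEFAULT_POOL_SIZE_BYTES = 64 * 1024

// Falls back to the default when the value is missing, malformed, or negative.
fun parsePoolSize(raw: String?): Int =
    raw?.trim()?.toIntOrNull()?.takeIf { it >= 0 } ?: DEFAULT_POOL_SIZE_BYTES

// Reads the property once; a real pool would likely cache this at class-init time.
fun configuredPoolSize(): Int = parsePoolSize(System.getProperty(POOL_SIZE_PROPERTY))
```

With this shape, the pool size could be overridden at startup via `-Dexample.segment.pool.size.bytes=1048576`, while invalid values silently keep the default.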
However, it could be problematic, as a typical Buffer use scenario is "allocate, use, and forget".
Maybe we can fix this by allocating buffers for writing explicitly, through a different constructor or factory method, so that the buffers for channels and primitives in Ktor are tracked.
> make sure every segment is returned to a pool once there are no users remaining
One of the scenarios when a segment is not returned to the pool is when it was shared between multiple buffers. Currently, sharing is tracked by a flag, so there's no way to check whether a segment can be safely returned to the pool.
Replacing the flag with a ref-counter solves the issue, and the performance impact seems negligible.
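The difference between a flag and a counter can be sketched as follows. `PooledSegment` and `SimplePool` are hypothetical stand-ins, not the actual kotlinx-io classes, and sharing is simplified to handing out the same object instead of a copy that shares the underlying array.

```kotlin
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-ins; not the actual kotlinx-io Segment or SegmentPool.
class PooledSegment(val data: ByteArray = ByteArray(8192)) {
    // Number of buffers currently referencing `data`; starts at 1 for the owner.
    private val refs = AtomicInteger(1)

    /** Called when the segment is shared with another buffer. */
    fun share(): PooledSegment {
        refs.incrementAndGet()
        return this
    }

    /** Returns true only for the caller that dropped the last reference. */
    fun release(): Boolean = refs.decrementAndGet() == 0

    /** Re-arms the counter when the segment is handed out from the pool again. */
    fun reset() = refs.set(1)
}

class SimplePool(private val capacity: Int) {
    private val free = ArrayDeque<PooledSegment>()

    @Synchronized
    fun take(): PooledSegment = free.removeLastOrNull()?.also { it.reset() } ?: PooledSegment()

    // Only the last user actually returns the segment to the pool.
    @Synchronized
    fun recycle(segment: PooledSegment) {
        if (segment.release() && free.size < capacity) free.addLast(segment)
    }

    @Synchronized
    fun pooledCount(): Int = free.size
}
```

A boolean flag can only say "this segment was shared at some point", so a shared segment can never be pooled safely. The counter reaches zero exactly when the last user recycles it.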
The issue with changing the default pool size is how this property is used: currently, the pool consists of multiple buckets (the number depends on CPU count), and the property applies to each bucket individually. In an unlucky scenario, we may end up with all buckets filled with pooled segments while only one of them is actually used.
This area is tricky in terms of performance; I need to research a bit more what could be done.
You could try the same strategy as for a connection pool: allocate lazily and release allocated buffers over time.
Currently, I'm gravitating towards a solution with a two-tier segment pool:
- the first tier remains the same as it is now: a relatively small pool sharded by the thread ID;
- the second tier will be disabled by default, but once enabled, every failed attempt to take a segment from the first tier will continue by trying the second tier before allocating a new segment; the same goes for segment recycling: if the first tier is full, an attempt will be made to return the segment to the second tier;
- unlike the first tier, the second will be shared across all threads.
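The two-tier idea above could be sketched roughly like this. `TwoTierPool`, its parameters, and the use of a `ThreadLocal` as a stand-in for sharding by thread ID are all assumptions for illustration; this is not the kotlinx-io implementation.

```kotlin
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch; names and sizes are assumptions, not kotlinx-io API.
class TwoTierPool(
    private val firstTierCapacity: Int,
    private val secondTierCapacity: Int, // 0 keeps the second tier disabled
    private val segmentSize: Int = 8192,
) {
    // First tier: one small stack per thread, standing in for sharding by thread ID.
    private val firstTier = ThreadLocal.withInitial { ArrayDeque<ByteArray>() }

    // Second tier: a single queue shared by all threads.
    private val secondTier = ConcurrentLinkedQueue<ByteArray>()
    private val secondTierSize = AtomicInteger(0)

    fun take(): ByteArray {
        firstTier.get().removeLastOrNull()?.let { return it }
        if (secondTierCapacity > 0) {
            secondTier.poll()?.let {
                secondTierSize.decrementAndGet()
                return it
            }
        }
        // Both tiers missed: allocate a fresh segment.
        return ByteArray(segmentSize)
    }

    fun recycle(segment: ByteArray) {
        val local = firstTier.get()
        if (local.size < firstTierCapacity) {
            local.addLast(segment)
            return
        }
        // Reserve a slot first so the shared tier never exceeds its capacity.
        if (secondTierSize.incrementAndGet() <= secondTierCapacity) {
            secondTier.add(segment)
        } else {
            secondTierSize.decrementAndGet()
            // Both tiers are full: drop the segment and let the GC reclaim it.
        }
    }
}
```

Keeping the first tier thread-local preserves the cheap fast path, while the shared second tier absorbs the "all buckets full, only one used" case at the cost of some contention.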
Some of the issues from the summary are addressed here: #352