Adaptive read pool threads
Feature Request
We configure the read pool with a very large number of threads (5x the CPU count) because our average EBS latency is around 1 ms. However, when there are hot regions, this large read pool sometimes causes CPU bottlenecks in TiKV. As a result, the blast radius increases and the cluster takes longer to recover, because other threads, such as the raft and resolved-ts threads, slow down. Given the high EBS latency and the use of RocksDB in synchronous mode, we recommend making the system more adaptive by implementing the following changes:
- Limit the CPU utilization of the unified read pool threads through a configuration parameter.
- Currently, the unified read pool scales up or down based solely on the CPU utilization of each thread. It should also consider the wait time of tasks to determine when to adjust its size.
This solution would also help enable resource control groups (RCG). Today, RCG fair scheduling in TiKV doesn't work, so all tenants are impacted if TiKV CPU becomes a bottleneck.
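To make the two proposed changes concrete, here is a minimal sketch of how a scaling decision could combine a CPU cap with task wait time. It is only an illustration under stated assumptions: `AdaptiveConfig`, `Sample`, `Decision`, `decide`, and every field name are hypothetical and do not correspond to existing TiKV configuration items or code.

```rust
use std::time::Duration;

/// Hypothetical tuning knobs (assumed names, not real TiKV config items).
#[derive(Debug, Clone)]
pub struct AdaptiveConfig {
    pub min_threads: usize,
    pub max_threads: usize,
    /// Cap on read pool CPU usage, as a fraction of all cores (e.g. 0.5).
    pub max_cpu_fraction: f64,
    /// Scale up when the average queue wait exceeds this.
    pub wait_high: Duration,
    /// Scale down when the average queue wait drops below this.
    pub wait_low: Duration,
}

/// One observation window of read pool metrics (assumed shape).
#[derive(Debug, Clone, Copy)]
pub struct Sample {
    /// CPU consumed by the pool, in cores (3.2 means 3.2 cores busy).
    pub pool_cpu_cores: f64,
    pub total_cores: usize,
    /// Average time tasks spent queued before a thread picked them up.
    pub avg_wait: Duration,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    ScaleUp,
    ScaleDown,
    Hold,
}

/// Decide how to adjust the pool given its current thread count.
pub fn decide(cfg: &AdaptiveConfig, current: usize, s: &Sample) -> Decision {
    let cpu_budget = cfg.max_cpu_fraction * s.total_cores as f64;

    // Over the CPU cap (e.g. a hot region): shrink regardless of wait time,
    // so raft and resolved-ts threads keep some CPU headroom.
    if s.pool_cpu_cores > cpu_budget {
        return if current > cfg.min_threads {
            Decision::ScaleDown
        } else {
            Decision::Hold
        };
    }

    // Tasks are queuing while CPU is under the cap: threads are likely
    // blocked on slow I/O, so more threads can help.
    if s.avg_wait > cfg.wait_high && current < cfg.max_threads {
        return Decision::ScaleUp;
    }

    // Tasks barely wait: the pool is larger than it needs to be.
    if s.avg_wait < cfg.wait_low && current > cfg.min_threads {
        return Decision::ScaleDown;
    }

    Decision::Hold
}
```

In this sketch the CPU cap takes priority over wait time: a long queue only grows the pool while its CPU usage stays under the budget, which is what separates the I/O-bound case (high wait, low CPU) from the hot-region case (high wait, high CPU).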
Is your feature request related to a problem? Please describe:
Describe the feature you'd like:
Describe alternatives you've considered:
Teachability, Documentation, Adoption, Migration Strategy:
@mittalrishabh Have you considered async I/O for the coprocessor read?
Are you talking about RocksDB async I/O?