Adjust the io-threads count at runtime to fully utilize multiple threads and use CPU more efficiently.
Description
This patch fixes an issue where, even when an io-threads count is configured, the IO threads are stopped by the logic below, leaving them unused and adding latency. Under the existing logic, the IO threads are only activated past a hardcoded threshold, so QPS regresses differently depending on the configured io-threads count even under the same client request load.
```c
int stopThreadedIOIfNeeded(void) {
    int pending = listLength(server.clients_pending_write);

    /* Return ASAP if IO threads are disabled (single threaded mode). */
    if (server.io_threads_num == 1) return 1;

    if (pending < (server.io_threads_num * 2)) {
        if (server.io_threads_active) stopThreadedIO();
        return 1;
    } else {
        return 0;
    }
}
```
With this patch, the number of active IO threads is adjusted dynamically at runtime according to the real workload, trading latency off against CPU efficiency. I use a decay-rate formula (an exponential moving average) based on clients_pending_write to keep throughput stable.
Benchmark Result
Test Environment
- OS: CentOS Stream 8
- Kernel: 6.2.0
- Platform: Intel Xeon Platinum 8380
- Base commit: b2a397366b1704b03f5042395ff620cd2096f598
Test Steps
Start Valkey-Server
Start valkey-server with `taskset -c 0-3 ~/src/valkey-server /tmp/valkey_1.conf`, using the config below:
```
port 9001
bind * -::*
daemonize yes
protected-mode no
save ""
io-threads 4
io-threads-do-reads yes
```
Test from throughput perspective
Start valkey-benchmark on localhost with the command below:

```
taskset -c 4-7 ./src/valkey-benchmark -p 9001 -t set,get -d 128 -r 5000000 -n 10000000 --threads 2 -c 10
```

to measure the SET and GET performance gap; results are listed below:
We can see that the IO threads are never used because clients_pending_write < (server.io_threads_num * 2), so throughput is limited to a single thread even though more CPU resources were allocated.
NOTE: QPS is reported as a relative value rather than an absolute value.
CPU utilization measured with: perf stat -p `pidof valkey-server` sleep 5
| | Base CPU utilized | Opt CPU utilized | QPS Gain (Opt/Base-1) |
|---|---|---|---|
| SET | 1 | 2 | 23% |
| GET | 1 | 2 | 25% |
Test from a CPU efficiency perspective
If we increase the number of requests with the command below so that the original IO threads actually engage, throughput increases as expected, but all CPUs become fully utilized. With this patch, we can make CPU usage more efficient by reducing the number of active IO threads.

```
taskset -c 4-7 ./src/valkey-benchmark -p 9001 -t set,get -d 128 -r 5000000 -n 10000000 --threads 2 -c 15
```
| | Base CPU utilized | Opt CPU utilized | QPS Gain (Opt/Base-1) |
|---|---|---|---|
| SET | 3.999 | 2.908 | 8% |
| GET | 3.999 | 2.918 | 3% |
@valkey-io/core-team Could you help to review this patch?
@lipzhu I noticed there wasn't a test; we should at least add one for that.
@madolson I am thinking about how to add unit tests for io-threads. Since this change is for performance purposes, I am not sure how to simulate it in unit tests; can you provide some guidance?
Ping @madolson .
My understanding of this change is that it helps with volatile workloads where request rates fluctuate quite a bit. It doesn't allocate/activate more IO worker threads at runtime, but it lets the IO worker threads stay active a bit longer using an Exponential Moving Average (EMA). This trades a bit of CPU efficiency for more consistent latency through workload spikes and troughs, which can be significant.
Overall I am onboard with this change. I think the only thing I am looking forward to is a test that proves we can still activate all the threads configured via io-threads and the IO threads can indeed quiesce.
> I am thinking how to add the unit tests for io-threads, it's for performance purpose, I am uncertain how to simulate it in unit tests, can you provide some guidance?
I don't think we need a (unit) test for performance. We will use release candidates to get feedback. What we really need is a correctness test as I mentioned above.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 69.89%. Comparing base (6cff0d6) to head (1eb7c54). Report is 1 commit behind head on unstable.
Additional details and impacted files
```
@@ Coverage Diff @@
## unstable #111 +/- ##
============================================
+ Coverage 68.91% 69.89% +0.98%
============================================
Files 109 109
Lines 61792 61789 -3
============================================
+ Hits 42581 43186 +605
+ Misses 19211 18603 -608
```
| Files | Coverage Δ | |
|---|---|---|
| src/networking.c | 91.61% <100.00%> (+6.71%) | :arrow_up: |
| src/server.c | 88.61% <100.00%> (+<0.01%) | :arrow_up: |
@PingXie I added an integration test to validate that the io_threads_active_num EMA logic works properly. Could you help review again?
What is the next step in the process? Is it ready to merge? @PingXie