valkey icon indicating copy to clipboard operation
valkey copied to clipboard

Allow dynamic modification of io-threads num

Open ayush933 opened this issue 8 months ago • 8 comments

Item from #761

This PR has the following changes

  1. Bug fix where calling pthread_join() from main thread for an IO thread would hang indefinitely. This is because IOThreadMain() doesn't have a cancellation point.So pthread_cancel() from main thread is not honored. Can be reproed by calling shutdownIOThread() from the main thread for any active thread with empty job queue. Fixed by adding cancellation point in IOThreadMain().
  2. Makes io-threads config runtime modifiable.

ayush933 avatar Apr 30 '25 15:04 ayush933

UTs pending atm.

ayush933 avatar Apr 30 '25 15:04 ayush933

Codecov Report

Attention: Patch coverage is 96.29630% with 1 line in your changes missing coverage. Please review.

Project coverage is 71.37%. Comparing base (af25a52) to head (3f54996). Report is 11 commits behind head on unstable.

Files with missing lines Patch % Lines
src/io_threads.c 96.00% 1 Missing :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2033      +/-   ##
============================================
+ Coverage     71.02%   71.37%   +0.34%     
============================================
  Files           122      122              
  Lines         66171    66050     -121     
============================================
+ Hits          46999    47142     +143     
+ Misses        19172    18908     -264     
Files with missing lines Coverage Δ
src/config.c 78.52% <ø> (+0.12%) :arrow_up:
src/networking.c 87.44% <100.00%> (+0.26%) :arrow_up:
src/server.h 100.00% <ø> (ø)
src/io_threads.c 34.70% <96.00%> (+27.89%) :arrow_up:

... and 17 files with indirect coverage changes

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov[bot] avatar Apr 30 '25 15:04 codecov[bot]

Hey @uriyage, just curious , can you please explain the safety concerns a bit more i.e. why we would want to drain all jobs and inactivate all threads instead of the min num required?

ayush933 avatar May 01 '25 08:05 ayush933

Hey @uriyage, just curious , can you please explain the safety concerns a bit more i.e. why we would want to drain all jobs and inactivate all threads instead of the min num required?

@ayush933 I don't recall the exact workflow, but I remember attempting something similar, and it affected the logic of dynamically adjusting the threads or the client cur_tid logic. I might be mistaken, but I think it will be cleaner and safer.

uriyage avatar May 01 '25 15:05 uriyage

Hey @uriyage @madolson , anything else needed to move this PR forward?

ayush933 avatar May 08 '25 09:05 ayush933

Just some nits, but still would like @uriyage to take sign off as well, he knows more about this code.

LGTM. I left one small comment about the log verbosity

uriyage avatar May 11 '25 11:05 uriyage

Before this feature, the only time we called shutdownIoThread() was on server shutdown. Considering this wasn't noticed until now, I don't think its too severe and needs a backport but lets wait for @uriyage's input on this.

ayush933 avatar May 17 '25 05:05 ayush933

Before this feature, the only time we called shutdownIoThread() was on server shutdown. Considering this wasn't noticed until now, I don't think its too severe and needs a backport but lets wait for @uriyage's input on this.

Currently, it is only used by doFastMemoryTest, which is called from printCrashReport after a crash occurs, in this case the threads are suspended by the ThreadsManager_runOnThreads so not an issue.

uriyage avatar May 19 '25 09:05 uriyage

New crash https://github.com/valkey-io/valkey/actions/runs/15312340723/job/43079580795?pr=2149#step:4:1444

madolson avatar May 28 '25 23:05 madolson