Save the cluster config file in bio to avoid latency from stuck I/O
When the cluster changes, we need to persist the cluster configuration. If I/O is delayed or blocked, possibly by disk contention, this may result in large latencies on the main thread.
We should avoid synchronous I/O from the main thread, so this commit saves the config file through bio instead. We add a bio job and pass it an sds copy of the config file contents; the job performs the synchronous save, so an eventually consistent version is always stored on disk.
This may break our previous assumption that nodes.conf is always in sync and strongly consistent. For shutdown and CLUSTER SAVECONFIG, we wait for the bio job to drain and then trigger a new save synchronously.
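To make the idea concrete, here is a minimal standalone sketch of the pattern described above, not the code in this PR. It assumes a single worker thread and a one-slot queue that keeps only the newest snapshot (the coalescing is an assumption of the sketch, not something the PR states). All names (`config_saver`, `saver_submit`, `saver_drain`, and so on) are hypothetical and are not part of `bio.c` or `cluster_legacy.c`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-slot queue: only the newest config snapshot is kept. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t wake; /* signalled when a job arrives or stop is requested */
    pthread_cond_t idle; /* signalled when the worker finishes a write */
    char *pending;       /* newest unsaved snapshot, or NULL */
    int busy;            /* non-zero while the worker is writing */
    int stop;
    const char *path;
    pthread_t thread;
} config_saver;

/* Write to a temp file, then rename over the target. A real
 * implementation would also fsync before the rename. */
static int write_file_atomically(const char *path, const char *content) {
    char tmp[1024];
    snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    FILE *fp = fopen(tmp, "w");
    if (!fp) return -1;
    size_t len = strlen(content);
    int ok = fwrite(content, 1, len, fp) == len;
    if (fclose(fp) != 0) ok = 0;
    if (!ok || rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}

static void *saver_main(void *arg) {
    config_saver *s = arg;
    pthread_mutex_lock(&s->lock);
    for (;;) {
        while (!s->pending && !s->stop) pthread_cond_wait(&s->wake, &s->lock);
        if (!s->pending) break; /* stop requested and nothing left to write */
        char *job = s->pending;
        s->pending = NULL;
        s->busy = 1;
        pthread_mutex_unlock(&s->lock);
        write_file_atomically(s->path, job); /* blocking I/O, off the main thread */
        free(job);
        pthread_mutex_lock(&s->lock);
        s->busy = 0;
        pthread_cond_broadcast(&s->idle);
    }
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

int saver_start(config_saver *s, const char *path) {
    memset(s, 0, sizeof(*s));
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->wake, NULL);
    pthread_cond_init(&s->idle, NULL);
    s->path = path;
    return pthread_create(&s->thread, NULL, saver_main, s);
}

/* Called from the main thread: copies the snapshot and returns at once. */
void saver_submit(config_saver *s, const char *content) {
    assert(content != NULL);
    size_t len = strlen(content) + 1;
    char *copy = malloc(len);
    if (!copy) return;
    memcpy(copy, content, len);
    pthread_mutex_lock(&s->lock);
    free(s->pending); /* drop an older snapshot that was never written */
    s->pending = copy;
    pthread_cond_signal(&s->wake);
    pthread_mutex_unlock(&s->lock);
}

/* Block until no snapshot is queued or being written (shutdown path). */
void saver_drain(config_saver *s) {
    pthread_mutex_lock(&s->lock);
    while (s->pending || s->busy) pthread_cond_wait(&s->idle, &s->lock);
    pthread_mutex_unlock(&s->lock);
}

void saver_stop(config_saver *s) {
    pthread_mutex_lock(&s->lock);
    s->stop = 1;
    pthread_cond_signal(&s->wake);
    pthread_mutex_unlock(&s->lock);
    pthread_join(s->thread, NULL);
}
```

In this sketch the main thread calls `saver_submit` on every config change and never touches the disk, while a shutdown-style path calls `saver_drain` first so that any later synchronous save cannot race with an in-flight background write.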
Closes #2424.
Codecov Report
:x: Patch coverage is 96.00000% with 2 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 72.49%. Comparing base (04d0bba) to head (8dec4e0).
:warning: Report is 1 commits behind head on unstable.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/bio.c | 85.71% | 1 Missing :warning: |
| src/cluster_legacy.c | 97.67% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## unstable #2555 +/- ##
============================================
+ Coverage 72.41% 72.49% +0.07%
============================================
Files 129 129
Lines 70528 70548 +20
============================================
+ Hits 51076 51146 +70
+ Misses 19452 19402 -50
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/bio.c | 85.10% <85.71%> (-0.19%) | :arrow_down: |
| src/cluster_legacy.c | 87.60% <97.67%> (+0.10%) | :arrow_up: |
Core team meeting:
- Added some reviewers to make sure this makes progress, since this seems to have been forgotten. Some concerns raised about double voting.
Can we target this fix for the upcoming patch release?
One concern about shutdown: I think we should call bioDrainWorker in finishShutdown to wait for the cluster config write to finish.
@cherukum-Amazon sorry for the delay, I somehow lost the context a while ago. I will try to refresh it this week. Let's start working on #1032 first and try to push it forward.
> One concern about shutdown: I think we should call bioDrainWorker in finishShutdown to wait for the cluster config write to finish.

Yes, we do call bioDrainWorker in finishShutdown.