Add DC offset removal and configurable fade for phrase boundaries
Adds DC offset removal and a configurable fade-in/fade-out at phrase boundaries to eliminate clicking noise.
Fixes #1762
Investigation Update
After further investigation into #1762, the root cause of the clicking noise is DC offset at phrase boundaries. Note that this issue only affects some voicebanks, since not all models produce DC offset.
Root Cause Analysis
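Each phrase may carry a small DC offset (around -56 dB). When the offset changes abruptly from one phrase to the next, the waveform contains a step discontinuity; since a step has energy spread across the whole spectrum, it appears as a vertical line spanning all frequencies in the spectrogram.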
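To put -56 dB in perspective, here is a small Python illustration (not the PR's actual code). It assumes the level is measured relative to a full-scale amplitude of 1.0, and the two "phrases" are hypothetical stretches of silence, one biased and one not. Even though both are silent, joining them produces a single-sample jump equal to the offset, which is the step described above.

```python
def db_to_amplitude(db: float) -> float:
    """Convert a level in dB relative to full scale (1.0) to a linear amplitude."""
    return 10.0 ** (db / 20.0)


offset = db_to_amplitude(-56.0)  # roughly 0.00158, i.e. about 0.16% of full scale

# Hypothetical boundary: a phrase ending in silence with a +offset DC bias,
# followed by a phrase starting in silence with no bias. Both are "silent",
# yet the joined waveform jumps by `offset` at exactly one sample.
phrase_a = [offset] * 8
phrase_b = [0.0] * 8
joined = phrase_a + phrase_b
jumps = [abs(b - a) for a, b in zip(joined, joined[1:])]

print(f"offset: {offset:.5f}")
print(f"non-zero jumps: {sum(1 for j in jumps if j > 0)}, size: {max(jumps):.5f}")
```

The offset is far too small to see in the waveform, but an instantaneous step of any size is broadband, which is why it stands out in the spectrogram.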
Solution
Implemented edge-based DC offset removal and optional fading:
- Calculate the DC offset from the first and last 100 samples of each phrase (the boundary regions)
- Subtract this offset from the entire phrase
- Apply optional short fade (0-50ms) for additional smoothing
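The three steps above can be sketched in Python as follows. This is a minimal illustration, assuming mono float samples in a plain list; the function names are hypothetical and not the PR's code. The 100-sample edge window comes from the description; the Hann-shaped ramp reflects the stated default curve, and its exact formula here is an assumption. For phrases shorter than two windows, the sketch averages every sample once so that no sample is counted twice.

```python
import math

EDGE_SAMPLES = 100  # boundary window size stated in the PR description


def edge_dc_offset(samples: list[float], edge: int = EDGE_SAMPLES) -> float:
    """Estimate DC offset from the first and last `edge` samples of a phrase.

    If the phrase is too short for two disjoint windows, every sample is
    averaged exactly once instead, so no sample is counted twice.
    """
    n = len(samples)
    if n == 0:
        return 0.0
    if n <= 2 * edge:
        return sum(samples) / n
    boundary = samples[:edge] + samples[-edge:]
    return sum(boundary) / len(boundary)


def remove_dc_and_fade(samples: list[float], sample_rate: int,
                       fade_ms: float = 10.0, edge: int = EDGE_SAMPLES) -> list[float]:
    """Subtract the edge-estimated DC offset, then apply a Hann-shaped fade in/out."""
    offset = edge_dc_offset(samples, edge)
    out = [s - offset for s in samples]  # remove the offset from the whole phrase
    # Fade length in samples, capped so the fade-in and fade-out never overlap
    fade_len = min(int(sample_rate * fade_ms / 1000.0), len(out) // 2)
    for i in range(fade_len):
        gain = 0.5 - 0.5 * math.cos(math.pi * i / fade_len)  # rises from 0 toward 1
        out[i] *= gain       # fade-in
        out[-1 - i] *= gain  # fade-out (mirror image)
    return out
```

Estimating the offset only at the edges targets exactly the samples that meet the neighbouring phrase, which is where the step occurs; a whole-phrase mean would also be influenced by the voiced content in the middle.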
Architecture
General-purpose implementation (per @yqzhishen's suggestion):
- Features available for all synthesis engines (Classic, Worldline, Enunu, Vogen, DiffSinger, Voicevox)
- Implemented in Renderers.ApplyPostProcessing(): centralized post-processing utility
- Applied in RenderEngine.RenderRequests(): single choke point for all rendered phrases
- Settings moved from DiffSinger preferences to general Rendering preferences
- Zero changes needed to individual renderer implementations
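The architecture above can be illustrated with a Python sketch of the "single choke point" idea. Every name here (PostProcessingSettings, Renderer, apply_post_processing, render_requests, ConstantRenderer) is a hypothetical stand-in for illustration, not the project's real types. The post-processing is deliberately simplified (whole-phrase mean, linear ramp) to keep the focus on where it is called rather than how it works.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class PostProcessingSettings:
    """Hypothetical stand-in for the general Rendering preferences."""
    remove_dc_offset: bool = False  # default: off
    fade_enabled: bool = False      # default: off
    fade_ms: float = 10.0           # used only when fading is enabled


class Renderer(Protocol):
    """Any engine (Classic, Worldline, DiffSinger, ...) that yields mono samples."""
    def render(self, phrase: str) -> list[float]: ...


def apply_post_processing(samples: list[float], settings: PostProcessingSettings,
                          sample_rate: int) -> list[float]:
    out = list(samples)
    if settings.remove_dc_offset and out:
        # Simplified whole-phrase mean; the PR estimates from the edge samples only
        offset = sum(out) / len(out)
        out = [s - offset for s in out]
    if settings.fade_enabled:
        n = min(int(sample_rate * settings.fade_ms / 1000.0), len(out) // 2)
        for i in range(n):
            gain = i / n  # linear ramp for brevity
            out[i] *= gain
            out[-1 - i] *= gain
    return out


def render_requests(requests: list[tuple[Renderer, str]],
                    settings: PostProcessingSettings,
                    sample_rate: int) -> list[list[float]]:
    results = []
    for renderer, phrase in requests:
        raw = renderer.render(phrase)  # engine-specific output (fresh or cached)
        # Single choke point: every phrase from every engine passes through here
        results.append(apply_post_processing(raw, settings, sample_rate))
    return results


class ConstantRenderer:
    """Toy engine whose output is pure DC, for demonstration only."""
    def __init__(self, level: float) -> None:
        self.level = level

    def render(self, phrase: str) -> list[float]:
        return [self.level] * 100


requests = [(ConstantRenderer(0.002), "la"), (ConstantRenderer(-0.001), "li")]
cleaned = render_requests(requests, PostProcessingSettings(remove_dc_offset=True), 44100)
```

Because the engines only produce raw samples and never see the settings, adding or changing a renderer does not require touching the post-processing, and cached output goes through the same path as fresh output.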
Results
Before (raw output):
After DC offset removal:
After DC offset removal + 10ms fade:
Features
- DC offset removal: Edge-based estimation and removal (default: off)
- Phrase fading: Configurable fade duration 0-50ms (default: off, 10ms when enabled)
- Fade curves: Linear, exponential, sine, equal-power, Hann (default: Hann)
- Both features work on cached renders (applied as post-processing)
- Available in general Rendering preferences (affects all engines)
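The five fade curves listed above can be written as gain functions of normalized time t in [0, 1], each going from 0 to 1. The PR does not give its formulas, so the ones below are assumptions: "sine" as a quarter sine, "equal-power" as a square root (gain squared rises linearly), "Hann" as a raised cosine, and the exponential steepness EXP_STEEPNESS = 5.0 is an arbitrary illustrative value. Normalizing the exponential by exp(k) - 1 makes it land exactly on 1.0 at the end of the fade.

```python
import math
from typing import Callable

EXP_STEEPNESS = 5.0  # assumed steepness for the exponential curve


def _exponential(t: float) -> float:
    # Normalized so the curve is exactly 0 at t=0 and exactly 1 at t=1
    return (math.exp(EXP_STEEPNESS * t) - 1.0) / (math.exp(EXP_STEEPNESS) - 1.0)


FADE_CURVES: dict[str, Callable[[float], float]] = {
    "linear": lambda t: t,
    "exponential": _exponential,
    "sine": lambda t: math.sin(0.5 * math.pi * t),
    "equal_power": lambda t: math.sqrt(t),  # gain**2 (power) rises linearly
    "hann": lambda t: 0.5 - 0.5 * math.cos(math.pi * t),
}


def fade_in_gains(length: int, curve: str = "hann") -> list[float]:
    """Gains for a fade-in of `length` samples; reverse the list for a fade-out."""
    f = FADE_CURVES[curve]
    if length <= 1:
        return [1.0] * length  # nothing to ramp over
    return [f(i / (length - 1)) for i in range(length)]
```

For example, a 10 ms Hann fade at 44.1 kHz would use fade_in_gains(441, "hann") at the start of a phrase and the reversed list at the end.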
Code Quality Improvements
- Fixed double-counting bug in DC offset calculation for short phrases
- Added input validation for fade parameters
- Normalized exponential curve to reach exactly 1.0
- Extracted magic numbers as named constants
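As an illustration of the validation and named constants, the Python sketch below converts a fade duration into a per-edge sample count. The constant names and the policy (reject non-finite values and non-positive sample rates, clamp to the 0-50 ms range, cap at half the phrase) are assumptions for illustration; the PR may reject rather than clamp out-of-range values.

```python
import math

MIN_FADE_MS = 0.0
MAX_FADE_MS = 50.0
DEFAULT_FADE_MS = 10.0


def fade_length_samples(sample_rate: int, phrase_len: int,
                        fade_ms: float = DEFAULT_FADE_MS) -> int:
    """Validate a fade duration and convert it to a per-edge sample count.

    Assumed policy: reject non-finite durations and bad sizes, clamp the
    duration to [MIN_FADE_MS, MAX_FADE_MS], and cap the result at half the
    phrase so the fade-in and fade-out never overlap.
    """
    if not math.isfinite(fade_ms):
        raise ValueError(f"fade duration must be finite, got {fade_ms!r}")
    if sample_rate <= 0:
        raise ValueError(f"sample rate must be positive, got {sample_rate!r}")
    if phrase_len < 0:
        raise ValueError(f"phrase length must be non-negative, got {phrase_len!r}")
    clamped = min(max(fade_ms, MIN_FADE_MS), MAX_FADE_MS)
    return min(round(sample_rate * clamped / 1000.0), phrase_len // 2)
```

With the default 10 ms at 44.1 kHz this gives 441 samples per edge; on a 200-sample phrase it is capped at 100.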
Built and tested on macOS 26.0.1 arm64.
Special thanks to @yqzhishen for suggesting making this a general-purpose feature instead of a DiffSinger-only one! 🙏
- This sounds like a vocoder issue. The best place to solve it is the vocoder. I don't think the community vocoders produce DC offset. Is it a fine-tuned vocoder?
- The solution in this PR is kind of weird. It should at least use a high-pass filter to process the waveform.
- Since nobody else would need this, it should probably be an option in the ds vocoder config that turns it on.
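The high-pass alternative suggested in the review can be sketched as the classic first-order DC blocker, y[n] = x[n] - x[n-1] + r * y[n-1], which has a zero at DC and a pole just inside the unit circle. This Python version is illustrative only: the function name and the 10 Hz default cutoff are assumptions, and r = exp(-2π·fc/fs) is a common approximation for placing the pole.

```python
import math


def dc_blocker(samples: list[float], sample_rate: int,
               cutoff_hz: float = 10.0) -> list[float]:
    """First-order DC-blocking high-pass filter: y[n] = x[n] - x[n-1] + r*y[n-1]."""
    r = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)  # pole radius, just below 1
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + r * prev_y  # differencing removes DC; the pole restores lows
        out.append(y)
        prev_x, prev_y = x, y
    return out
```

Unlike the edge-based estimate, the filter needs no window and tracks a slowly drifting offset. However, if it restarts from zero state on every phrase, a constant offset still passes through at first and then decays with a time constant of roughly fs / (2π·fc) samples (about 700 samples, or 16 ms, at 10 Hz and 44.1 kHz), so it works best on continuous audio. It also slightly attenuates and phase-shifts content near the cutoff.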