Add DC offset removal and configurable fade for phrase boundaries

Open leostudiooo opened this issue 3 months ago • 1 comment

Adds DC offset removal and configurable fade-in/fade-out to phrase boundaries to eliminate clicking noise.

Fixes #1762

Investigation Update

Further investigation of #1762 shows that the clicking noise is caused by DC offset at phrase boundaries. This only affects some voicebanks; not all models produce a DC offset.

Root Cause Analysis

Each phrase may carry a small DC offset (around -56 dB). Where two phrases meet, the offset changes abruptly, producing a step discontinuity that shows up in the spectrogram as a vertical line spanning all frequencies.
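
To put the -56 dB figure in perspective, here is a small Python sketch (illustrative only, not code from this PR) that converts the level to linear amplitude and measures the jump where one phrase follows another. The helper names `db_to_amplitude` and `boundary_step` are made up for this example, and it assumes the level is relative to a full scale of 1.0.

```python
def db_to_amplitude(db: float) -> float:
    """Convert a level in dB (relative to full scale 1.0) to linear amplitude."""
    return 10.0 ** (db / 20.0)


def boundary_step(phrase_a: list[float], phrase_b: list[float]) -> float:
    """Sample-to-sample jump at the point where phrase_b follows phrase_a."""
    return phrase_b[0] - phrase_a[-1]


# Assumption for illustration: two otherwise silent phrases,
# one carrying the DC offset and one without it.
offset = db_to_amplitude(-56.0)  # roughly 0.0016 of full scale
phrase_a = [offset] * 100
phrase_b = [0.0] * 100
step = boundary_step(phrase_a, phrase_b)
```

Even though the offset is tiny, the jump happens within a single sample, which is why its energy spreads across the whole spectrum instead of staying near 0 Hz.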

Solution

Implemented edge-based DC offset removal and optional fading:

  1. Estimate the DC offset from the first and last 100 samples of each phrase (the boundary regions)
  2. Subtract this offset from the entire phrase
  3. Optionally apply a short fade (0-50 ms) for additional smoothing
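
Steps 1 and 2 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the C# code in this PR: the names `EDGE_SAMPLES`, `estimate_edge_dc` and `remove_dc` are hypothetical, and averaging the head and tail windows together into one estimate is an assumption, since the description does not say how the two windows are combined.

```python
EDGE_SAMPLES = 100  # boundary region length taken from the description above


def estimate_edge_dc(samples: list[float], edge: int = EDGE_SAMPLES) -> float:
    """Estimate DC offset as the mean of the first and last `edge` samples."""
    n = len(samples)
    if n == 0:
        return 0.0
    if n <= 2 * edge:
        # Head and tail windows would overlap on a short phrase, so
        # average every sample exactly once to avoid double counting.
        return sum(samples) / n
    boundary = samples[:edge] + samples[-edge:]
    return sum(boundary) / len(boundary)


def remove_dc(samples: list[float], edge: int = EDGE_SAMPLES) -> list[float]:
    """Subtract the edge-based DC estimate from the whole phrase."""
    dc = estimate_edge_dc(samples, edge)
    return [s - dc for s in samples]
```

Estimating from the edges rather than from the whole phrase keeps the sung content in the middle from skewing the estimate, and the edges are where the step at the phrase boundary actually occurs.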

Architecture

General-purpose implementation (per @yqzhishen's suggestion):

  • Features available for all synthesis engines (Classic, Worldline, Enunu, Vogen, DiffSinger, Voicevox)
  • Implemented in Renderers.ApplyPostProcessing() - centralized post-processing utility
  • Applied in RenderEngine.RenderRequests() - single choke point for all rendered phrases
  • Settings moved from DiffSinger preferences to general Rendering preferences
  • Zero changes needed to individual renderer implementations

Results

Before (raw output): [image: raw]

After DC offset removal: [image: dc_offset_removal]

After DC offset removal + 10 ms fade: [image: dc_offset_removal_and_10ms_fading]

Features

  • DC offset removal: Edge-based estimation and removal (default: off)
  • Phrase fading: Configurable fade duration of 0-50 ms (default: off; 10 ms when enabled)
  • Fade curves: Linear, exponential, sine, equal-power, Hann (default: Hann)
  • Both features work on cached renders (applied as post-processing)
  • Available in general Rendering preferences (affects all engines)
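
For readers unfamiliar with the curve names, the following Python sketch shows one plausible definition of each and how a fade could be applied to both ends of a phrase. It is not the PR's implementation. The formulas for "sine" and "equal-power", the steepness constant `EXP_STEEPNESS`, and the function names `fade_gain` and `apply_fades` are all assumptions made for illustration. The exponential curve is normalized so that it reaches exactly 1.0 at the end of the fade.

```python
import math

EXP_STEEPNESS = 4.0  # hypothetical shape constant for the exponential curve
MAX_FADE_MS = 50.0   # upper bound of the configurable range described above


def fade_gain(t: float, curve: str = "hann") -> float:
    """Gain for fade position t in [0, 1]; 0 is silence, 1 is full level."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be within [0, 1]")
    if curve == "linear":
        return t
    if curve == "exponential":
        # Normalized so that the gain is exactly 1.0 at t == 1
        return math.expm1(EXP_STEEPNESS * t) / math.expm1(EXP_STEEPNESS)
    if curve == "sine":
        return math.sin(0.5 * math.pi * t)
    if curve == "equal-power":
        return math.sqrt(t)
    if curve == "hann":
        return 0.5 * (1.0 - math.cos(math.pi * t))
    raise ValueError(f"unknown curve: {curve}")


def apply_fades(samples: list[float], sample_rate: int,
                fade_ms: float = 10.0, curve: str = "hann") -> list[float]:
    """Return a copy of samples with a fade-in and a mirrored fade-out."""
    if not 0.0 <= fade_ms <= MAX_FADE_MS:
        raise ValueError("fade_ms must be within 0-50 ms")
    # Never let the two fades overlap on a very short phrase
    n = min(int(sample_rate * fade_ms / 1000.0), len(samples) // 2)
    out = list(samples)
    for i in range(n):
        g = fade_gain(i / n, curve)
        out[i] *= g        # fade-in from the first sample
        out[-1 - i] *= g   # fade-out towards the last sample
    return out
```

For example, `apply_fades(phrase, 44100, 10.0)` would fade 441 samples at each end with the Hann curve, while a duration of 0 ms leaves the phrase unchanged.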

Code Quality Improvements

  • Fixed double-counting bug in DC offset calculation for short phrases
  • Added input validation for fade parameters
  • Normalized exponential curve to reach exactly 1.0
  • Extracted magic numbers as named constants

Built and tested on macOS 26.0.1 arm64.


Special thanks to @yqzhishen for suggesting that this be a general-purpose feature instead of DiffSinger-only! 🙏

leostudiooo, Oct 10 '25 16:10

  1. This sounds like a vocoder issue. The best place to solve it is the vocoder. I don't think the community vocoders produce DC offset. Is it a fine-tuned vocoder?
  2. The solution in this PR is kind of weird. Should at least use a high-pass filter to process the waveform.
  3. Since nobody else would need this, should probably use an option in ds vocoder config to turn it on.

stakira, Nov 29 '25 05:11