Retrieval-based-Voice-Conversion-WebUI

Minor output alignment issue

Open matthew99a opened this issue 1 year ago • 0 comments

When vc.pipeline processes more than one segment, each new segment shifts the subsequent audio output out of place by one frame (i.e. the length of self.window = 0.01 s). The final audio is therefore also slightly shorter. This effect is barely perceptible, but it can be verified with audio editing software.

After experimenting with several attempted fixes, I found that this effect can be drastically reduced by changing lines 388, 405, 423 and 440 of infer/modules/vc/pipeline.py from "self.t_pad_tgt : -self.t_pad_tgt" to "(self.t_pad_tgt - self.window) : -(self.t_pad_tgt - self.window)".
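
To illustrate what the slice change does to a single segment, here is a minimal standalone sketch. It is not the code from pipeline.py: `trim_padding` is a hypothetical helper, and the values of `t_pad_tgt`, `window` and the segment length are made up for the example. It only shows that the modified slice keeps `window` more samples on each side of a segment.

```python
import numpy as np


def trim_padding(audio, t_pad_tgt, window=0):
    """Hypothetical helper: drop padding from both ends of a segment.

    With window=0 this mirrors the slice `t_pad_tgt : -t_pad_tgt`.
    With window>0 it mirrors `(t_pad_tgt - window) : -(t_pad_tgt - window)`.
    """
    pad = t_pad_tgt - window
    if pad <= 0:
        # Guard: audio[0:-0] would be empty, so return the segment untouched.
        return audio.copy()
    return audio[pad:-pad]


# Assumed values for illustration only, not read from the real pipeline.
t_pad_tgt = 4000   # padding length in samples
window = 160       # one frame in samples
segment = np.arange(20000, dtype=np.float32)  # dummy padded segment

original = trim_padding(segment, t_pad_tgt)          # current slicing
proposed = trim_padding(segment, t_pad_tgt, window)  # slicing after the change
extra = len(proposed) - len(original)

print(len(original), len(proposed), extra)
```

In this toy setup each segment keeps `2 * window` extra samples after the change. Whether that is the exact amount lost per segment in the real pipeline is not something this sketch establishes.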

With this change, I was able to cut the difference between input and output length for a 4-minute audio file from 0.04 seconds down to 0.002 seconds.

matthew99a, Sep 20 '24 08:09