Retrieval-based-Voice-Conversion-WebUI
Minor output alignment issue
When `vc.pipeline` processes audio in more than one segment, each additional segment shifts all subsequent audio output by one frame (i.e. the length of `self.window`, 0.01 s). As a result, the final audio is also slightly shorter than the input. The effect is barely perceptible, but it can be verified in audio editing software.
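To make the size of the drift concrete, here is a small Python model of the behavior described above. It is a sketch under an assumption, not RVC code: it assumes each segment boundary loses exactly one frame of `self.window`, taken as 160 samples at 16 kHz to match the 0.01 s stated above. `segment_offsets` and `total_shortfall` are hypothetical helpers.

```python
# Hypothetical model of the drift (an assumption, not taken from RVC's code):
# every segment boundary drops one frame, so segment k starts k frames early
# and the output ends (num_segments - 1) frames short.
SR_16K = 16000
WINDOW = 160  # samples per frame at 16 kHz, i.e. 0.01 s (assumed self.window)


def segment_offsets(num_segments: int, window: int = WINDOW, sr: int = SR_16K) -> list:
    """Seconds by which each segment's output is shifted, under the model."""
    return [k * window / sr for k in range(num_segments)]


def total_shortfall(num_segments: int, window: int = WINDOW, sr: int = SR_16K) -> float:
    """Seconds missing from the end of the output: one frame per boundary."""
    return max(num_segments - 1, 0) * window / sr


if __name__ == "__main__":
    print(segment_offsets(4))  # [0.0, 0.01, 0.02, 0.03]
    print(total_shortfall(5))  # 0.04
```

Under this model the error grows linearly with the number of segments, which is why it only becomes measurable on longer inputs.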
After trying several possible fixes, I found that the effect can be drastically reduced by changing lines 388, 405, 423 and 440 of `infer/modules/vc/pipeline.py` from `self.t_pad_tgt : -self.t_pad_tgt` to `(self.t_pad_tgt - self.window) : -(self.t_pad_tgt - self.window)`.
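The sketch below contrasts the two slice expressions on a dummy padded segment using NumPy. The values of `tgt_sr`, `x_pad` and the segment length are illustrative assumptions, and `trim_original` / `trim_proposed` are hypothetical helpers that only mirror the quoted slices, not the surrounding pipeline code.

```python
import numpy as np

# Illustrative values only (assumptions, not read from a real RVC config).
tgt_sr = 40000
x_pad = 1
t_pad_tgt = tgt_sr * x_pad  # padding trimmed from each end of a segment's output
window = 160                # stands in for self.window

def trim_original(audio1: np.ndarray) -> np.ndarray:
    # Original slice: self.t_pad_tgt : -self.t_pad_tgt
    return audio1[t_pad_tgt:-t_pad_tgt]

def trim_proposed(audio1: np.ndarray) -> np.ndarray:
    # Proposed slice: (self.t_pad_tgt - self.window) : -(self.t_pad_tgt - self.window)
    keep = t_pad_tgt - window
    return audio1[keep:-keep]

if __name__ == "__main__":
    seg = np.zeros(2 * t_pad_tgt + 80000)  # dummy padded output of one segment
    print(len(trim_original(seg)))  # 80000
    print(len(trim_proposed(seg)))  # 80320
```

With these values the proposed slice keeps `window` extra samples at each end of every segment, i.e. 320 more samples per segment. Because `window` is subtracted directly from `t_pad_tgt`, those 320 samples are counted at the target rate, which at `tgt_sr = 40000` is 0.008 s.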
With this change, the difference between input and output length for a 4-minute audio file dropped from 0.04 s to 0.002 s.
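As a hedged alternative to checking lengths in an audio editor, the input vs output comparison can be scripted with Python's standard-library `wave` module, assuming both files are uncompressed PCM WAV. `wav_duration`, `length_difference` and `write_silence` are hypothetical helpers, and the demo uses synthetic silence rather than real RVC output.

```python
import os
import tempfile
import wave


def wav_duration(path: str) -> float:
    """Duration in seconds of a PCM WAV file."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def length_difference(input_path: str, output_path: str) -> float:
    """Input duration minus output duration, in seconds."""
    return wav_duration(input_path) - wav_duration(output_path)


def write_silence(path: str, seconds: float, sr: int = 16000) -> None:
    """Write a mono 16-bit silent WAV; only used for the demo below."""
    n = int(round(seconds * sr))
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sr)
        w.writeframes(b"\x00\x00" * n)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src, out = os.path.join(d, "in.wav"), os.path.join(d, "out.wav")
        write_silence(src, 1.0)
        write_silence(out, 0.96)
        print(round(length_difference(src, out), 6))  # 0.04
```

Measuring in frames via `getnframes()` avoids any rounding an editor's time display might introduce.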