Real-Time-Voice-Cloning
Synthesizer parameters vary for the same input audio and text?
If I run demo_cli.py with the same audio and text multiple times, I see variation in the synthesizer output. For example:
Synthesizing the waveform: {| ████████████████ 57000/57600 | Batch Size: 6 | Gen Rate: 5.1kHz | }float64
Synthesizing the waveform: {| ████████████████ 47500/48000 | Batch Size: 5 | Gen Rate: 4.3kHz | }float64
Synthesizing the waveform: {| ████████████████ 47500/48000 | Batch Size: 5 | Gen Rate: 4.2kHz | }float64
Synthesizing the waveform: {| ████████████████ 57000/57600 | Batch Size: 6 | Gen Rate: 5.2kHz | }float64
The audio and text are the same, but the output length varies. Do you know what the reason could be?
The output varies because dropout is used at inference time, in the encoder and decoder prenets. Dropout zeroes out some tensor elements at random. Its purpose is to help the model generalize during training, but as the Tacotron authors explain, it is kept for inference to introduce some variation in the output. For completely deterministic output, use the --seed option, which initializes the random number generator to the same state for each run.
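To illustrate the idea, here is a minimal sketch in plain Python, not the repository's actual code. The helper names `dropout` and `run_inference` are made up for this example, and the standard-library `random` module stands in for the framework's random number generator. It shows that dropout with an unseeded generator gives different results on each call, while a fixed seed makes them repeat.

```python
import random


def dropout(values, p, rng):
    """Zero each element with probability p and rescale the survivors."""
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in values]


def run_inference(values, seed=None):
    # Hypothetical stand-in for one synthesis run.
    # With seed=None the generator is seeded from system entropy,
    # so the dropout mask changes from call to call.
    rng = random.Random(seed)
    return dropout(values, p=0.5, rng=rng)


activations = [0.1 * i for i in range(1, 9)]

# Unseeded: the two results will usually differ.
unseeded_a = run_inference(activations)
unseeded_b = run_inference(activations)

# Seeded: the generator starts from the same state, so the mask repeats.
seeded_a = run_inference(activations, seed=0)
seeded_b = run_inference(activations, seed=0)

print("seeded runs identical:", seeded_a == seeded_b)
print("unseeded runs identical:", unseeded_a == unseeded_b)
```

In the real model the dropout mask feeds into the decoder, so a different mask can change when synthesis stops, which would explain the different output lengths in the logs above.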
Can you tell me how to use this --seed option?
It's a command line argument for demo_cli.py and demo_toolbox.py. You also need to specify the value of the seed. For example:
python demo_cli.py --seed 0
python demo_toolbox.py --seed 0
Ok thank you so much
After providing the seed option, the Gen Rate still varies for the same audio and text:
Synthesizing the waveform: {| ████████████████ 171000/172800 | Batch Size: 18 | Gen Rate: 3.4kHz | }float64
Synthesizing the waveform: {| ████████████████ 171000/172800 | Batch Size: 18 | Gen Rate: 4.6kHz | }float64
Synthesizing the waveform: {| ████████████████ 171000/172800 | Batch Size: 18 | Gen Rate: 4.2kHz | }float64
Notice how the synthesized output is now identical in each case, with a length of 172800 timesteps. Some variation in inference speed is normal, and does not affect output.
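If you want to check this from the logs rather than by eye, the sketch below parses the progress lines and compares the total length and batch size across runs. The regular expression and the `parse_progress` helper are written for this example only and assume the log format shown in this thread.

```python
import re

# Matches e.g. "171000/172800 | Batch Size: 18 | Gen Rate: 3.4kHz"
PROGRESS = re.compile(
    r"(\d+)/(\d+) \| Batch Size: (\d+) \| Gen Rate: ([\d.]+)kHz"
)


def parse_progress(line):
    """Extract total timesteps, batch size and generation rate from a log line."""
    match = PROGRESS.search(line)
    if match is None:
        return None
    _done, total, batch_size, rate = match.groups()
    return {
        "total": int(total),
        "batch_size": int(batch_size),
        "gen_rate_khz": float(rate),
    }


seeded_runs = [
    "Synthesizing the waveform: {| ████████████████ 171000/172800 | Batch Size: 18 | Gen Rate: 3.4kHz | }float64",
    "Synthesizing the waveform: {| ████████████████ 171000/172800 | Batch Size: 18 | Gen Rate: 4.6kHz | }float64",
    "Synthesizing the waveform: {| ████████████████ 171000/172800 | Batch Size: 18 | Gen Rate: 4.2kHz | }float64",
]

parsed = [parse_progress(line) for line in seeded_runs]

# Output size: should be the same for every seeded run.
same_length = len({p["total"] for p in parsed}) == 1
same_batch = len({p["batch_size"] for p in parsed}) == 1

# Generation speed: expected to fluctuate with machine load.
rates = [p["gen_rate_khz"] for p in parsed]

print("same length:", same_length, "| same batch size:", same_batch)
print("gen rates (kHz):", rates)
```

Matching lengths are only a quick sanity check. To confirm the audio is truly identical, compare the saved waveforms sample by sample.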
When providing the seed option, how do we know which value gives the most correct output?
You currently can't. It's like Minecraft seeds: you just have to try your luck.