Real-Time-Voice-Cloning

Synthesizer parameters vary for the same input audio and text?

Status: Open. ayush431 opened this issue 2 years ago • 8 comments

If I run demo_cli.py multiple times with the same audio and text, I see variation in the synthesizer output. For example:

Synthesizing the waveform: {| ████████████████ 57000/57600 | Batch Size: 6 | Gen Rate: 5.1kHz | }float64

Synthesizing the waveform: {| ████████████████ 47500/48000 | Batch Size: 5 | Gen Rate: 4.3kHz | }float64

Synthesizing the waveform: {| ████████████████ 47500/48000 | Batch Size: 5 | Gen Rate: 4.2kHz | }float64

Synthesizing the waveform: {| ████████████████ 57000/57600 | Batch Size: 6 | Gen Rate: 5.2kHz | }float64

The audio and text are the same, but the output varies. Do you know what could be the reason?

ayush431, May 05 '22 12:05

The output varies because dropout is used during inference, in the encoder and decoder prenets. Dropout zeroes out some tensor elements at random. Its purpose is to help the model generalize during training, but, as the Tacotron authors explain, it is kept active at inference to introduce some variation in the output. For completely deterministic output, use the --seed option (it initializes the random number generator to the same state each time you generate).

raccoonML, May 06 '22 05:05
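A minimal sketch of the mechanism described in the answer above, written in plain Python so it runs without dependencies. The `dropout` and `prenet_like` helpers are hypothetical illustrations, not functions from this repository; the real synthesizer applies dropout to tensors inside the model, not to Python lists. The sketch shows that an unseeded dropout mask changes from call to call, while a generator started from the same seed produces the same mask every time.

```python
import random


def dropout(values, p, rng):
    """Inverted dropout: zero each element with probability p and scale the
    survivors by 1/(1-p) so the expected value is unchanged."""
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def prenet_like(values, p=0.5, seed=None):
    """Hypothetical prenet stand-in that keeps dropout active at inference.
    A fresh generator is created per call; with a seed it always starts
    in the same state, without one it is seeded from OS randomness."""
    rng = random.Random(seed)
    return dropout(values, p, rng)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

# Unseeded: two calls will usually zero out different elements.
a = prenet_like(x)
b = prenet_like(x)

# Seeded: same starting RNG state, so the dropout mask is identical.
c = prenet_like(x, seed=0)
d = prenet_like(x, seed=0)
print(c == d)  # True
```

The important detail is that the generator is reset before each run; seeding once and then generating twice in the same process would still give two different masks.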

Can you tell me how to use this --seed option?

ayush431, May 06 '22 06:05

It's a command line argument for demo_cli.py and demo_toolbox.py. You also need to specify the value of the seed. For example:

python demo_cli.py --seed 0
python demo_toolbox.py --seed 0

raccoonML, May 06 '22 06:05
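A sketch of how a `--seed` command-line flag like the one above can make repeated runs identical, assuming the script parses it with `argparse` and re-seeds before each generation. `build_parser` and `generate` are hypothetical names for illustration; the real scripts would seed the framework's generator (e.g. PyTorch's) rather than Python's `random` module.

```python
import argparse
import random


def build_parser():
    """Hypothetical parser with an optional integer seed (default: None)."""
    parser = argparse.ArgumentParser(description="Sketch of a --seed option")
    parser.add_argument("--seed", type=int, default=None,
                        help="Optional seed to make generation deterministic")
    return parser


def generate(text, seed):
    """Hypothetical stand-in for one synthesis run."""
    # Re-seed before *each* run so every run starts from the same RNG state.
    rng = random.Random(seed)
    return [rng.random() for _ in text.split()]


# Equivalent to running: python script.py --seed 0
args = build_parser().parse_args(["--seed", "0"])
run1 = generate("hello world", args.seed)
run2 = generate("hello world", args.seed)
print(run1 == run2)  # True
```

Leaving the flag out gives `args.seed is None`, which falls back to an unpredictable seed, matching the default non-deterministic behavior described in the thread.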

OK, thank you so much!

ayush431, May 06 '22 06:05

Even after providing the seed option, the Gen Rate still varies for the same audio and text:

Synthesizing the waveform: {| ████████████████ 171000/172800 | Batch Size: 18 | Gen Rate: 3.4kHz | }float64

Synthesizing the waveform: {| ████████████████ 171000/172800 | Batch Size: 18 | Gen Rate: 4.6kHz | }float64

Synthesizing the waveform: {| ████████████████ 171000/172800 | Batch Size: 18 | Gen Rate: 4.2kHz | }float64

ayush431, May 06 '22 08:05

Notice that the synthesized output is now identical in each case, with a length of 172800 timesteps. Some variation in inference speed (the Gen Rate) is normal and does not affect the output.

raccoonML, May 06 '22 09:05
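Why a fixed seed can give identical output but a different Gen Rate: the rate is a timing measurement, while the samples depend only on the inputs and the RNG state. The sketch below assumes the rate is roughly samples generated per second divided by 1000 (shown in kHz); `vocode` is a hypothetical stand-in, not the repository's vocoder.

```python
import random
import time


def vocode(n_samples, seed):
    """Hypothetical vocoder stand-in: returns n_samples values and an
    assumed 'Gen Rate' in kHz (samples per second / 1000)."""
    rng = random.Random(seed)
    start = time.perf_counter()
    wav = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    elapsed = time.perf_counter() - start
    gen_rate_khz = n_samples / elapsed / 1000
    return wav, gen_rate_khz


wav1, rate1 = vocode(172_800, seed=0)
wav2, rate2 = vocode(172_800, seed=0)
print(wav1 == wav2)  # True: identical samples
print(len(wav1))     # 172800
# Timing-dependent: the two rates usually differ slightly between runs.
print(f"{rate1:.1f} kHz vs {rate2:.1f} kHz")
```

The rate depends on whatever else the machine is doing at the time, so comparing the sample count (or the samples themselves) is the meaningful check for determinism.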

When using the seed option, how do we know which value gives the most correct output?

ayush431, May 06 '22 09:05

You don't, currently. It's like Minecraft seeds: you just have to try your luck.

TrycsPublic, May 25 '22 20:05
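Trying your luck can at least be made systematic: generate with several seeds, judge each result, and keep the seed you like best. Because seeded runs are reproducible (at least on the same setup), a seed chosen this way gives the same output again later. `generate` and `score` are hypothetical placeholders; the thread suggests there is no built-in way to rank seeds, so in practice the judgment is usually listening to each result.

```python
import random


def generate(text, seed):
    """Hypothetical stand-in for one seeded synthesis run."""
    rng = random.Random(seed)
    return [rng.random() for _ in text.split()]


def score(output):
    """Hypothetical quality score; replace with whatever you judge by.
    Here: prefer outputs whose mean is closest to 0.5."""
    return -abs(sum(output) / len(output) - 0.5)


def best_seed(text, seeds):
    # Generate once per seed and keep the seed whose output scores highest.
    results = {seed: score(generate(text, seed)) for seed in seeds}
    return max(results, key=results.get)


seed = best_seed("this is a test sentence", range(10))
print(seed)
```

Since every run is seeded, re-running `generate` with the chosen seed reproduces exactly the output that was scored.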