Steve Korshakov comments

Results: 169 comments of Steve Korshakov

the mfa info of libritts

Hey! What for? It is trivial to reproduce using the scripts in this repo!

the mfa info of libritts

Try installing the 2x version; this should work. I am in the process of moving training to librilight instead of a mixture of different datasets.

Model convergence and inference

In my experiments the loss doesn't change at all and stays stuck at ~`0.3`, but I can observe the quality change: it keeps improving the longer I train, the longest I...

Model convergence and inference

I have updated all the code in the eval notebook and also published usage instructions.

Model convergence and inference

Style tokens (which are in fact just normalised pitch) improved emotional prosody a lot. Some of my notebooks have an example of inference without style tokens (I am training 10%...
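
To illustrate what "style tokens that are just normalised pitch" could look like, here is a minimal sketch. It is not the code from the repo: the function name `pitch_to_style_tokens`, the per-utterance z-score over log-F0, the clipping range, and the number of bins are all assumptions chosen for illustration.

```python
import math
from statistics import mean, pstdev


def pitch_to_style_tokens(f0_hz, num_bins=8, clip=2.0):
    """Map per-frame F0 values (Hz, 0 = unvoiced) to integer tokens.

    Hypothetical scheme: token 0 marks unvoiced frames, voiced frames
    get a token in 1..num_bins based on their normalised log-pitch.
    """
    voiced = [math.log(f) for f in f0_hz if f > 0]
    if not voiced:
        return [0] * len(f0_hz)

    # Per-utterance statistics (an assumption; a dataset-wide or
    # per-speaker mean/std would work the same way).
    mu = mean(voiced)
    sigma = pstdev(voiced) or 1.0  # flat pitch: avoid division by zero

    tokens = []
    for f in f0_hz:
        if f <= 0:
            tokens.append(0)
            continue
        z = (math.log(f) - mu) / sigma
        z = max(-clip, min(clip, z))  # clip outliers
        # Map [-clip, clip] onto bin indices 0..num_bins-1.
        idx = int((z + clip) / (2 * clip) * num_bins)
        tokens.append(1 + min(idx, num_bins - 1))
    return tokens


if __name__ == "__main__":
    # Higher pitch maps to a higher token; unvoiced frames stay at 0.
    print(pitch_to_style_tokens([0.0, 100.0, 150.0, 300.0]))
```

Because the pitch is normalised before binning, the tokens describe the contour relative to the speaker's own range rather than absolute frequency.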

Model convergence and inference

It is normalized during training. These numbers are from the Voicebox paper; I feel they should be different for my data, but I have not been careful enough about it yet. `voice_x.pt` is...

Model convergence and inference

@zvorinji hey, I am not convinced that Whisper has anything useful. I tried in the past to use its latent outputs to predict the presence of voice, but it turns...

Can you provide a zero-shot clone example?

There is a simple script that creates the built-in voices: https://github.com/ex3ndr/supervoice/blob/master/generate_voices.py. You can try using a similar script to create a new voice.

Difference between Phoneme and Text tokenizer

I just haven't found a good phonemizer compatible with the specific IPA subset that the main network is trained on. I tried to use espeak, but its phonemes are different from montreal...

Difference between Phoneme and Text tokenizer

I did not. Do you think this would improve something? I never need to fit tokens to a specific timeframe in my setup.