Cookie

Results: 20 comments by Cookie

pretrained models?

Do you have the code you used to feed the tacotron outputs into melgan uploaded somewhere? That's definitely bugged out.

``` mel_sent = tacotron_out(model, sentence, CONFIG, use_cuda, ap, use_gl=use_gl, figures=True) ``` Where does this line come from? This repo is designed to interface with [NVIDIA/tacotron2](https://github.com/NVIDIA/tacotron2). NVIDIA uses their own Spectrogram...

@tsungruihon You should be able to scale the output and get an audible result. I don't know what range Mozilla TTS has, but try to transform the Mozilla output to...

The current implementation is definitely not working how we'd expect it to (check "Comparison" section at the bottom of this comment). I've uploaded [some results](https://drive.google.com/drive/folders/10pfmkzOGfdfN9gbuzAtQ7feUie26JR1q?usp=share_link) if anyone wants to experiment...

@nagolinc Here's a more direct comparison. (3rd column is stable-diffusion+paint_with_words, 4th column is stable-diffusion) This is best-of 4 attempts for each image. --- ![image](https://user-images.githubusercontent.com/42448678/200351565-6acc982a-ca43-4bf8-8723-eb3703f94bce.png) --- I can definitely see merit...

@cloneofsimo Feel free to use anything I posted on this thread.

@cloneofsimo These were generated with `2.0*log(1+sigma)*std(QK)`. I've only tested 14 combinations so be aware this configuration is probably still not the best that can be found.
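
For anyone who wants to try that expression outside the pipeline, here is a rough sketch of `2.0*log(1+sigma)*std(QK)` in plain Python. The function name `paint_weight`, the flat list of scores, and the use of the population standard deviation are all assumptions made for illustration; the actual code may use tensors and a different std convention.

```python
import math
import statistics


def paint_weight(sigma, qk_scores, scale=2.0):
    """Hypothetical helper: scale * log(1 + sigma) * std(QK).

    qk_scores is a flat sequence of attention scores (the QK product).
    Population std is an assumption; a tensor library may default to
    the sample (unbiased) estimate instead.
    """
    return scale * math.log1p(sigma) * statistics.pstdev(qk_scores)


# Placeholder scores, not real attention values.
example_scores = [0.0, 2.0]
for sigma in (0.0, 1.0, 10.0):
    print(sigma, paint_weight(sigma, example_scores))
```

The weight is zero at `sigma = 0` and grows logarithmically with `sigma`, so it contributes most at the noisier sampling steps.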

@jik876 Did you notice any overfitting with your largest/best models? Any areas for possible improvement to audio quality, e.g. sampling rate, speaker embeddings, noise embeddings? The speakers I'm targeting are...