Rishikesh (ऋषिकेश)

Results: 160 comments by Rishikesh (ऋषिकेश)

@go2sea Use the pre-processing and training scripts from this repo for the CvT model: https://github.com/rosinality/vision-transformers-pytorch. I will try it myself when I have free GPUs and some extra time.

I trained this model on a custom dataset, so I don't have a model pretrained on any open-source dataset.

![Tensorboard](https://github.com/rishikksh20/FastSpeech2/blob/master/img/tensorboard1_1.png?raw=True)

![Tensorboard](https://github.com/rishikksh20/FastSpeech2/blob/master/img/tensorboard2_1.png?raw=True)

@dathudeptrai Currently I am using raw pitch and energy with MSE, which is why the error looks so high. If required, I will standardize or normalize them in the future. Pitch and energy...
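
To illustrate the standardization mentioned above, here is a minimal sketch in plain Python. It assumes pitch is available as a list of floats. The `standardize` helper and the sample values are hypothetical and not taken from the repo, and in practice the mean and standard deviation would more likely be computed once over the whole training set.

```python
from statistics import mean, pstdev

def standardize(values, eps=1e-8):
    """Z-score a sequence: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    # eps guards against division by zero for constant sequences
    return [(v - mu) / (sigma + eps) for v in values]

# Made-up pitch values in Hz, for illustration only
raw_pitch = [110.0, 220.0, 180.0, 150.0]
norm_pitch = standardize(raw_pitch)
```

After standardization the targets have roughly zero mean and unit variance, so an MSE computed on them no longer scales with the squared raw units (Hz for pitch).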

@dathudeptrai I am using a masked L1 loss. Actually, l1_loss is the combined loss of the before-Postnet and after-Postnet L1 losses, which is why it's very high, although before and after l1 losses are...
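
A minimal sketch of a masked L1 loss combined over the before-Postnet and after-Postnet outputs, written in plain Python rather than PyTorch. The `masked_l1` helper, the per-frame boolean mask, the toy values, and the choice to combine the two terms by simple addition are all assumptions for illustration, not the repo's actual code.

```python
def masked_l1(pred, target, mask):
    """Mean absolute error over frames where mask is True; padded frames are skipped."""
    total, count = 0.0, 0
    for pred_frame, target_frame, keep in zip(pred, target, mask):
        if not keep:
            continue
        for p, t in zip(pred_frame, target_frame):
            total += abs(p - t)
            count += 1
    return total / max(count, 1)

# Toy mel frames (3 frames x 2 bins); the last frame is padding
target = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
mask = [True, True, False]
before_postnet = [[1.5, 2.0], [3.0, 3.0], [9.0, 9.0]]
after_postnet = [[1.0, 2.5], [3.0, 4.0], [9.0, 9.0]]

l1_before = masked_l1(before_postnet, target, mask)
l1_after = masked_l1(after_postnet, target, mask)
l1_loss = l1_before + l1_after  # assumed combination: plain sum
```

Because the logged value adds two terms, it will read higher than either individual L1 loss.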

@dathudeptrai I don't think so: https://github.com/rishikksh20/FastSpeech2/blob/5bc2b402a237ed57e236c3a75d19964cf0f71987/utils/stft.py#L161. They are using spectral normalization: https://github.com/rishikksh20/FastSpeech2/blob/5bc2b402a237ed57e236c3a75d19964cf0f71987/utils/stft.py#L153
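
In Tacotron 2-style STFT code, "spectral normalization" commonly refers to log dynamic range compression of the mel magnitudes. Assuming the linked lines follow that convention (not verified here), a plain-Python sketch looks like this. The function names and the default values of `C` and `clip_val` are assumptions for illustration.

```python
import math

def dynamic_range_compression(magnitudes, C=1.0, clip_val=1e-5):
    """Clamp small magnitudes to clip_val, scale by C, then take the natural log."""
    return [math.log(max(m, clip_val) * C) for m in magnitudes]

def dynamic_range_decompression(compressed, C=1.0):
    """Inverse of the compression for values that were not clamped."""
    return [math.exp(c) / C for c in compressed]

mel = [0.0, 0.5, 2.0]  # made-up magnitudes
compressed = dynamic_range_compression(mel)
restored = dynamic_range_decompression(compressed)
```

The clamp keeps the logarithm finite for silent frames, so a zero magnitude maps to `log(clip_val)` rather than negative infinity.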

@Liujingxiu23 This works well: https://github.com/CyberZHG/torch-layer-normalization/blob/master/torch_layer_normalization/layer_normalization.py. Yes, a speaker embedding generated by the speaker encoder used in speaker verification works.

I have never tested it on multiple GPUs.

@seungwonpark Yeah, sure, I will train MelGAN on GTA. I am also planning to train it on multiple voices, as I have a huge repo of large (> 40 hrs)...