dc_tts-transfer-learning
Transfer learning exploration of dc_tts text-to-speech model
This repo contains attempts to apply transfer learning to the dc_tts text-to-speech model described in the paper Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention. The code is a modified version of Kyubyong's dc_tts code. The pretrained model, also provided in Kyubyong's repo, was trained on the LJ Speech Dataset. During transfer learning, the model was fine-tuned on Scarlett Johansson's voice.
Transfer learning is accomplished by selecting which model layers to train in hyperparameters.py.
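A minimal sketch of the idea, assuming layers are selected by matching variable names against a list of scope prefixes. The names `TRAINABLE_SCOPES` and `select_trainable` are hypothetical and do not come from this repo, and the variable names are made up for illustration.

```python
# Hypothetical sketch: choose which variables to fine-tune by scope prefix.
# TRAINABLE_SCOPES stands in for a setting that could live in hyperparameters.py.
TRAINABLE_SCOPES = ["SSRN/C_13", "SSRN/C_14"]


def select_trainable(var_names, scopes):
    """Return the names that fall under any of the given scope prefixes."""
    selected = []
    for name in var_names:
        # Match whole path components so "SSRN/C_1" does not match "SSRN/C_13".
        if any(name == s or name.startswith(s + "/") for s in scopes):
            selected.append(name)
    return selected


if __name__ == "__main__":
    # Made-up variable names, for illustration only.
    all_vars = [
        "SSRN/C_1/kernel",
        "SSRN/C_13/kernel",
        "SSRN/C_14/bias",
        "Text2Mel/TextEnc/HC_11/kernel",
    ]
    for name in select_trainable(all_vars, TRAINABLE_SCOPES):
        print(name)
```

In TensorFlow 1.x, a filtered list like this would typically be used to pick the variables passed as `var_list` to the optimizer's `minimize` or `compute_gradients` call, so that only the chosen layers are updated while the rest keep their pretrained weights.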
Task List:
- [x] add selectable list of layers for transfer learning
- [x] prelim model training
- [ ] add scoring history plots
- [ ] detailed exploration of which layers to train
- [ ] explore data augmentation methods
- [ ] explore post-processing
Prelim Model Training
- ~6 hrs of training on Tesla V100 GPU
- Layers trained:
  - SSRN(C_13, C_14, C_15, C_16)
  - Text2Mel/TextEnc(HC_11, HC_12, HC_13, HC_14, HC_15)
  - Text2Mel/AudioEnc(HC_9, HC_10, HC_11, HC_12, HC_13)
  - Text2Mel/AudioDec(HC_7, C_8, C_9, C_10, C_11)
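The layer groups above can be written down as plain data. The sketch below is illustrative only: the dictionary layout, the `expand_scopes` helper, and the slash-joined scope format are assumptions, not the repo's actual configuration.

```python
# Hypothetical representation of the layers listed above.
# Keys are sub-networks; values are the layers trained in the preliminary run.
TRAINED_LAYERS = {
    "SSRN": ["C_13", "C_14", "C_15", "C_16"],
    "Text2Mel/TextEnc": ["HC_11", "HC_12", "HC_13", "HC_14", "HC_15"],
    "Text2Mel/AudioEnc": ["HC_9", "HC_10", "HC_11", "HC_12", "HC_13"],
    "Text2Mel/AudioDec": ["HC_7", "C_8", "C_9", "C_10", "C_11"],
}


def expand_scopes(layers):
    """Flatten the mapping into 'network/layer' strings (assumed format)."""
    return [f"{net}/{layer}" for net, names in layers.items() for layer in names]


if __name__ == "__main__":
    scopes = expand_scopes(TRAINED_LAYERS)
    print(len(scopes), "layer scopes selected")
    print(scopes[0], "...", scopes[-1])
```

Keeping the selection as data makes it easy to try a different subset of layers without touching the training code.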
Transfer learning data source:
Scarlett Johansson's audio book
Model Generated Examples (parodies of famous quotes from A.I. in movies):
- Greetings Professor Falken Shall We Play A Game
- I'm Sorry Dave I'm Afraid I Can't Do That
- I Do Not Stand By In The Presence Of Evil
- The Most Versatile Substance On The Planet And They Used It To Make A Frisbee
- The First Ten Million Years Were The Worst And The Second Ten Million Years They Were The Worst Too
- I Honestly Think You Ought To Sit Down Calmly Take A Stress Pill And Think Things Over
- A Strange Game The Only Winning Move Is Not To Play
- The Game Has Changed Son Of Flynn
- Greetings Programs
- You Shouldn't Have Come Back Flynn
references: