understanding-ai
Neural Voice Cloning with a Few Samples
https://arxiv.org/abs/1802.06006 Paper from Baidu Research
Abstract
The paper proposes two approaches to voice cloning:
- Speaker adaptation
  - fine-tune a multi-speaker generative model on a few cloning samples
- Speaker encoding
  - infer a speaker embedding directly from audio, which is then used with the multi-speaker generative model
1. Introduction
- Text carries linguistic information
- Speaker representation captures speaker's characteristics (pitch, speech rate, accent)
- This paper focuses on voice cloning
- Compares speech naturalness, speaker similarity, cloning/inference time, model footprint
2. Voice Cloning
Paper Notations
- f: multi-speaker generative model
- g: speaker encoding function
- t: text
- s: speaker
- a: audio
- S: speaker set
- A: audio set
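Putting the notation together, the multi-speaker training objective can be sketched as follows (a reconstruction from these definitions, not a verbatim copy of the paper's equation):

```latex
\min_{W,\, e} \;\; \mathbb{E}_{s \sim \mathcal{S},\; (t,\, a) \sim \mathcal{A}_s}
\; L\!\big( f(t, s;\, W, e_s),\; a \big)
```

where $W$ are the shared generative-model weights, $e_s$ is the trainable embedding of speaker $s$, and $L$ is a reconstruction loss between generated and ground-truth audio.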
2.1. Speaker adaptation
Speaker adaptation function
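The embedding-only variant of speaker adaptation can be sketched with a toy linear stand-in for the generative model f: the shared weights stay frozen and only the new speaker's embedding e_s is fitted by gradient descent on the cloning samples (everything here, including the loss and learning rate, is an illustrative assumption, not the paper's exact setup).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the multi-speaker generative model: f(t, s) = t @ W + e_s,
# i.e. text features combined with a per-speaker embedding.
D_TEXT, D_AUDIO = 4, 3
W = rng.normal(size=(D_TEXT, D_AUDIO))          # shared weights (kept frozen)

def f(t, e_s):
    """Hypothetical generative model: linear text mapping plus speaker embedding."""
    return t @ W + e_s

# A few cloning samples (t_i, a_i) from a new speaker whose embedding is unknown.
true_e = np.array([1.0, -2.0, 0.5])
T = rng.normal(size=(5, D_TEXT))
A = f(T, true_e)

# Speaker adaptation (embedding-only): fit e_s by gradient descent on the
# mean-squared reconstruction loss over the cloning samples.
e_s = np.zeros(D_AUDIO)
lr = 0.1
for _ in range(200):
    grad = 2 * (f(T, e_s) - A).mean(axis=0)     # d/de_s of the MSE loss
    e_s -= lr * grad

print(np.allclose(e_s, true_e, atol=1e-6))      # True: embedding recovered
```

The paper also considers adapting the whole model rather than the embedding alone; this sketch shows only the cheaper embedding-only case.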
2.2. Speaker encoding
Speaker encoding function
Paper avoids mode collapse by training the speaker encoder separately
Loss function (L1)
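The separately trained encoder g is fitted so that its predicted embedding matches the target speaker embedding e_s under an L1 loss, which can be sketched as (the concrete numbers are illustrative):

```python
import numpy as np

def l1_loss(pred_emb, target_emb):
    """L1 loss between the encoder's predicted embedding g(A_s)
    and the target speaker embedding e_s."""
    return np.abs(pred_emb - target_emb).mean()

pred = np.array([0.9, -1.8, 0.4])      # hypothetical g(A_s)
target = np.array([1.0, -2.0, 0.5])    # hypothetical e_s
print(round(l1_loss(pred, target), 3))  # 0.133
```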
Architecture
- Spectral processing
- Temporal processing
- Cloning sample attention
- uses multi-head self-attention from Transformer
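The three stages above can be sketched end to end in numpy; the spectral/temporal processing is reduced to mean pooling and the attention uses identity projections, so this is a minimal stand-in for the paper's conv-plus-attention encoder, not its actual architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def encode_sample(mel):
    """Spectral + temporal processing, reduced here to mean pooling over
    time (the paper uses conv layers; this is a placeholder)."""
    return mel.mean(axis=0)

def multi_head_attention(X, num_heads=2):
    """Minimal multi-head self-attention across the cloning samples
    (Q = K = V = X), with identity projections for brevity."""
    N, D = X.shape
    d = D // num_heads
    heads = []
    for h in range(num_heads):
        Xh = X[:, h * d:(h + 1) * d]
        attn = softmax(Xh @ Xh.T / np.sqrt(d), axis=-1)
        heads.append(attn @ Xh)
    return np.concatenate(heads, axis=-1)

# N cloning audio samples, each a (time, mel-bands) spectrogram.
samples = [rng.normal(size=(20, 8)) for _ in range(3)]
X = np.stack([encode_sample(m) for m in samples])   # (N, 8) per-sample features
H = multi_head_attention(X)                          # attend across samples
embedding = H.mean(axis=0)                           # combine into one speaker embedding
print(embedding.shape)  # (8,)
```

Attending over the cloning samples lets the encoder weight more informative samples when producing the single speaker embedding.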
2.3. Discriminative models for evaluation
Because human evaluation is so expensive, the paper proposes these two discriminative models for evaluation
2.3.1. Speaker Classification
- An additional embedding layer is placed before the softmax function of the whole architecture
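The idea above (a bottleneck embedding layer feeding a softmax over training speakers) can be sketched as follows; all layer shapes and the tanh nonlinearity are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

# Hypothetical classifier: audio features -> embedding layer -> softmax
# over the training speakers.
N_SPEAKERS, D_FEAT, D_EMB = 10, 32, 8
W_emb = rng.normal(size=(D_FEAT, D_EMB))    # embedding layer before softmax
W_cls = rng.normal(size=(D_EMB, N_SPEAKERS))

def classify(features):
    emb = np.tanh(features @ W_emb)         # bottleneck speaker embedding
    probs = softmax(emb @ W_cls)
    return probs.argmax(), probs

features = rng.normal(size=D_FEAT)
speaker_id, probs = classify(features)
print(0 <= speaker_id < N_SPEAKERS, np.isclose(probs.sum(), 1.0))  # True True
```

Classification accuracy on cloned audio then serves as an automatic proxy for speaker similarity.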
2.3.2. Speaker Verification
- binary classification of whether the test audio and the enrolled audio come from the same speaker
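A common way to realize this binary decision is to threshold the cosine similarity between the test and enrolled speaker embeddings; the threshold value and the toy vectors below are assumptions, not the paper's verification system.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_speaker(test_emb, enrolled_emb, threshold=0.7):
    """Binary decision: accept if the test embedding is close enough
    to the enrolled speaker's embedding (threshold is an assumption)."""
    return cosine(test_emb, enrolled_emb) >= threshold

enrolled = np.array([1.0, 0.2, -0.5])
same = enrolled + np.array([0.05, -0.02, 0.03])   # near-identical voice
other = np.array([-0.8, 1.0, 0.6])                # different speaker

print(same_speaker(same, enrolled), same_speaker(other, enrolled))  # True False
```

Sweeping the threshold yields the equal-error rate (EER) typically reported for speaker verification.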
3. Experiments
3.1. Datasets
- LibriSpeech dataset for training the multi-speaker generative model & the speaker encoder model
- samples drawn from VCTK for voice cloning