tensorflow-wavenet

Linguistic features for p280 speaker

Open bajibabu opened this issue 8 years ago • 15 comments

Hi,

I generated the linguistic features described in the WaveNet paper for speaker p280. If anyone is interested in using them for conditioning in WaveNet, please download them from https://users.aalto.fi/~bollepb1/binary_labels_p280.zip. Each frame (row) corresponds to 5 ms of speech.

bajibabu avatar Sep 26 '16 16:09 bajibabu

That's very cool, thanks! I haven't looked into generating linguistic features yet. Can you explain what you've done to generate these?

ibab avatar Sep 26 '16 16:09 ibab

  1. I used the HTS toolkit (http://hts.sp.nitech.ac.jp/) to get the full-context labels from the text files.
  2. State-level durations were obtained by HMM-based forced alignment using the same HTS toolkit.
  3. The full-context labels were transformed into binary and numerical features using the Merlin toolkit (https://github.com/CSTR-Edinburgh/merlin).
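
For anyone loading the output of step 3, here is a minimal sketch. It assumes the files in the zip are headerless float32 arrays (the way Merlin usually writes its binary label files) and that you know the per-frame feature dimension. The function name `load_binary_labels` and the `feature_dim` parameter are hypothetical; the actual dimension depends on the question set used and is not stated in this thread.

```python
import numpy as np


def load_binary_labels(path, feature_dim):
    """Read a headerless float32 label file into a (frames, feature_dim) array.

    Assumption: the file is a flat float32 dump with one row per 5 ms frame.
    `feature_dim` must be supplied by the caller; it is not given in this thread.
    """
    flat = np.fromfile(path, dtype=np.float32)
    if flat.size % feature_dim != 0:
        # A mismatch usually means the wrong dimension or the wrong dtype.
        raise ValueError(
            "file holds %d values, not a multiple of feature_dim=%d"
            % (flat.size, feature_dim)
        )
    return flat.reshape(-1, feature_dim)
```

The size check is there because a wrong `feature_dim` would otherwise reshape silently into misaligned rows whenever the total happens to divide evenly by some other number; it at least catches the common case.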

bajibabu avatar Sep 26 '16 17:09 bajibabu

Very cool! To my understanding, we should be able to feed these vectors directly into the training and generation, after a sort of preprocessing step where they are generated based on an input string, correct? It might be worth wrapping this up into a function to make that process easier.
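
One way to do the feeding step, sketched under assumptions: since each row covers 5 ms, the frame-level features have to be stretched to the audio sample rate before they can condition the network sample by sample. The helper name `upsample_features` is hypothetical, 16 kHz is an assumed sample rate, and plain repetition is only the simplest option (the WaveNet paper also describes learned upsampling with a transposed convolution).

```python
import numpy as np

FRAME_SHIFT_MS = 5  # from this thread: each row corresponds to 5 ms of speech


def upsample_features(features, sample_rate=16000, num_samples=None):
    """Repeat each frame-level row so there is one row per audio sample.

    features:    array of shape (frames, feature_dim)
    sample_rate: assumed audio sample rate in Hz (16 kHz is an assumption)
    num_samples: optional audio length; the output is trimmed or edge-padded
                 to exactly this many rows
    """
    # 80 samples per frame at 16 kHz; integer division floors for other rates.
    samples_per_frame = sample_rate * FRAME_SHIFT_MS // 1000
    upsampled = np.repeat(features, samples_per_frame, axis=0)

    if num_samples is not None:
        if num_samples <= len(upsampled):
            upsampled = upsampled[:num_samples]
        else:
            # Pad by repeating the last frame until the lengths match.
            pad = num_samples - len(upsampled)
            upsampled = np.pad(upsampled, ((0, pad), (0, 0)), mode="edge")
    return upsampled
```

With these defaults, 200 frames (one second of speech) become 16000 rows, which lines up with one second of 16 kHz audio.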

@bajibabu is one of these features F0 or will that need to be generated separately?

mortont avatar Sep 26 '16 20:09 mortont

@mortont Oops! I forgot to include the F0 values. I will append them tomorrow morning.

bajibabu avatar Sep 26 '16 21:09 bajibabu

I updated the label files with F0 values.

bajibabu avatar Sep 27 '16 09:09 bajibabu

Thanks @bajibabu! I've never used HTS or Merlin. Could you walk through the steps you used to create these in more detail?

mortont avatar Sep 27 '16 14:09 mortont

You can find more details in this post: http://www.speech.zone/exercises/build-your-own-dnn-voice/prepare-the-input-labels/

bajibabu avatar Sep 28 '16 07:09 bajibabu

@bajibabu I'm trying to use the linguistic features that you generated for speaker p280 to feed into the WaveNet model and generate meaningful speech such as "Hello, WaveNet!". Have you done that, and if so, could you share the detailed steps to recreate it? Thanks!

rockyrmit avatar Oct 06 '16 23:10 rockyrmit

I didn't do that.

bajibabu avatar Oct 08 '16 07:10 bajibabu

My research area is also TTS. Do the p280 features include both the full-context label (full-lab) part and the F0 values? Have all of these values been converted to binary? Could you please give a detailed description of the features?

liangmin0020 avatar Jan 23 '17 02:01 liangmin0020

@bajibabu Hi, bajibabu. I am new to this field and very interested in local conditioning. I tried the link you provided to download the linguistic features, but it is no longer available. Would you please send me a copy? Thank you.

DabiaoMa avatar Apr 20 '17 04:04 DabiaoMa

@bajibabu Can you update the link to the linguistic features you computed?

rafaelvalle avatar Nov 22 '17 19:11 rafaelvalle

I couldn't reach @bajibabu. I hope someone who still has a copy on their computer will share it with everyone.

toannhu avatar Dec 08 '17 08:12 toannhu

@rockyrmit do you have the zip file with linguistic features?

rafaelvalle avatar Dec 11 '17 17:12 rafaelvalle

I need the zip file as well. Can anyone share it?

AzamRabiee avatar May 14 '18 06:05 AzamRabiee