
Adding linguistic features for expressive synthesis?

Open samuel-lunii opened this issue 6 years ago • 6 comments

Hi!

Thanks for this great implementation!

I'm a speech scientist, not an expert in neural networks. I'm working on expressive speech synthesis, and I'm currently wondering about the possibility of adding linguistic features as input data to Tacotron.

At the moment, I'm thinking of using a tensor the same size as the input character sequence, where each element corresponds to a linguistic label.

For example, if we use part-of-speech tags as a new input feature, the character sequence of the sentence "this is my sentence" would be something like: [47, 35, 36, 46, 64, 36, 46, 64, 40, 52, 64, 46, 32, 41, 47, 32, 41, 30, 32]

and the corresponding part-of-speech sequence would be: [1, 1, 1, 1, -1, 2, 2, -1, 3, 3, -1, 4, 4, 4, 4, 4, 4, 4, 4], where each positive number corresponds to a specific tag, and -1 to spaces.

  1. Would this approach be suitable?
  2. If so, how could I add such data as inputs to Tacotron, given that I would like to be able to add several features?
  3. If not, does anyone have any recommendations?
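For what it's worth, the character-level alignment described above can be produced mechanically from word-level tags. A minimal sketch (the tag IDs and the `align_pos_to_chars` helper are made up for illustration, not part of any repo):

```python
# Sketch: expand word-level POS tags into a character-level sequence
# aligned one-to-one with the input characters. Spaces get a reserved ID.

def align_pos_to_chars(words, word_tags, space_id=-1):
    """Repeat each word's tag once per character; word separators get space_id."""
    tags = []
    for i, (word, tag) in enumerate(zip(words, word_tags)):
        if i > 0:
            tags.append(space_id)  # one slot for the space between words
        tags.extend([tag] * len(word))
    return tags

sentence = "this is my sentence"
pos_tags = [1, 2, 3, 4]  # hypothetical tag IDs, one per word

char_tags = align_pos_to_chars(sentence.split(" "), pos_tags)
assert len(char_tags) == len(sentence)
print(char_tags)
# [1, 1, 1, 1, -1, 2, 2, -1, 3, 3, -1, 4, 4, 4, 4, 4, 4, 4, 4]
```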

Thanks!

samuel-lunii avatar May 20 '19 13:05 samuel-lunii

I think you could create another embedding layer for the POS tags, then concatenate the two embeddings together.
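The mechanics of this suggestion can be sketched in NumPy (dimensions, table sizes, and IDs below are illustrative; in this repo the equivalent would use `tf.get_variable`, `tf.nn.embedding_lookup`, and `tf.concat`):

```python
import numpy as np

# Two separate embedding tables: one for character symbols, one for POS
# tags. Each timestep's two embeddings are concatenated along the
# feature axis before being fed to the encoder.
rng = np.random.default_rng(0)
num_symbols, num_tags = 70, 10   # illustrative vocabulary sizes
char_dim, pos_dim = 256, 16      # illustrative embedding depths

char_table = rng.standard_normal((num_symbols, char_dim))
pos_table = rng.standard_normal((num_tags, pos_dim))

char_ids = np.array([[47, 35, 36, 46]])  # [N=1, T_in=4], e.g. "this"
pos_ids = np.array([[1, 1, 1, 1]])       # aligned POS IDs, same length

# An embedding lookup is just row indexing into the table.
char_emb = char_table[char_ids]          # [N, T_in, char_dim]
pos_emb = pos_table[pos_ids]             # [N, T_in, pos_dim]

# The encoder then sees char_dim + pos_dim features per input character.
encoder_inputs = np.concatenate([char_emb, pos_emb], axis=-1)
print(encoder_inputs.shape)  # (1, 4, 272)
```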

patrick-g-zhang avatar Jun 02 '19 06:06 patrick-g-zhang

Thanks for your reply @patrick-g-zhang . I will try this.

samuel-lunii avatar Jun 21 '19 12:06 samuel-lunii

@samuel-lunii Hello Samuel, can you please tell me whether you succeeded in adding linguistic features? Can you share the modifications you made? Thank you a lot!

LauraLQ avatar Apr 27 '20 14:04 LauraLQ

Hi @LauraLQ ,

I actually stopped working on this project, but it seems that @patrick-g-zhang 's answer is the path to follow.

There is also interest in variational embeddings for expressive synthesis; I'd recommend reading this paper.

Hope this helps!

samuel-lunii avatar Apr 28 '20 08:04 samuel-lunii

Hello @samuel-lunii,

Thank you for your response and recommendation!

Best regards!

LauraLQ avatar Apr 28 '20 09:04 LauraLQ

Dear @patrick-g-zhang,

I have tried to make another embedding layer as you suggested, but I receive some errors. The code is:

First Embeddings

  embedding_table = tf.get_variable(
    'embedding', [len(symbols), hp.embed_depth], dtype=tf.float32,
    initializer=tf.truncated_normal_initializer(stddev=0.5))
  embedded_inputs = tf.nn.embedding_lookup(embedding_table, inputs)          # [N, T_in, embed_depth=256]

Second Embeddings

  embedding_table2 = tf.get_variable(
    'embedding', [len(sym), hp.embed_depth], dtype=tf.float32,
    initializer=tf.truncated_normal_initializer(stddev=0.5))
  embedded_inputs2 = tf.nn.embedding_lookup(embedding_table, POSinputs)          # [N, T_in, embed_depth=256]

And the error is: "embedding already exists". If I give the second variable another name, I instead receive "None value not supported".

Can you please help me with some suggestions? I should mention that I don't know TensorFlow very well.

Thank you a lot!
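For reference, a sketch of the fix the first error points at: the two `tf.get_variable` calls must use distinct names, and the second lookup should read from `embedding_table2` rather than `embedding_table`. The "None value not supported" error after renaming likely means `POSinputs` is still `None` at graph-build time, i.e. the POS sequence has not yet been wired through the data feeder; that diagnosis is an assumption on my part. Written against the TF 1.x API this repo targets (accessed via `tf.compat.v1` here so it also runs under TF 2; the sizes stand in for `len(symbols)`, `len(sym)`, and `hp.embed_depth`):

```python
import tensorflow.compat.v1 as tf  # TF 1.x-style graph API
tf.disable_eager_execution()

num_symbols, num_tags, embed_depth = 70, 10, 256  # illustrative sizes

inputs = tf.placeholder(tf.int32, [None, None], name='inputs')
POSinputs = tf.placeholder(tf.int32, [None, None], name='POSinputs')

embedding_table = tf.get_variable(
    'embedding', [num_symbols, embed_depth], dtype=tf.float32,
    initializer=tf.truncated_normal_initializer(stddev=0.5))
embedded_inputs = tf.nn.embedding_lookup(embedding_table, inputs)        # [N, T_in, 256]

embedding_table2 = tf.get_variable(
    'pos_embedding',  # distinct name, so no "embedding already exists"
    [num_tags, embed_depth], dtype=tf.float32,
    initializer=tf.truncated_normal_initializer(stddev=0.5))
embedded_inputs2 = tf.nn.embedding_lookup(embedding_table2, POSinputs)   # [N, T_in, 256]

# Concatenated result, fed to the encoder in place of embedded_inputs.
combined = tf.concat([embedded_inputs, embedded_inputs2], axis=-1)       # [N, T_in, 512]
```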

LauraLQ avatar May 19 '20 14:05 LauraLQ