Adding linguistic features for expressive synthesis?
Hi!
Thanks for this great implementation!
I'm a speech scientist, not an expert in neural networks. I'm working on expressive speech synthesis, and I'm currently wondering about the possibility of adding linguistic features as input data to Tacotron.
At the moment, I am thinking of using a tensor with the same size as the input character sequence, where each element would correspond to a linguistic label.
For example, if we use part-of-speech tags as a new input feature, the character sequence of the sentence "this is my sentence" would be something like: [47, 35, 36, 46, 64, 36, 46, 64, 40, 52, 64, 46, 32, 41, 47, 32, 41, 30, 32]
and the corresponding part-of-speech sequence would be: [1, 1, 1, 1, -1, 2, 2, -1, 3, 3, -1, 4, 4, 4, 4, 4, 4, 4, 4], where each positive number corresponds to a specific tag and -1 to spaces.
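For reference, here is a minimal sketch (plain Python; the function name is just a placeholder) of how I imagine building such a character-aligned sequence from word-level tag ids:

```python
# Sketch only: expand word-level POS tag ids into a character-aligned sequence,
# using -1 for the spaces between words.
def char_aligned_tags(words, tag_ids, space_id=-1):
    """words: list of strings; tag_ids: one integer tag id per word."""
    aligned = []
    for i, (word, tag) in enumerate(zip(words, tag_ids)):
        aligned.extend([tag] * len(word))   # one tag per character of the word
        if i < len(words) - 1:
            aligned.append(space_id)        # marker for the separating space
    return aligned

words = "this is my sentence".split()
print(char_aligned_tags(words, [1, 2, 3, 4]))
# -> [1, 1, 1, 1, -1, 2, 2, -1, 3, 3, -1, 4, 4, 4, 4, 4, 4, 4, 4]
```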
- Would this approach be suitable?
- If so, how could I add such data as input to Tacotron, given that I would like to be able to add several features?
- If not, does anyone have any recommendations?
Thanks!
I think you could create another embedding layer for the POS tags, then concatenate the two embeddings together.
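Roughly something like this (a sketch only, not code from this repo; `pos_inputs`, `num_pos_tags` and `hp.pos_embed_depth` are placeholder names):

```python
# Character embedding (as in the existing model)
char_table = tf.get_variable(
    'embedding', [len(symbols), hp.embed_depth], dtype=tf.float32,
    initializer=tf.truncated_normal_initializer(stddev=0.5))
char_embedded = tf.nn.embedding_lookup(char_table, inputs)        # [N, T_in, embed_depth]

# Separate embedding table for the POS tags (note the distinct variable name)
pos_table = tf.get_variable(
    'pos_embedding', [num_pos_tags, hp.pos_embed_depth], dtype=tf.float32,
    initializer=tf.truncated_normal_initializer(stddev=0.5))
pos_embedded = tf.nn.embedding_lookup(pos_table, pos_inputs)      # [N, T_in, pos_embed_depth]

# Concatenate along the feature axis so each character carries both embeddings
encoder_inputs = tf.concat([char_embedded, pos_embedded], axis=-1)
```

The concatenated tensor can then replace the character embeddings wherever they feed the encoder, and you can append further feature embeddings to the same concatenation if you want several features.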
Thanks for your reply, @patrick-g-zhang. I will try this.
@samuel-lunii Hello Samuel, can you please tell me whether you succeeded in adding linguistic features? Could you share the modifications you made? Thank you very much!
Hi @LauraLQ,
I actually stopped working on this project, but it seems that @patrick-g-zhang's answer is the path to follow.
There is also interest in variational embeddings for expressive synthesis; I'd recommend reading this paper.
Hope this helps!
Hello @samuel-lunii,
Thank you for your response and recommendation!
Best regards!
Dear @patrick-g-zhang,
I have tried to create another embedding layer as you suggested, but I get some errors. The code is:
First embeddings:
```python
embedding_table = tf.get_variable(
    'embedding', [len(symbols), hp.embed_depth], dtype=tf.float32,
    initializer=tf.truncated_normal_initializer(stddev=0.5))
embedded_inputs = tf.nn.embedding_lookup(embedding_table, inputs)  # [N, T_in, embed_depth=256]
```
Second embeddings:
```python
embedding_table2 = tf.get_variable(
    'embedding', [len(sym), hp.embed_depth], dtype=tf.float32,
    initializer=tf.truncated_normal_initializer(stddev=0.5))
embedded_inputs2 = tf.nn.embedding_lookup(embedding_table, POSinputs)  # [N, T_in, embed_depth=256]
```
The error is "embedding already exists", and if I give the second variable another name I get "None values not supported" instead.
Could you please help me with some suggestions? I should mention that I do not know TensorFlow very well.
Thank you very much!
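For what it's worth, the "embedding already exists" error comes from calling tf.get_variable twice with the same name 'embedding'; each variable needs a unique name. Below is a sketch of a possible fix (untested against this repo; it also assumes `POSinputs` is a real int32 tensor that is built and fed like `inputs`, since passing None there is one common cause of a "None values not supported" error):

```python
embedding_table = tf.get_variable(
    'embedding', [len(symbols), hp.embed_depth], dtype=tf.float32,
    initializer=tf.truncated_normal_initializer(stddev=0.5))
embedded_inputs = tf.nn.embedding_lookup(embedding_table, inputs)        # [N, T_in, embed_depth]

embedding_table2 = tf.get_variable(
    'pos_embedding', [len(sym), hp.embed_depth], dtype=tf.float32,       # distinct variable name
    initializer=tf.truncated_normal_initializer(stddev=0.5))
embedded_inputs2 = tf.nn.embedding_lookup(embedding_table2, POSinputs)   # look up in the second table

# Combine the two embeddings along the last (feature) axis
encoder_inputs = tf.concat([embedded_inputs, embedded_inputs2], axis=-1)
```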