What is the purpose of the SpeechFeatureEmbedding class?
So we have the Transformer input layer, which does the input embedding and the positional encoding, but then a class like this is applied:
```python
import tensorflow as tf
from tensorflow.keras import layers


class SpeechFeatureEmbedding(layers.Layer):
    def __init__(self, num_hid=64, maxlen=100):
        super().__init__()
        # Three strided convolutions over the time axis of the spectrogram.
        self.conv1 = tf.keras.layers.Conv1D(
            num_hid, 11, strides=2, padding="same", activation="relu"
        )
        self.conv2 = tf.keras.layers.Conv1D(
            num_hid, 11, strides=2, padding="same", activation="relu"
        )
        self.conv3 = tf.keras.layers.Conv1D(
            num_hid, 11, strides=2, padding="same", activation="relu"
        )
        # Defined here but never used in call() below.
        self.pos_emb = layers.Embedding(input_dim=maxlen, output_dim=num_hid)

    def call(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        return self.conv3(x)
```
So my question is: why do we need to add Conv1D layers here? We already have positional encoding, don't we? What is the purpose of this...
The class is in this code: https://github.com/keras-team/keras-io/blob/master/examples/audio/transformer_asr.py
To my understanding, SpeechFeatureEmbedding only supplies the encoder input, and TokenEmbedding supplies the decoder input. However, it seems SpeechFeatureEmbedding doesn't do any positional encoding. I'm not sure whether this is a bug, so I've asked for clarification.
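For what it's worth, if positional information were intended here, a minimal sketch of how the otherwise unused self.pos_emb could be wired into call() might look like the following. This is only a guess at the intent, mirroring the pattern TokenEmbedding uses on the decoder side, not what the example actually does:

```python
# Hypothetical variant of call(); NOT what transformer_asr.py actually does.
# Assumes the convolved output has at most `maxlen` time steps.
def call(self, x):
    x = self.conv1(x)
    x = self.conv2(x)
    x = self.conv3(x)  # shape: (batch, time / 8, num_hid)
    positions = tf.range(start=0, limit=tf.shape(x)[1], delta=1)
    return x + self.pos_emb(positions)  # add learned positional embeddings
```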
I have the same question as @BernardoOlisan: the SpeechFeatureEmbedding class defines self.pos_emb = layers.Embedding(input_dim=maxlen, output_dim=num_hid) but never uses it. I need clarification as well.
As mentioned in one of the comments, SpeechFeatureEmbedding is for the encoder input and TokenEmbedding is for the decoder input.
Conv layers are generally used for audio; here they extract local features and downsample the time axis of the spectrogram before it reaches the attention layers.
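To make the conv layers' role concrete, here is a minimal sketch using the class quoted above, with an assumed dummy spectrogram shape: each Conv1D has strides=2, so the three of them cut the time axis by a factor of 8 and project the features to num_hid, which shortens the sequence the self-attention layers have to process.

```python
import tensorflow as tf

# Assumed dummy input: batch of 4 spectrograms, 1000 time frames, 129 frequency bins.
dummy_audio_features = tf.random.normal((4, 1000, 129))

embedding = SpeechFeatureEmbedding(num_hid=64, maxlen=100)
out = embedding(dummy_audio_features)

print(dummy_audio_features.shape)  # (4, 1000, 129)
print(out.shape)                   # (4, 125, 64): time axis halved three times, features projected to num_hid
```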
This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.
This issue was closed because it has been inactive for 28 days. Please reopen if you'd like to work on this further.