What is the purpose of the SpeechFeatureEmbedding class?

BernardoOlisan opened this issue.

So we have the Transformer input layer, which does the input embedding and the positional encoding, but then a class like this is applied:

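```python
import tensorflow as tf
from tensorflow.keras import layers


class SpeechFeatureEmbedding(layers.Layer):
    def __init__(self, num_hid=64, maxlen=100):
        super().__init__()
        # Three strided 1D convolutions over the time axis of the audio features.
        # Each uses stride 2, so together they shorten the sequence by a factor of 8.
        self.conv1 = tf.keras.layers.Conv1D(
            num_hid, 11, strides=2, padding="same", activation="relu"
        )
        self.conv2 = tf.keras.layers.Conv1D(
            num_hid, 11, strides=2, padding="same", activation="relu"
        )
        self.conv3 = tf.keras.layers.Conv1D(
            num_hid, 11, strides=2, padding="same", activation="relu"
        )
        # Positional embedding table; note that call() below never uses it.
        self.pos_emb = layers.Embedding(input_dim=maxlen, output_dim=num_hid)

    def call(self, x):
        # x: (batch, time, features) -> (batch, ceil(time / 8), num_hid)
        x = self.conv1(x)
        x = self.conv2(x)
        return self.conv3(x)
```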

So my question is: why do we need to add Conv1D layers here? Don't we already have a positional encoding? What is the purpose of this?

The code is in https://github.com/keras-team/keras-io/blob/master/examples/audio/transformer_asr.py

BernardoOlisan, Mar 08 '22 19:03

To my understanding, SpeechFeatureEmbedding only supplies the encoder input, and TokenEmbedding supplies the decoder input. However, SpeechFeatureEmbedding doesn't seem to apply any positional encoding. I'm not sure whether this is a bug, so I've asked for clarification.
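Why a missing positional encoding could matter: self-attention on its own ignores the order of its inputs. The NumPy sketch below uses a generic single-head scaled dot-product attention with random weights (an illustration, not the keras-io encoder) and shows that shuffling the input frames only shuffles the outputs in the same way. One hedge: in this example the Conv1D stack runs before attention and mixes each frame with its neighbours, so some local order information is already baked into the features. That may partly compensate for the unused positional embedding, though nobody in this thread confirms it.

```python
import numpy as np


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention; x has shape (time, d)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v


rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(5, d))  # 5 hypothetical frames, no positional information
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
perm = rng.permutation(5)

out = self_attention(x, wq, wk, wv)
out_perm = self_attention(x[perm], wq, wk, wv)

# Shuffling the frames just shuffles the outputs the same way.
print(np.allclose(out_perm, out[perm]))  # True
```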

rshahamiriuoa, Jan 01 '23 02:01

I have the same question as @BernardoOlisan: the SpeechFeatureEmbedding class defines `self.pos_emb = layers.Embedding(input_dim=maxlen, output_dim=num_hid)` but never uses it. I'd also like clarification.
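If pos_emb were used, the usual learned-position pattern would be to look up one vector per time index 0..T-1 and add it to the features. The NumPy sketch below shows only that arithmetic; the function name `add_learned_positions`, the random table, and the input sizes are hypothetical, and this is not a fix proposed by the keras-io maintainers. One detail worth noting: if the lookup happened after the conv stack, maxlen would have to cover the downsampled length (ceil(n / 8) for n input frames), not the raw frame count.

```python
import numpy as np


def add_learned_positions(features, pos_table):
    """Add one learned position vector to every time step (hypothetical helper).

    features:  (batch, time, num_hid), e.g. the output of the conv stack.
    pos_table: (maxlen, num_hid), the weights an Embedding(maxlen, num_hid) holds.
    """
    time = features.shape[1]
    maxlen = pos_table.shape[0]
    if time > maxlen:
        raise ValueError(f"sequence length {time} exceeds maxlen {maxlen}")
    positions = np.arange(time)        # 0, 1, ..., time - 1
    pos = pos_table[positions]         # embedding lookup -> (time, num_hid)
    return features + pos[np.newaxis]  # broadcast over the batch axis


rng = np.random.default_rng(0)
num_hid, maxlen = 64, 100  # defaults of the quoted SpeechFeatureEmbedding
pos_table = rng.normal(size=(maxlen, num_hid))
conv_out = rng.normal(size=(2, 50, num_hid))  # hypothetical conv-stack output

encoded = add_learned_positions(conv_out, pos_table)
print(encoded.shape)  # (2, 50, 64)
```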

saiful9379, Mar 30 '23 18:03

As mentioned in one of the comments, SpeechFeatureEmbedding is for the encoder input and TokenEmbedding is for the decoder input. Conv layers are generally used for audio.
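To make the Conv1D point concrete, here is a NumPy re-implementation of what the three quoted layers do to the input shape: kernel size 11, stride 2, padding "same", ReLU. It assumes TensorFlow's "same" padding rule, omits the bias a default Keras Conv1D would have, uses random weights, and drops the batch axis; the 1000 frames x 129 features input is a hypothetical spectrogram size. The time axis shrinks from n to ceil(n / 8) and the feature axis becomes num_hid.

```python
import math

import numpy as np


def same_conv1d_relu(x, w, stride=2):
    """Strided 1D convolution with TF-style "same" padding, followed by ReLU.

    x: (time, in_channels); w: (kernel_size, in_channels, out_channels).
    No bias term, unlike a default Keras Conv1D.
    """
    n, k = x.shape[0], w.shape[0]
    out_len = math.ceil(n / stride)  # "same" padding yields ceil(n / stride) steps
    pad_total = max((out_len - 1) * stride + k - n, 0)
    pad_left = pad_total // 2  # any odd leftover padding goes on the right
    xp = np.pad(x, ((pad_left, pad_total - pad_left), (0, 0)))
    out = np.stack([
        # Contract one kernel-sized window over (time, in_channels).
        np.tensordot(xp[i * stride : i * stride + k], w, axes=([0, 1], [0, 1]))
        for i in range(out_len)
    ])
    return np.maximum(out, 0.0)


rng = np.random.default_rng(0)
num_hid, kernel = 64, 11
frames = rng.normal(size=(1000, 129))  # hypothetical spectrogram: 1000 frames x 129 bins

x = frames
for in_ch in (129, num_hid, num_hid):  # mirrors conv1 -> conv2 -> conv3
    w = rng.normal(size=(kernel, in_ch, num_hid)) * 0.05
    x = same_conv1d_relu(x, w)

print(frames.shape, "->", x.shape)  # (1000, 129) -> (125, 64)
```

Besides projecting each frame to num_hid features, shortening the sequence 8x makes the encoder's self-attention, whose cost grows quadratically with sequence length, much cheaper. That is a common motivation for convolutional front ends in speech Transformers, although the thread does not state it as this example's rationale.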

sachinprasadhs, Jul 23 '24 23:07

This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.

github-actions[bot], Aug 07 '24 01:08

This issue was closed because it has been inactive for 28 days. Please reopen if you'd like to work on this further.

github-actions[bot], Aug 23 '24 01:08
