keras-io
`transformer_asr.py`: incorrect `source_maxlen`
https://github.com/keras-team/keras-io/blob/master/examples/audio/transformer_asr.py
In the code at the above link, I found that `source_maxlen` defaults to 100 in the transformer.
The problem, though, is that the inputs are actually padded to length 2754 and then downsampled by the CNN front end by a factor of 8. The result is a sequence of length 345, which is far greater than the default `source_maxlen` of 100.
Correct me if I am wrong, but I reckon that is a bug?
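For reference, the length arithmetic can be checked with a quick sketch. This assumes the tutorial's front end of three `Conv1D` layers with `strides=2` and `padding="same"`, each of which maps a sequence of length `n` to `ceil(n / 2)`:

```python
import math

def downsampled_len(n, num_conv=3, stride=2):
    # Each stride-2, padding="same" Conv1D halves the sequence length,
    # rounding up: out = ceil(in / stride).
    for _ in range(num_conv):
        n = math.ceil(n / stride)
    return n

print(downsampled_len(2754))  # 345, well above the default source_maxlen=100
```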
Problem code:
In the transformer definition, `source_maxlen` defaults to 100:
```python
class Transformer(keras.Model):
    def __init__(
        self,
        num_hid=64,
        num_head=2,
        num_feed_forward=128,
        source_maxlen=100,
        target_maxlen=100,
        num_layers_enc=4,
        num_layers_dec=1,
        num_classes=10,
    ):
```
... and it isn't explicitly set at instantiation:
```python
model = Transformer(
    num_hid=200,
    num_head=2,
    num_feed_forward=400,
    target_maxlen=max_target_len,
    num_layers_enc=4,
    num_layers_dec=1,
    num_classes=34,
)
```
@apoorvnandan, would you be able to help with the above issue, related to your published tutorial here: https://keras.io/examples/audio/transformer_asr/
Hi! Just saw this.
On a cursory glance, it does look like a bug.
- The param name is misleading. That param is used to determine the `input_dim` in the `Embedding` layer, which should be something like 129 (which comes from the STFT of the audio).
- I think it should be explicitly set to the above value instead of letting it default.
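As a sanity check on that 129 figure (assuming the tutorial's feature extractor, which calls `tf.signal.stft` with `fft_length=256`): an STFT of a real-valued signal yields `fft_length // 2 + 1` frequency bins.

```python
# Assumed tutorial setting: tf.signal.stft(..., fft_length=256).
# A real-input STFT produces fft_length // 2 + 1 frequency bins,
# which is where the 129 would come from.
fft_length = 256
num_bins = fft_length // 2 + 1
print(num_bins)  # 129
```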
I'm not 100% sure though. I'll try to go through this after work to check if there's something I missed.
This issue is stale because it has been open for 180 days with no activity. It will be closed if no further activity occurs. Thank you.