No definition for max_length in attention
Hi @spro,
I don't know if this is expected, but you don't define the attention size based on the max_length of the words. Could you elaborate on this?
Also, this example doesn't train in batches. How would we do that?
Thank you
More leftover code... in this case the tutorial was using a different strategy (Ctrl-F "location-based" in Effective Approaches to Attention-based Neural Machine Translation). In that case the attention is calculated solely from the hidden state, using a linear layer with a fixed output size of max_length.
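For reference, here is a minimal sketch of that location-based variant (class and variable names are illustrative, not taken from the tutorial): the linear layer that produces the attention scores needs a fixed output size of max_length, one score per source position.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocationAttention(nn.Module):
    # Location-based attention: scores are predicted from the decoder
    # hidden state alone, so the output layer is fixed at max_length
    # (one score per source position).
    def __init__(self, hidden_size, max_length):
        super(LocationAttention, self).__init__()
        self.attn = nn.Linear(hidden_size, max_length)

    def forward(self, hidden, encoder_outputs):
        # hidden: (1, hidden_size), encoder_outputs: (max_length, hidden_size)
        attn_weights = F.softmax(self.attn(hidden), dim=1)  # (1, max_length)
        context = torch.mm(attn_weights, encoder_outputs)   # (1, hidden_size)
        return context, attn_weights
```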
Hi @spro,
If I understand correctly, instead of location-based attention this tutorial uses content-based attention, right? So there is no need to define max_length anymore. What about the evaluation process, though? We still need max_length there, right?
Thank you
Right & right. The max length is also used to filter training data.
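Roughly like this (a sketch; the tutorial's helper names and the exact cutoff may differ):

```python
MAX_LENGTH = 10  # assumed cutoff; the tutorial sets its own value

# pairs is assumed to be the list of (source, target) sentence strings
pairs = [("je suis content .", "i am happy ."),
         ("je ne sais pas quoi dire de plus a ce sujet .",
          "i do not know what else to say about this .")]

def short_enough(pair):
    # keep only pairs where both sentences fit within MAX_LENGTH words
    return (len(pair[0].split(' ')) < MAX_LENGTH and
            len(pair[1].split(' ')) < MAX_LENGTH)

pairs = [pair for pair in pairs if short_enough(pair)]
```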
Got it @spro
By the way, I have an additional question regarding the training process. Skimming your code, the training samples are selected by random choice:
training_pair = variables_from_pair(random.choice(pairs))
This could mean some training pairs never get selected. How do we deal with this?
Also, each iteration trains on only a single pair. Is this good? Why not train in batches? Just curious.
There are many more iterations than training examples, so it is fairly likely to cover them all. A more reliable way would be to go through the examples in order and shuffle at the end of every epoch.
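Something along these lines (a sketch only, reusing the tutorial's variables_from_pair and train functions; the exact train arguments may differ):

```python
import random

n_epochs = 10  # illustrative value

for epoch in range(n_epochs):
    random.shuffle(pairs)  # new order every epoch so all pairs are seen
    for pair in pairs:
        input_variable, target_variable = variables_from_pair(pair)
        loss = train(input_variable, target_variable, encoder, decoder,
                     encoder_optimizer, decoder_optimizer, criterion)
```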
Batching would be an improvement but it will take some extra work to implement (the models currently build in the assumption that batch_size=1) - opened #27 to track that.
Thanks @spro,
I see. Glad to hear that batched training is planned. Can't wait for that!