No definition for max_length in attention
Hi @spro,
I don't know if this is expected, but you don't define the attention size based on the max_length of the words. Could you elaborate on this?
Also, this example doesn't train in batches. How would we do that?
Thank you
More leftover code... in this case the tutorial was using a different strategy (Ctrl-F "location-based" in Effective Approaches to Attention-based Neural Machine Translation). In that case the attention is calculated solely from the hidden state, using a linear layer with a fixed output size of max_length.
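For reference, here is a minimal sketch of that location-based variant (class and variable names are illustrative, not taken from the tutorial): the linear layer that produces the attention scores needs a fixed output size of max_length, one score per source position.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocationAttention(nn.Module):
    # Location-based attention: scores are predicted from the decoder
    # hidden state alone, so the output layer is fixed at max_length
    # (one score per source position).
    def __init__(self, hidden_size, max_length):
        super(LocationAttention, self).__init__()
        self.attn = nn.Linear(hidden_size, max_length)

    def forward(self, hidden, encoder_outputs):
        # hidden: (1, hidden_size), encoder_outputs: (max_length, hidden_size)
        attn_weights = F.softmax(self.attn(hidden), dim=1)  # (1, max_length)
        context = torch.mm(attn_weights, encoder_outputs)   # (1, hidden_size)
        return context, attn_weights
```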
Hi @spro,
If I understand correctly, instead of location-based attention this tutorial uses content-based attention, right? So there is no need to define max_length anymore. What about the evaluation process, though? We still need max_length there, right?
Thank you
Right & right. The max length is also used to filter training data.
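Roughly like this (a sketch; the tutorial's helper names and the exact cutoff may differ):

```python
MAX_LENGTH = 10  # assumed cutoff; the tutorial sets its own value

# pairs is assumed to be the list of (source, target) sentence strings
pairs = [("je suis content .", "i am happy ."),
         ("je ne sais pas quoi dire de plus a ce sujet .",
          "i do not know what else to say about this .")]

def short_enough(pair):
    # keep only pairs where both sentences fit within MAX_LENGTH words
    return (len(pair[0].split(' ')) < MAX_LENGTH and
            len(pair[1].split(' ')) < MAX_LENGTH)

pairs = [pair for pair in pairs if short_enough(pair)]
```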
Got it @spro
By the way, I have an additional question regarding the training process. Skimming your code, the training samples are selected by random choice:
training_pair = variables_from_pair(random.choice(pairs))
This could mean some training pairs never get selected. How do we deal with this?
Also, each iteration trains on only a single pair. Is this good? Why not train in batches? Just curious.
There are many more iterations than training examples, so it is fairly likely to cover them all. A more reliable way would be to go through the examples in order and shuffle at the end of every epoch.
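Something along these lines (a sketch only, reusing the tutorial's variables_from_pair and train functions; the exact train arguments may differ):

```python
import random

n_epochs = 10  # illustrative value

for epoch in range(n_epochs):
    random.shuffle(pairs)  # new order every epoch so all pairs are seen
    for pair in pairs:
        input_variable, target_variable = variables_from_pair(pair)
        loss = train(input_variable, target_variable, encoder, decoder,
                     encoder_optimizer, decoder_optimizer, criterion)
```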
Batching would be an improvement but it will take some extra work to implement (the models currently build in the assumption that batch_size=1) - opened #27 to track that.
Thanks @spro,
I see. Glad to hear that batched training is planned. Can't wait for that!