
Beam search decoder with attention model does not work

Open anoopkunchukuttan opened this issue 7 years ago • 4 comments

Hi,

I get an error when I train a model that uses attention and a beam search decoder. On the other hand, these two scenarios work fine:

  • Attention, but no beam search decoder (greedy decoder).
  • No attention, but beam search decoder is used.

The exception message mentions 'Incompatible shapes'. Please find the error log and the training config attached. Is there a problem with the config file?

err.log train_ini.txt

anoopkunchukuttan avatar Mar 22 '18 13:03 anoopkunchukuttan

Hi, thanks for letting us know. Beam search with attention currently only works with batch size 1. The workaround is setting runners_batch_size to 1 in the [main] section of the configuration file.

In general, I don't think there are any benefits of using beam search during training. Greedy decoding during validations gives you a good estimate of the model performance for storing the model checkpoints. Later, you can always use the models with beam search during inference.
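A minimal sketch of the workaround in the configuration file (only `runners_batch_size` is the fix being described; the other keys and values are illustrative placeholders, not taken from the attached config):

```ini
[main]
; ... other options from your existing config ...
; Workaround: beam search with attention currently requires
; the runners to process one sentence at a time.
runners_batch_size=1
```

Note that `runners_batch_size` affects only the runners (validation/inference); the training `batch_size` can stay as it is.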

jlibovicky avatar Mar 22 '18 14:03 jlibovicky

Thanks, yes beam search with attention works for batch size of 1. Do you plan to support larger batch sizes soon? That would be really useful.

anoopkunchukuttan avatar Mar 28 '18 09:03 anoopkunchukuttan

Hi, yes, we plan to support larger batch sizes. However, this will require non-trivial refactoring of the current attention implementation, so I cannot say when exactly this feature will be supported.

varisd avatar Mar 28 '18 09:03 varisd

Hi, please also note that increasing the batch size for inference likely won't bring any speed improvement on CPUs. Moreover, as far as I know, other toolkits don't provide this feature at all.


jindrahelcl avatar Mar 28 '18 12:03 jindrahelcl