
How to train a Transformer model with multi-source encoders?

penny9287 opened this issue · 3 comments

I wonder how to modify the configuration file to train a multi-source Transformer model with different attention types.

penny9287 avatar Jun 19 '19 05:06 penny9287

Hi, currently the Transformer decoder only supports the multi-head scaled dot-product attention from the "Attention is All You Need" paper. If you provide multiple encoders, you can choose which attention combination strategy to use: serial, parallel, hierarchical, or flat.

jindrahelcl avatar Jun 19 '19 21:06 jindrahelcl

I wonder how to specify the combination strategy for multiple encoders in the configuration file. Are there any examples?

wyjllm avatar Jun 20 '19 03:06 wyjllm

Just specify the attention_combination_strategy parameter in the Transformer decoder configuration. It can be one of serial, parallel, hierarchical, or flat.
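
A minimal sketch of what such a configuration could look like, assuming the usual Neural Monkey INI layout with two encoder sections referenced from the decoder. Only attention_combination_strategy is confirmed in this thread; the remaining section names and parameters (encoders, n_heads_enc, ff_hidden_size, etc.) are illustrative assumptions and should be checked against the current documentation:

```ini
; Hypothetical excerpt of a multi-source Neural Monkey config.
; Parameter names other than attention_combination_strategy are assumptions.

[encoder_src1]
class=encoders.transformer.TransformerEncoder
name="encoder_src1"
input_sequence=<sequence_src1>
ff_hidden_size=2048
depth=6
n_heads=8
dropout_keep_prob=0.9

[encoder_src2]
class=encoders.transformer.TransformerEncoder
name="encoder_src2"
input_sequence=<sequence_src2>
ff_hidden_size=2048
depth=6
n_heads=8
dropout_keep_prob=0.9

[decoder]
class=decoders.transformer.TransformerDecoder
name="decoder"
; the decoder attends to all encoders listed here
encoders=[<encoder_src1>, <encoder_src2>]
vocabulary=<target_vocabulary>
data_id="target"
; one of "serial", "parallel", "hierarchical", "flat"
attention_combination_strategy="serial"
ff_hidden_size=2048
depth=6
n_heads_self=8
n_heads_enc=8
max_output_len=50
dropout_keep_prob=0.9
```

Roughly speaking, serial runs one encoder-attention sublayer per encoder in sequence, parallel attends to each encoder independently and combines the resulting contexts, hierarchical adds a second attention over the per-encoder contexts, and flat attends jointly over the concatenated encoder states; see the Neural Monkey authors' paper on input combination strategies for multi-source translation for the exact definitions.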

jindrahelcl avatar Jun 20 '19 22:06 jindrahelcl