OpenNMT-py
[WIP] The Missing Ingredient in Zero-Shot Neural Machine Translation
This PR intends to add an implementation of the cosine similarity alignment loss introduced as a regularization term in "The Missing Ingredient in Zero-Shot Neural Machine Translation".
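For readers unfamiliar with the idea, here is a minimal sketch of a cosine similarity alignment loss in PyTorch. It is not the code in this PR: the function name `cosine_alignment_loss`, the tensor layout, and the use of masked mean pooling to get one vector per sentence are all assumptions made for illustration, and the pooling used in the paper or in this implementation may differ.

```python
import torch
import torch.nn.functional as F


def cosine_alignment_loss(src_enc, tgt_enc, src_mask, tgt_mask):
    """Hypothetical helper, for illustration only.

    src_enc, tgt_enc: encoder outputs, shape (batch, length, dim).
    src_mask, tgt_mask: bool tensors, shape (batch, length), True for real tokens.
    """

    def masked_mean(enc, mask):
        # Assumption: pool over time with a mean that ignores padding.
        weights = mask.unsqueeze(-1).to(enc.dtype)
        return (enc * weights).sum(dim=1) / weights.sum(dim=1).clamp(min=1.0)

    src_vec = masked_mean(src_enc, src_mask)
    tgt_vec = masked_mean(tgt_enc, tgt_mask)
    # 1 - cos is 0 when the two sentence vectors point the same way.
    cos = F.cosine_similarity(src_vec, tgt_vec, dim=-1)
    return (1.0 - cos).mean()
```

In training, a term like this would typically be scaled by a weight and added to the usual translation loss; the weight itself is a hyperparameter not specified here.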
Note: The impact on speed is significant. Because batch size must be reduced to make room in memory for the additional representations, we can lose up to 20-25% in training speed, in both FP32 and FP16 modes.
For the record, we discussed offline whether this should live in the code of NMTModel or Trainer. For performance reasons it needs to be in NMTModel (both src and tgt are encoded in its forward pass), but this makes the API a little less "clear". We opted for performance, and kept the API intact when the new loss is not used.
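To illustrate the trade-off, here is a toy sketch of a model whose forward pass optionally encodes the target as well. It is not the actual NMTModel: `ToyModel`, the `with_align` flag, and the GRU layers are hypothetical stand-ins chosen only to show how the return value can stay unchanged when the alignment loss is off.

```python
import torch
import torch.nn as nn


class ToyModel(nn.Module):
    """Hypothetical stand-in for an encoder-decoder model (not NMTModel)."""

    def __init__(self, vocab_size=100, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)

    def encode(self, tokens):
        # Returns per-token outputs and the final hidden state.
        return self.encoder(self.embed(tokens))

    def forward(self, src, tgt, with_align=False):
        src_enc, state = self.encode(src)
        dec_out, _ = self.decoder(self.embed(tgt), state)
        if not with_align:
            # Default path: same return value as without the new loss.
            return dec_out
        # Extra encoder pass over tgt, needed only for the alignment loss.
        tgt_enc, _ = self.encode(tgt)
        return dec_out, src_enc, tgt_enc
```

The extra encoder pass and the additional activations it keeps for the backward pass are consistent with the memory cost mentioned in the note above.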