Wentao Wang
This example demonstrates how to use Differentiable Expected BLEU (DEBLEU) with teacher mask and triggers. See https://openreview.net/pdf?id=S1x2aiRqFX for the paper it implements. The IWSLT'14 English-French dataset will be added later.
I found that your code measures norms and clips gradients for each parameter separately, but in the paper the authors say "the l2 norm of the whole gradient of all parameters..."...
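The difference between the two clipping strategies mentioned in this issue can be seen in a minimal sketch. It uses plain Python lists in place of real tensors; the function names `clip_per_parameter` and `clip_global`, the threshold, and the toy gradients are all illustrative assumptions, not code from the repository or the paper.

```python
import math

def clip_per_parameter(grads, max_norm):
    """Clip each parameter's gradient to max_norm independently (hypothetical helper)."""
    clipped = []
    for g in grads:
        norm = math.sqrt(sum(x * x for x in g))
        # Rescale only if this parameter's own norm exceeds the threshold.
        scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in g])
    return clipped

def clip_global(grads, max_norm):
    """Clip using the l2 norm of all gradients taken together (hypothetical helper)."""
    total = math.sqrt(sum(x * x for g in grads for x in g))
    # A single scale factor is shared by every parameter.
    scale = min(1.0, max_norm / total) if total > 0 else 1.0
    return [[x * scale for x in g] for g in grads]

# Two toy "parameters" whose gradients have norms 5 and 1.
grads = [[3.0, 4.0], [0.6, 0.8]]
per_param = clip_per_parameter(grads, max_norm=1.0)
global_clipped = clip_global(grads, max_norm=1.0)

print("per-parameter:", per_param)
print("global:       ", global_clipped)
```

With per-parameter clipping each gradient is rescaled on its own, so the relative magnitudes between parameters change. With global clipping every gradient is multiplied by the same factor, so the overall update direction is preserved and only its length is bounded.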
[Here](https://github.com/harvardnlp/data2text/blob/203281f20f8c6ce67ed1f2b28cd3a8fcb09a94ac/data_utils.py#L372) you're getting `num_to_keep` based on the number of examples not labeled as NONE (`len(trlabels)-len(none_idxs)`); however, according to the comment above you should use the number of examples labeled as...
Here are several issues I found in the rule-based entity/number extraction functions in `data_utils.py`. I'm still working with the dataset and will probably add more as I come across them. These...