MemN2N-tensorflow: Global gradient clipping should be used instead of clipping each matrix separately.

Open wwt17 opened this issue 6 years ago • 0 comments

I found that your code measures the norm and clips the gradient of each parameter separately. However, in the paper the authors say "the l2 norm of the whole gradient of all parameters...", whereas the per-parameter method you use was the one applied to the QA tasks.
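To make the difference concrete, here is a minimal sketch in plain Python, with flat lists standing in for parameter gradients. The helper names and the threshold of 5.0 are made up for illustration and are not taken from this repository or from the paper. In TensorFlow the two behaviours correspond, as far as I can tell, to `tf.clip_by_norm` applied to each gradient versus `tf.clip_by_global_norm` applied to the whole list.

```python
import math


def l2_norm(values):
    """L2 norm of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in values))


def clip_each_separately(grads, max_norm):
    """Hypothetical helper: rescale each gradient on its own norm."""
    clipped = []
    for g in grads:
        norm = l2_norm(g)
        scale = max_norm / norm if norm > max_norm else 1.0
        clipped.append([v * scale for v in g])
    return clipped


def clip_by_global_norm(grads, max_norm):
    """Hypothetical helper: rescale all gradients by one shared factor
    computed from the norm of the concatenated gradient."""
    global_norm = math.sqrt(sum(v * v for g in grads for v in g))
    scale = max_norm / global_norm if global_norm > max_norm else 1.0
    return [[v * scale for v in g] for g in grads]


# Two toy "parameter gradients" with norms 5 and 10 (illustrative values).
grads = [[3.0, 4.0], [6.0, 8.0]]

per_tensor = clip_each_separately(grads, max_norm=5.0)
global_clip = clip_by_global_norm(grads, max_norm=5.0)

# Norm of the whole gradient after each kind of clipping.
per_tensor_total = l2_norm([v for g in per_tensor for v in g])
global_total = l2_norm([v for g in global_clip for v in g])
```

With per-parameter clipping the second gradient is halved while the first is untouched, so the relative scale between parameters changes and the norm of the whole gradient can still exceed the threshold. With global clipping every gradient is multiplied by the same factor, so the update direction is preserved and the overall norm is capped.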

Apr 08 '18 14:04 wwt17
