TF_ContinualLearningViaSynapticIntelligence

Loss normalization

Open arslan-chaudhry opened this issue 6 years ago • 4 comments

May I ask why you aren't normalizing the cross_entropy loss across the batch before calculating the gradients in the following line:

cross_entropy = -tf.reduce_sum( y_tgt*tf.log(y+1e-04) + (1.-y_tgt)*tf.log(1.-y+1e-04) )

If I try to change it to a normalized version:

cross_entropy = tf.reduce_mean(-tf.reduce_sum( y_tgt*tf.log(y+1e-04) + (1.-y_tgt)*tf.log(1.-y+1e-04), axis=1 ))
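For what it's worth, here is a small standalone sketch (written against TF 2.x eager mode, not the repo's TF 1.x graph code; all names and shapes below are illustrative) of what this change does to the gradients: the two reductions produce gradients that differ exactly by a factor of batch_size.

```python
import tensorflow as tf  # sketch assumes TF 2.x; the repo itself uses the TF 1.x graph API

tf.random.set_seed(0)
batch_size, n_out = 32, 10

# Illustrative stand-ins for the repo's placeholders/activations.
y_tgt = tf.cast(tf.random.uniform((batch_size, n_out)) > 0.5, tf.float32)
logits = tf.Variable(tf.random.normal((batch_size, n_out)))

with tf.GradientTape(persistent=True) as tape:
    y = tf.sigmoid(logits)
    ce = -(y_tgt * tf.math.log(y + 1e-04) + (1. - y_tgt) * tf.math.log(1. - y + 1e-04))
    loss_sum = tf.reduce_sum(ce)                            # summed reduction, as in the repo
    loss_mean = tf.reduce_mean(tf.reduce_sum(ce, axis=1))   # batch-normalized variant

g_sum = tape.gradient(loss_sum, logits)
g_mean = tape.gradient(loss_mean, logits)

# The two gradients are identical up to a factor of batch_size.
print(tf.reduce_max(tf.abs(g_sum - batch_size * g_mean)).numpy())
```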

I could see that the small_omega_vars updates are very small (due to the smaller gradients), and consequently the resulting big_omega_var is also very small. This causes the model to drift a lot on the earlier tasks. I wonder if the authors mentioned anything about summing the gradients across the batch rather than normalizing them?
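Just to spell out why the omegas shrink, here is a toy sketch of the accumulation rule from the SI paper (per-step importance omega += -g * delta_theta, with plain SGD and the (delta^2 + xi) normalization); none of the values or names below come from this repo. Each SGD step adds lr * g**2 to the running importance, so scaling the loss by 1/batch_size scales every increment by roughly 1/batch_size**2, which the (delta^2 + xi) normalization does not fully undo.

```python
lr, batch_size, n_steps = 0.1, 256, 100
xi = 0.1  # damping term from the SI paper (value here is illustrative)

def run_task(scale):
    """Accumulate SI importances for a toy 1-D quadratic loss L = scale * theta**2.

    scale = 1 mimics the summed reduction; scale = 1/batch_size mimics the
    batch-mean reduction of the same loss.
    """
    theta = theta_start = 2.0
    small_omega = 0.0
    for _ in range(n_steps):
        g = scale * 2.0 * theta       # gradient of scale * theta**2
        delta = -lr * g               # plain SGD step
        small_omega += -g * delta     # per-step contribution: lr * g**2
        theta += delta
    big_omega = small_omega / ((theta - theta_start) ** 2 + xi)
    return small_omega, big_omega

print(run_task(1.0))                 # summed loss
print(run_task(1.0 / batch_size))    # batch-mean loss: both omegas shrink sharply
```

So if one keeps the mean-reduced loss, the regularization strength (the paper's c) or the damping xi would presumably need rescaling to get penalties of a comparable size.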

arslan-chaudhry, Sep 13 '17 20:09