TF_ContinualLearningViaSynapticIntelligence
Loss normalization
May I ask why you aren't normalizing the cross-entropy loss across the batch before computing the gradients in the following line:
cross_entropy = -tf.reduce_sum( y_tgt*tf.log(y+1e-04) + (1.-y_tgt)*tf.log(1.-y+1e-04) )
If I change it to a normalized version (balancing the parentheses and averaging the per-example sums over the batch):

cross_entropy = tf.reduce_mean(-tf.reduce_sum( y_tgt*tf.log(y+1e-04) + (1.-y_tgt)*tf.log(1.-y+1e-04), axis=1 ))
I can see that the small_omega_vars updates become very small (due to the smaller gradients), and consequently the resulting big_omega_var is also very small. This causes the model to drift a lot on the earlier tasks. Did the authors mention anything about summing the loss across the batch rather than normalizing it?
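To illustrate the effect described above, here is a minimal NumPy sketch (not the repository's code, and using a toy squared-error loss rather than the cross-entropy above): under plain SGD, switching from a summed loss to a mean loss scales the gradient by 1/batch_size, and since the parameter update delta_w = -lr * grad also shrinks by the same factor, each path-integral contribution -grad * delta_w that small_omega_vars accumulates shrinks by 1/batch_size**2.

```python
import numpy as np

np.random.seed(0)
batch_size = 256
x = np.random.randn(batch_size)  # toy inputs
w = 0.5                          # single scalar parameter
lr = 0.1

# Per-example gradient of the toy loss (w*x)^2 w.r.t. w.
per_example_grads = 2 * w * x**2

grad_sum = per_example_grads.sum()    # gradient of the summed loss
grad_mean = per_example_grads.mean()  # gradient of the mean (normalized) loss

# One SGD step for each variant, and the SI-style running-importance
# contribution  -grad * delta_w  accumulated into small_omega.
delta_sum = -lr * grad_sum
delta_mean = -lr * grad_mean
omega_contrib_sum = -grad_sum * delta_sum
omega_contrib_mean = -grad_mean * delta_mean

# The normalized-loss contribution is batch_size**2 times smaller.
ratio = omega_contrib_sum / omega_contrib_mean
print(ratio, batch_size**2)
```

This suggests that if the loss is normalized, the regularization strength (or the omega accumulators themselves) would need to be rescaled accordingly to keep the penalty on earlier tasks comparable.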