
Center loss training on CIFAR does not converge

Open JinmingZhao opened this issue 6 years ago • 5 comments

Hi, below is the training output from several stages. The center loss drops very quickly while the softmax loss barely moves; as training progresses, the center loss then gradually increases and the softmax loss gradually decreases. I see the same behavior when training on other datasets. Is this normal, and could you explain the reason for it? (Normally the L2 loss behaves the same way: the L2 loss drops first and only afterwards does the softmax loss start to drop, which makes training very slow. Could you help explain this as well?)

Thanks!

```
step: 0, training accuracy: 0.01, training loss: 9.13, center_loss_value: 8.95, softmax_loss_value: 4.60  8.9476
step: 100, training accuracy: 0.00, training loss: 4.60, center_loss_value: 0.00, softmax_loss_value: 4.60  0.000475059
step: 200, training accuracy: 0.00, training loss: 4.60, center_loss_value: 0.00, softmax_loss_value: 4.60  0.000334069
step: 300, training accuracy: 0.00, training loss: 4.61, center_loss_value: 0.00, softmax_loss_value: 4.61  0.000232262
...
step: 7000, validation accuracy: 0.00, validation loss: 4.61
step: 7000, training accuracy: 0.02, training loss: 4.61, center_loss_value: 0.01, softmax_loss_value: 4.60  0.00856465
step: 7100, training accuracy: 0.00, training loss: 4.60, center_loss_value: 0.01, softmax_loss_value: 4.60  0.00611517
step: 7200, training accuracy: 0.03, training loss: 4.59, center_loss_value: 0.01, softmax_loss_value: 4.59  0.00610098
step: 7300, training accuracy: 0.02, training loss: 4.60, center_loss_value: 0.01, softmax_loss_value: 4.59  0.00591285
...
step: 10000, validation accuracy: 0.02, validation loss: 4.60
step: 10000, training accuracy: 0.03, training loss: 4.60, center_loss_value: 0.02, softmax_loss_value: 4.55  0.0197095
step: 10100, training accuracy: 0.05, training loss: 4.58, center_loss_value: 0.02, softmax_loss_value: 4.56  0.019367
step: 10200, training accuracy: 0.01, training loss: 4.59, center_loss_value: 0.02, softmax_loss_value: 4.56  0.0246912
step: 10300, training accuracy: 0.00, training loss: 4.59, center_loss_value: 0.03, softmax_loss_value: 4.56  0.0319362
step: 10400, training accuracy: 0.03, training loss: 4.56, center_loss_value: 0.03, softmax_loss_value: 4.53  0.0250489
...
step: 15000, validation accuracy: 0.03, validation loss: 4.59
step: 15000, training accuracy: 0.02, training loss: 4.59, center_loss_value: 0.06, softmax_loss_value: 4.48  0.0584633
step: 15100, training accuracy: 0.01, training loss: 4.55, center_loss_value: 0.07, softmax_loss_value: 4.49  0.0664418
step: 15200, training accuracy: 0.04, training loss: 4.48, center_loss_value: 0.05, softmax_loss_value: 4.43  0.0492492
step: 15300, training accuracy: 0.05, training loss: 4.53, center_loss_value: 0.05, softmax_loss_value: 4.48
...
step: 79000, validation accuracy: 0.09, validation loss: 12.37
step: 79000, training accuracy: 0.25, training loss: 12.37, center_loss_value: 0.12, softmax_loss_value: 2.98  0.117979
step: 79100, training accuracy: 0.23, training loss: 3.09, center_loss_value: 0.11, softmax_loss_value: 2.98  0.11036
step: 79200, training accuracy: 0.25, training loss: 3.10, center_loss_value: 0.12, softmax_loss_value: 2.98  0.118568
step: 79300, training accuracy: 0.23, training loss: 3.15, center_loss_value: 0.16, softmax_loss_value: 2.99  0.161232
step: 79400, training accuracy: 0.23, training loss: 3.24, center_loss_value: 0.22, softmax_loss_value: 3.01  0.220702
step: 79500, training accuracy: 0.21, training loss: 3.14, center_loss_value: 0.16, softmax_loss_value: 2.98  0.159162
step: 79600, training accuracy: 0.21, training loss: 3.09, center_loss_value: 0.12, softmax_loss_value: 2.97  0.121266
step: 79700, training accuracy: 0.22, training loss: 3.18, center_loss_value: 0.18, softmax_loss_value: 2.99  0.184051
step: 79800, training accuracy: 0.18, training loss: 3.14, center_loss_value: 0.17, softmax_loss_value: 2.98  0.1673
step: 79900, training accuracy: 0.19, training loss: 3.13, center_loss_value: 0.15, softmax_loss_value: 2.98
```

JinmingZhao avatar Feb 27 '18 06:02 JinmingZhao

Hi, this is just my own take. I think it is because the center loss is relatively easy to drive down. Consider the most extreme case, i.e. unsupervised training where we do not optimize the softmax loss at all. In that case the center loss can easily go to 0. Why? Because the center loss only measures the gap between the learned features and their class centers, and both the learned features and the center features are continuously updated during backpropagation. So all we need is for the learned features to coincide with the center features, i.e. the network produces the same output regardless of the input, and the corresponding center loss is exactly 0. For a network this is trivial to achieve; even a single layer can do it. But such a network's output is obviously wrong and cannot perform the classification we want.
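For reference, the center loss from the original paper (Wen et al., ECCV 2016) is the squared distance between each sample's deep feature $x_i$ and its class center $c_{y_i}$:

```latex
L_C = \frac{1}{2} \sum_{i=1}^{m} \left\lVert x_i - c_{y_i} \right\rVert_2^2
```

Nothing in this term rewards correct classification, so the degenerate solution described above (map every input to the same point and move all centers onto it) makes $L_C = 0$ while being useless as a classifier.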

When the softmax loss is present, since we are only searching for a local optimum, the local optimum of the center loss may be easy to reach, but that point is not an optimum of the softmax loss. So the training first lands near the center loss's local optimum, and then, as training continues, it keeps searching for a local optimum of the overall objective, settling on a balance between the local optimum of the center loss and that of the softmax loss.
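In the paper the two terms are combined into a single objective, with $\lambda$ controlling exactly this trade-off:

```latex
L = L_S + \lambda L_C
  = -\sum_{i=1}^{m} \log \frac{e^{W_{y_i}^{T} x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_{j}^{T} x_i + b_j}}
    + \frac{\lambda}{2} \sum_{i=1}^{m} \left\lVert x_i - c_{y_i} \right\rVert_2^2
```

With $\lambda = 0$ this reduces to plain softmax training; the larger $\lambda$ is, the harder the trivial minimum of $L_C$ pulls against the softmax term, which is the balance described above.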

As for the L2 loss, it should follow the same reasoning as the center loss.

The above is only my personal view, without a rigorous mathematical derivation or proof. If you have a better explanation, I would be glad to discuss it further!

UpCoder avatar Feb 28 '18 06:02 UpCoder

That explanation is intuitive and easy to follow, thank you very much! Another question: do you have any advice on choosing the value of lambda? For L2 regularization we usually take about 1e-4. Here, if lambda is 0.1, the weighted center loss is roughly the same order of magnitude as the softmax loss, but then the center loss drops very quickly, which matches your explanation; if it is too small, e.g. 1e-4, it also drops very quickly. But with lambda = 0.003 (the value concluded in the author's paper) the center loss settles around 0.02 instead of collapsing, and it then decreases roughly in sync with the softmax loss. Do you have any insight on how to pick lambda (analogous to the weight decay coefficient of the L2 loss)?
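To make the magnitude argument concrete, here is a back-of-the-envelope check using values similar to the log above (softmax loss ≈ 4.6 and raw center loss ≈ 9 at the start of training); the numbers are illustrative only:

```python
# Compare the weighted center-loss term with the softmax term at the start of training,
# using values taken roughly from the log above (illustrative only).
softmax_loss_value = 4.6   # approximately constant at the beginning of the log
center_loss_value = 9.0    # raw (unweighted) center loss around step 0

for lam in (1e-1, 3e-3, 1e-4):
    weighted = lam * center_loss_value
    print('lambda = %-6g  lambda * center_loss = %.4f  vs  softmax_loss = %.2f'
          % (lam, weighted, softmax_loss_value))
```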

JinmingZhao avatar Feb 28 '18 13:02 JinmingZhao

I feel lambda is a hyper-parameter that has to be tuned from experimental results; in general, try values in the range 1e-1 to 1e-4. That said, in my experiments different values of lambda did not improve classification accuracy. The larger lambda is, the slower the final convergence, although the distances between the center features of different classes also become larger.

UpCoder avatar Mar 01 '18 09:03 UpCoder

@JinmingZhao @UpCoder Hi, when I do it this way I get no performance improvement over using softmax alone (for lambda I tried 1, 0.1, and 0.01), and I don't know why:

```python
import tensorflow as tf

def center_loss(labels, features, alpha, lambda_c, lambda_g, num_classes):
    # labels: one-hot labels; features: embedding output of the network
    len_features = features.get_shape()[1]
    with tf.variable_scope('v_center', reuse=tf.AUTO_REUSE):
        centers = tf.get_variable('centers', [num_classes, len_features], dtype=tf.float32,
                                  initializer=tf.constant_initializer(0), trainable=False)
    labels = tf.argmax(labels, axis=1)
    centers_batch = tf.gather(centers, labels)
    center_loss = tf.reduce_mean(tf.square(features - centers_batch))

    # Move each center toward the features of its class in this batch,
    # scaled by alpha and by how often the class appears in the batch.
    diff = centers_batch - features
    _, unique_idx, unique_count = tf.unique_with_counts(labels)
    appear_times = tf.reshape(tf.gather(unique_count, unique_idx), [-1, 1])
    diff = alpha * diff / tf.cast(1 + appear_times, tf.float32)
    centers_update_op = tf.scatter_sub(centers, labels, diff)

    with tf.control_dependencies([centers_update_op]):
        # softmax_loss is assumed to be defined elsewhere in my code
        combo_loss = lambda_c * center_loss + softmax_loss(labels, features)
    return combo_loss

# Usage (lossclass, custom_vgg_model and sgd come from the rest of the script):
total_loss = lossclass.center_loss(alpha=0.5, lambda_c=0.1, num_classes=num_classes)
custom_vgg_model.compile(loss=total_loss, optimizer=sgd, metrics=['accuracy'])
```
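One possible issue with the usage above is that `model.compile(loss=...)` in Keras expects a callable of `(y_true, y_pred)`, not an already-built tensor. Here is a minimal sketch of one way to wrap the combined loss for Keras; `make_total_loss` is a hypothetical helper, and it assumes the model's output `y_pred` is the feature/embedding tensor that `center_loss` above expects:

```python
# Minimal sketch (assumption): wrap the combined loss in a closure so that model.compile
# receives a callable of (y_true, y_pred). custom_vgg_model, sgd and num_classes come
# from the poster's script; make_total_loss is a hypothetical helper introduced here.
def make_total_loss(alpha=0.5, lambda_c=0.1, lambda_g=0.0, num_classes=10):
    def loss_fn(y_true, y_pred):
        # y_true: one-hot labels; y_pred: assumed to be the model's feature output
        return center_loss(y_true, y_pred, alpha, lambda_c, lambda_g, num_classes)
    return loss_fn

custom_vgg_model.compile(loss=make_total_loss(num_classes=num_classes),
                         optimizer=sgd, metrics=['accuracy'])
```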

wangjue-wzq avatar Dec 20 '18 03:12 wangjue-wzq

Is centers_update_op actually being run and updating the centers? And are you doing this on the MNIST dataset?
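One way to check this is to look at the centers variable before and after training; if its norm stays at 0, the update op is never being executed. A minimal sketch, assuming Keras with the TensorFlow backend and the 'v_center' scope from the snippet above (`custom_vgg_model`, `x_train`, and `y_train` are placeholders for the poster's model and data):

```python
import numpy as np
import tensorflow as tf
from keras import backend as K

# Retrieve the existing (non-trainable) centers variable created in the loss function.
with tf.variable_scope('v_center', reuse=True):
    centers = tf.get_variable('centers')

print('||centers|| before fit:', np.linalg.norm(K.get_value(centers)))
custom_vgg_model.fit(x_train, y_train, epochs=1, batch_size=64)
print('||centers|| after fit :', np.linalg.norm(K.get_value(centers)))
```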

UpCoder avatar Dec 21 '18 08:12 UpCoder