About the local learning rate
In equation 13, how to set the value of \eta is not clarified. I'm quite confused.
I'm also confused about it.
eta is set to 0.2, which will be reported in the camera-ready version.
Thank you, Hanting. And what about the T_max (period) and eta_min (lower bound) in cosine learning rate decay of the MNIST experiment?
0.1 and 0
Sorry, but T_max is supposed to be an integer representing the maximum number of iterations; should it be 50 (the number of epochs)? And I guess you mean the initial learning rate is 0.1?
Sorry for the mistake. T_max is 50 and the initial learning rate is 0.1.
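In case it helps others reproduce the MNIST setting, here is a minimal PyTorch sketch with the values stated above (initial learning rate 0.1, T_max = 50, eta_min = 0). The model and the epoch loop are placeholders, not the authors' training script.

```python
import torch
import torch.nn as nn

model = nn.Linear(784, 10)  # placeholder; swap in the actual AdderNet MNIST model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)      # initial LR 0.1
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=50, eta_min=0)                          # 50 epochs, lower bound 0

for epoch in range(50):
    # ... run one training epoch over MNIST here ...
    scheduler.step()  # decay the learning rate once per epoch
```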
@HantingChen If eta is not set to 0.2, for example to 0.1 or 0.4, will the result be quite different from 0.2?
The ablation study can be found in the paper. (Table 4 in https://openaccess.thecvf.com/content_CVPR_2020/papers/Chen_AdderNet_Do_We_Really_Need_Multiplications_in_Deep_Learning_CVPR_2020_paper.pdf)
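For anyone reproducing that ablation, below is a minimal sketch of how the local learning rate eta from Eq. 13 might enter the update, assuming the adaptive form alpha_l = eta * sqrt(k) / ||dL/dF_l||_2 described in the linked paper. The tensors, shapes, and the global learning rate here are illustrative placeholders, not taken from the released code.

```python
import torch

def adder_local_lr(grad, eta=0.2):
    # Assumed form of Eq. 13: alpha_l = eta * sqrt(k) / ||dL/dF_l||_2,
    # where k is the number of elements in the layer's filters F_l.
    k = grad.numel()
    return eta * (k ** 0.5) / (grad.norm(p=2) + 1e-12)

# Illustrative single-layer update (placeholder tensors):
F_l = torch.randn(16, 3, 3, 3)       # adder-layer filters
grad_F_l = torch.randn_like(F_l)     # stand-in for the gradient dL/dF_l
global_lr = 0.1                      # the optimizer's (global) learning rate
F_l -= global_lr * adder_local_lr(grad_F_l) * grad_F_l
```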
@HantingChen In the paper, ResNet-20 on CIFAR-10 reaches 91.84 with no multiplications at all, but in your code the first and last layers are ordinary convolution layers. So if the first and last layers were also adder layers, would ResNet-20 on CIFAR-10 still reach 91.84? Or is this a typo in the paper?
Hi, the model in the paper is the same as the one in the code. The table in the paper omits the multiplications of the first and last layers (because they are far fewer than the computations of the whole model, as stated in the paper).
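To make that point concrete, a rough back-of-the-envelope count of the multiplications in those two layers for a CIFAR-10 ResNet-20; the ~41M total is the commonly cited figure for ResNet-20 and is an assumption here, not a number from the paper.

```python
# Multiplications in the layers that stay as ordinary conv/FC in the released code:
first_conv = 3 * 16 * 3 * 3 * 32 * 32   # 3x3 conv, 3->16 channels, 32x32 output
final_fc = 64 * 10                      # fully connected classifier, 64 -> 10
kept_mults = first_conv + final_fc      # ~0.44M multiplications

total_ops = 41e6  # assumed: commonly cited ~41M mult-adds for CIFAR-10 ResNet-20
print(f"first/last layers: {kept_mults / total_ops:.1%} of the model's ops")  # ~1.1%
```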
@HantingChen Also, on ImageNet and CIFAR, is the impact of eta the same as on MNIST, i.e. only about a 0.2-point fluctuation? Do you have comparison experiments for this?
We have not run such comparison experiments yet.