
About the local learning rate

Open · hmgxr128 opened this issue 5 years ago · 12 comments

In Equation 13, how the value of \eta should be set is not clarified. I'm quite confused.

hmgxr128 avatar Apr 16 '20 08:04 hmgxr128

I'm also confused about it.

JamesHujy avatar Apr 16 '20 09:04 JamesHujy

eta is set to 0.2, which will be reported in the camera-ready version.

HantingChen avatar Apr 16 '20 09:04 HantingChen
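
For readers landing here later: if I'm reading Eq. 13 of the paper correctly, the adaptive local learning rate for an adder layer l is alpha_l = eta * sqrt(k) / ||Delta L(F_l)||_2, where k is the number of elements in the filter F_l. Below is a minimal PyTorch sketch using the eta = 0.2 stated above; the function name, the epsilon guard, and the example gradient tensor are illustrative, not taken from the AdderNet code.

```python
import torch

def local_lr(grad: torch.Tensor, eta: float = 0.2) -> float:
    """Adaptive local learning rate per Eq. 13:
    alpha_l = eta * sqrt(k) / ||grad||_2, with k = number of filter elements."""
    k = grad.numel()
    # small epsilon (our addition) guards against a zero-norm gradient
    return eta * (k ** 0.5) / (grad.norm(p=2).item() + 1e-12)

# hypothetical example: gradient of a bank of 16 adder filters of shape 3x3x3
grad = torch.randn(16, 3, 3, 3)
print(local_lr(grad))
```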

Thank you, Hanting. And what about the T_max (period) and eta_min (lower bound) in the cosine learning rate decay of the MNIST experiment?

hmgxr128 avatar Apr 17 '20 07:04 hmgxr128

> Thank you, Hanting. And what about the T_max (period) and eta_min (lower bound) in the cosine learning rate decay of the MNIST experiment?

0.1 and 0

HantingChen avatar Apr 17 '20 07:04 HantingChen

> 0.1 and 0

Sorry, but T_max is supposed to be an integer giving the maximum number of iterations. Should it be 50 (the number of epochs)? And I guess you mean the initial learning rate is 0.1?

hmgxr128 avatar Apr 17 '20 07:04 hmgxr128

> Sorry, but T_max is supposed to be an integer giving the maximum number of iterations. Should it be 50 (the number of epochs)? And I guess you mean the initial learning rate is 0.1?

Sorry for the mistake. T_max is 50 and the initial learning rate is 0.1.

HantingChen avatar Apr 17 '20 07:04 HantingChen
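
Putting the corrected values together, here is a minimal sketch of the MNIST schedule using PyTorch's built-in cosine scheduler. The model below is just a placeholder; only the hyperparameters (initial lr 0.1, T_max = 50, eta_min = 0) come from this thread.

```python
import torch

model = torch.nn.Linear(784, 10)  # placeholder, stands in for the actual AdderNet model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # initial learning rate 0.1
# cosine decay over 50 epochs down to eta_min = 0
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50, eta_min=0)

for epoch in range(50):
    # ... one training epoch ...
    scheduler.step()  # advance the cosine schedule once per epoch
```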

@HantingChen If eta is not set to 0.2, for example if we set it to 0.1 or 0.4, will the result be quite different from the one with 0.2?

brisker avatar May 01 '21 15:05 brisker

> If eta is not set to 0.2, for example if we set it to 0.1 or 0.4, will the result be quite different from the one with 0.2?

The ablation study can be found in the paper (Table 4 in https://openaccess.thecvf.com/content_CVPR_2020/papers/Chen_AdderNet_Do_We_Really_Need_Multiplications_in_Deep_Learning_CVPR_2020_paper.pdf).

HantingChen avatar May 06 '21 02:05 HantingChen

@HantingChen The paper reports 91.84 for ResNet-20 on CIFAR-10 and states there are no multiplications at all, but in your code the first and last layers are ordinary convolution layers. If the first and last layers were also adder layers, would ResNet-20 on CIFAR-10 still reach 91.84? Or is this a slip in the paper?

brisker avatar May 06 '21 03:05 brisker

@HantingChen Also, on ImageNet and CIFAR, is the effect of eta the same as on MNIST, i.e. only about a 0.2-point fluctuation? Do you have comparison experiments on this?

brisker avatar May 06 '21 06:05 brisker

> The paper reports 91.84 for ResNet-20 on CIFAR-10 and states there are no multiplications at all, but in your code the first and last layers are ordinary convolution layers. If the first and last layers were also adder layers, would ResNet-20 on CIFAR-10 still reach 91.84? Or is this a slip in the paper?

Hi, the model in the paper is the same as the one in the code. The tables in the paper omit the multiplications of the first and last layers, because they account for far less than the whole model's computation (as stated in the paper).

HantingChen avatar May 17 '21 02:05 HantingChen

> Also, on ImageNet and CIFAR, is the effect of eta the same as on MNIST, i.e. only about a 0.2-point fluctuation? Do you have comparison experiments on this?

No comparison experiments yet.

HantingChen avatar May 17 '21 02:05 HantingChen