HantingChen
There are no plans to open-source it for the time being. Thank you for your interest.
It is a single parameter.
It is added after BN.
eta is set to 0.2, which will be reported in the camera-ready version.
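For context, a minimal sketch of how eta enters the adaptive local learning rate described in the paper (alpha_l = eta * sqrt(k) / ||ΔF_l||_2, with k the number of elements in the layer's filters); the function name and the small epsilon below are illustrative additions, not part of the released code.

```python
import torch

def local_learning_rate(grad_filters: torch.Tensor, eta: float = 0.2) -> float:
    """Adaptive local learning rate for one adder layer:
    alpha_l = eta * sqrt(k) / ||grad_F_l||_2, with k the number of filter elements."""
    k = grad_filters.numel()
    return eta * (k ** 0.5) / (grad_filters.norm(p=2).item() + 1e-12)

# Hypothetical gradient w.r.t. one adder layer's filters (shapes are arbitrary).
grad = torch.randn(16, 3, 3, 3)
scaled_grad = local_learning_rate(grad, eta=0.2) * grad
```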
> > > Thank you, Hanting. And what about the T_max (period) and eta_min (lower bound) in cosine learning rate decay of the MNIST experiment?

0.1 and 0.
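For anyone reproducing this, a minimal sketch of a cosine schedule annealing from 0.1 down to eta_min = 0 with PyTorch's CosineAnnealingLR; the model, optimizer settings, and epoch count are placeholders rather than the exact MNIST configuration.

```python
import torch
from torch import nn, optim

model = nn.Linear(784, 10)                    # placeholder model
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

epochs = 50                                   # placeholder, not necessarily the paper's value
# Anneal the learning rate from 0.1 down to eta_min = 0 over the whole run.
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs, eta_min=0)

for epoch in range(epochs):
    # ... run one training epoch here ...
    scheduler.step()
```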
> @HantingChen
> If eta is not set to 0.2, for example if we set it to 0.1 or 0.4, will the result be quite different from 0.2?

The ablation study...
> @HantingChen
> In the paper, the performance of resnet20-cifar10 is 91.84 and it is reported as multiplication-free, but in your code the first and last layers are ordinary convolution layers. So if the first and last layers were also adder layers, would resnet20-cifar10 still reach 91.84? Is this a typo in the paper?

Hi, the model in the paper is the same as the one in the code. The table in the paper omits the multiplications of the first and last layers, since they are far fewer than the computations of the whole model, as stated in the paper.
> @HantingChen
> In addition, on imagenet and cifar, is the effect of eta the same as on mnist, with only about a 0.2-point fluctuation? Do you have comparison experiments for this?

There are no comparison experiments for this yet.
We add a BN layer after the adder layer to solve this problem in our paper.
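To illustrate the ordering, a minimal sketch of an adder layer followed by BN; `Adder2d` here is a stand-in for the repository's adder layer (not the released implementation), and the channel sizes and kernel size are arbitrary.

```python
import torch
from torch import nn

class Adder2d(nn.Module):
    """Stand-in adder layer: the conv inner product is replaced by the
    negative L1 distance between each input patch and each filter."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel_size, kernel_size))
        self.stride, self.padding, self.k = stride, padding, kernel_size

    def forward(self, x):
        n, _, h, w = x.shape
        out_h = (h + 2 * self.padding - self.k) // self.stride + 1
        out_w = (w + 2 * self.padding - self.k) // self.stride + 1
        # patches: (n, in_ch*k*k, out_h*out_w)
        patches = nn.functional.unfold(x, self.k, padding=self.padding, stride=self.stride)
        w_flat = self.weight.view(self.weight.size(0), -1)        # (out_ch, in_ch*k*k)
        out = -(patches.unsqueeze(1) - w_flat[None, :, :, None]).abs().sum(dim=2)
        return out.view(n, -1, out_h, out_w)

# BN is placed right after the adder layer to normalize its large-magnitude outputs.
block = nn.Sequential(Adder2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU())
y = block(torch.randn(2, 16, 8, 8))   # -> shape (2, 32, 8, 8)
```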