That's right. p1, p2 and the conv weights are all learnable parameters.
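A minimal sketch of how those learnable parameters enter ACON-C, assuming the formula (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x from the paper; see acon.py in this repo for the reference implementation.

```python
import torch
import torch.nn as nn

class AconC(nn.Module):
    def __init__(self, width):
        super().__init__()
        # p1, p2 and beta are all learnable, one value per channel
        self.p1 = nn.Parameter(torch.randn(1, width, 1, 1))
        self.p2 = nn.Parameter(torch.randn(1, width, 1, 1))
        self.beta = nn.Parameter(torch.ones(1, width, 1, 1))

    def forward(self, x):
        dpx = (self.p1 - self.p2) * x
        return dpx * torch.sigmoid(self.beta * dpx) + self.p2 * x
```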
r is the channel reduction ratio; you can adjust it to fit your needs, and it has little effect on accuracy. This is a very common trick to reduce the number of parameters, and has been standard practice since HyperNetworks in 2016. To quote that paper's explanation: "a one-layered hypernetwork would have Nz × Nin × fsize × Nout × fsize learnable parameters which is usually much bigger than a two-layered hypernetwork does."
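A rough sketch of where r enters, assuming the meta-ACON beta generator is a two-layer 1x1-conv bottleneck over channel-wise means (the default r=16 below is illustrative); with two layers the parameter count is roughly 2·C·C/r instead of C·C for one wide layer, which is the point the quote makes.

```python
import torch
import torch.nn as nn

class MetaBeta(nn.Module):
    def __init__(self, width, r=16):
        super().__init__()
        hidden = max(width // r, 4)  # bottleneck width, shrunk by the ratio r
        self.fc1 = nn.Conv2d(width, hidden, kernel_size=1)
        self.fc2 = nn.Conv2d(hidden, width, kernel_size=1)

    def forward(self, x):
        # channel-wise statistics -> bottleneck -> per-channel beta in (0, 1)
        y = x.mean(dim=(2, 3), keepdim=True)
        return torch.sigmoid(self.fc2(self.fc1(y)))
```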
@jinfagang It depends on the hardware platform; normally a 10%-20% latency increase.
@jinfagang But ACON is a good choice: it has the same speed as Swish, and both match the speed of ReLU if implemented with hard-sigmoid :)
@jinfagang I suggest ACON-C, which improves performance with negligible overhead and shows a good accuracy-speed tradeoff.
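A hedged sketch of the hard-sigmoid trick mentioned above: swapping torch.sigmoid for F.hardsigmoid (relu6(x + 3) / 6) in the ACON-C form, so the gate is piecewise linear and avoids exp() on hardware where that matters. The function name is illustrative, not from this repo.

```python
import torch.nn.functional as F

def acon_c_hard(x, p1, p2, beta):
    # same form as ACON-C, with the sigmoid gate replaced by the
    # piecewise-linear hard-sigmoid to keep inference cost ReLU-like
    dpx = (p1 - p2) * x
    return dpx * F.hardsigmoid(beta * dpx) + p2 * x
```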
@yxNONG MetaAcon uses a small network to generate beta; in this work we try some network examples which show that sigmoid performs well. More choices and designs of this small...
Hi @feizhaixiaomimei , since ACON is a general form, you can use it in any network by simply replacing ReLU.
It can be used in fully connected layers. For example, if your tensor shape is (batch, width), you can simply change https://github.com/nmaac/acon/blob/main/acon.py#L13-15 to:
self.p1 = nn.Parameter(torch.randn(1, width))
self.p2 = nn.Parameter(torch.randn(1, width))
self.beta = nn.Parameter(torch.ones(1, width))
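A minimal sketch of that fully-connected variant, assuming inputs of shape (batch, width); only the parameter shapes change, the ACON-C formula itself stays the same. The class name is just for illustration.

```python
import torch
import torch.nn as nn

class AconC1d(nn.Module):
    def __init__(self, width):
        super().__init__()
        self.p1 = nn.Parameter(torch.randn(1, width))
        self.p2 = nn.Parameter(torch.randn(1, width))
        self.beta = nn.Parameter(torch.ones(1, width))

    def forward(self, x):
        dpx = (self.p1 - self.p2) * x
        return dpx * torch.sigmoid(self.beta * dpx) + self.p2 * x

# usage: replace ReLU after a Linear layer
layer = nn.Sequential(nn.Linear(128, 64), AconC1d(64))
out = layer(torch.randn(8, 128))  # shape (8, 64)
```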