Good job! I have a question.
I've read the source code — does Meta-ACON automatically learn self.p, self.q, and the weights of each conv layer that generates β?
That's right. p1, p2 and the conv weights are all learnable parameters.
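For context, a minimal PyTorch sketch of such a layer — the module name `MetaAconC`, the hidden width `max(r, width // r)`, and other details are assumptions based on the paper's description, not the repo's verbatim code:

```python
import torch
import torch.nn as nn

class MetaAconC(nn.Module):
    """Meta-ACON sketch: (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x,
    where beta is produced per-channel by a small two-layer 1x1-conv network."""
    def __init__(self, width, r=16):
        super().__init__()
        # learnable per-channel switching parameters
        self.p1 = nn.Parameter(torch.randn(1, width, 1, 1))
        self.p2 = nn.Parameter(torch.randn(1, width, 1, 1))
        # small network generating beta; its conv weights are learned jointly
        hidden = max(r, width // r)
        self.fc1 = nn.Conv2d(width, hidden, kernel_size=1, bias=True)
        self.fc2 = nn.Conv2d(hidden, width, kernel_size=1, bias=True)

    def forward(self, x):
        # global average pool -> two 1x1 convs -> sigmoid gives beta in (0, 1)
        beta = torch.sigmoid(self.fc2(self.fc1(x.mean(dim=(2, 3), keepdim=True))))
        dpx = (self.p1 - self.p2) * x
        return dpx * torch.sigmoid(beta * dpx) + self.p2 * x
```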
Hi, I'd like to ask what exactly r does in Meta-ACON. Can I freely change its default value?
r is the channel reduction ratio; you can adjust it as needed, and it has little impact on accuracy. It's a very common trick for reducing the parameter count — it was already standard practice in HyperNetworks back in 2016.
To quote the original paper's explanation: "a one-layered hypernetwork would have Nz × Nin × fsize × Nout × fsize learnable parameters which is usually much bigger than a two-layered hypernetwork does."
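To make the parameter saving concrete, a quick count comparing the two-layer r-reduced bottleneck with a hypothetical one-layer mapping (C=256 and r=16 are illustrative values, not from the paper):

```python
import torch.nn as nn

C, r = 256, 16
# two-layer design: C -> C//r -> C via 1x1 convs (the r-reduced bottleneck)
two_layer = nn.Sequential(nn.Conv2d(C, C // r, 1), nn.Conv2d(C // r, C, 1))
# one-layer alternative mapping C -> C directly
one_layer = nn.Conv2d(C, C, 1)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(two_layer), count(one_layer))  # 8464 vs 65792: roughly 8x fewer parameters
```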
@nmaac Did you test how much of a speed drop there is when using MetaAcon compared with a normal activation without learnable params? I didn't see such a comparison in the paper, but it seems it would introduce a latency increase.
@jinfagang It depends on the hardware platform; normally a 10%-20% latency increase.
@nmaac Oh....
@jinfagang But ACON is a good choice: it has the same speed as Swish, and both match ReLU's speed if implemented with a hard-sigmoid :)
@nmaac You mean ACON-ABC?
@jinfagang I suggest ACON-C, which improves performance with negligible overhead and shows a good accuracy-speed tradeoff.
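For illustration, a hedged sketch of ACON-C with an optional hard-sigmoid gate — the `hard` flag is an assumption added to demonstrate the speed remark above, not part of the reference implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AconC(nn.Module):
    """ACON-C sketch: (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x,
    with learnable per-channel p1, p2, beta."""
    def __init__(self, width, hard=False):
        super().__init__()
        self.p1 = nn.Parameter(torch.randn(1, width, 1, 1))
        self.p2 = nn.Parameter(torch.randn(1, width, 1, 1))
        self.beta = nn.Parameter(torch.ones(1, width, 1, 1))
        self.hard = hard  # assumption: swap sigmoid for hard-sigmoid for ReLU-like speed

    def forward(self, x):
        dpx = (self.p1 - self.p2) * x
        gate = F.hardsigmoid(self.beta * dpx) if self.hard else torch.sigmoid(self.beta * dpx)
        return dpx * gate + self.p2 * x
```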
@nmaac I have a question about beta in MetaACON. The paper mentions that as beta goes to infinity, the activation approaches max(x1, x2). However, in MetaACON beta is generated by a sigmoid function, which means its range is (0, 1). Is there any reason for this choice?
@yxNONG MetaAcon uses a small network to generate beta; in this work we tried several network designs and found that sigmoid performs well. Exploring more choices and designs for this small network is not the focus of this work, but it is a promising future direction.
@nmaac Got it, I will try ReLU and identity, which make more sense to me. Thanks for your reply!
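To illustrate the alternatives mentioned above, a hypothetical comparison of how the beta network's final layer bounds beta's range (not from the paper):

```python
import torch

# raw output of the small beta-generating network, any real value
logits = torch.randn(8, 64, 1, 1)

beta_sigmoid = torch.sigmoid(logits)   # (0, 1): smooth switch, never saturates to max(x1, x2)
beta_relu = torch.relu(logits)         # [0, inf): can grow large enough to approximate max()
beta_identity = logits                 # unbounded: negative beta inverts the gating
```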