lambda-networks
Does LambdaLayer need BatchNorm and activation after it?
Hello,
I'm trying to reproduce this.
I'm building LambdaResNet, and I have a small question: are BatchNorm and an activation needed after the LambdaLayer?
Thanks.
@PistonY no I don't believe so, but feel free to correct me if I'm wrong
Hi @lucidrains, thanks for the reply.
I tested them all, and I think you're right. When applying BN+ReLU, the val accuracy doesn't grow.
This is my final implementation.
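Roughly, the block ends up looking like this (a sketch, not my exact code; the `LambdaLayer` arguments follow this repo's README and the hyperparameters are illustrative):

```python
# Sketch: a ResNet-style bottleneck where the 3x3 conv is replaced by a
# LambdaLayer, with no BatchNorm/activation directly after it.
# Stride handling and other details of the full network are omitted.
import torch
from torch import nn
from lambda_networks import LambdaLayer

class LambdaBottleneck(nn.Module):
    expansion = 4

    def __init__(self, in_planes, planes, downsample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        # lambda layer in place of the 3x3 conv; nothing comes after it here
        self.lambda_layer = LambdaLayer(
            dim=planes, dim_out=planes, r=23, dim_k=16, heads=4, dim_u=1)
        self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.lambda_layer(out)   # no BatchNorm/activation applied here
        out = self.bn3(self.conv3(out))
        return self.relu(out + identity)
```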
Now I'm training LambdaResNet50, and it's looking good:
I use the same standard training setup in my project as for ResNet50, except the batch size is set to 64.
Epoch 28 result: Train Acc: 0.44, Loss: 2.54; Val Acc: 0.48, Loss: 3.1748e+08

Some observations are:

- `parameters` and `GFLOPs` are small, but training speed and GPU memory cost are still high.
- Mixed-precision (FP16) training with `torch.cuda.amp` makes the train loss `nan`, so I can only train with a batch size of 64 (the standard amp step is sketched after this list).
- The val loss is strange.
- Convergence is much slower than ResNet50.
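For reference, the mixed-precision step mentioned above is the standard `torch.cuda.amp` autocast + GradScaler pattern; a minimal sketch, with a tiny dummy model standing in for LambdaResNet:

```python
# Minimal sketch of the standard torch.cuda.amp step referred to above;
# the small model here is only a stand-in, not LambdaResNet.
import torch
from torch import nn

model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(8, 10)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

images = torch.randn(64, 3, 224, 224, device='cuda')
targets = torch.randint(0, 10, (64,), device='cuda')

optimizer.zero_grad()
with torch.cuda.amp.autocast():
    loss = criterion(model(images), targets)  # with LambdaResNet this loss became nan
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```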
@PistonY Awesome, thanks for sharing! Let's keep this issue open so others can build upon it
@lucidrains Unfortunately, I only got 76.1 best top-1 on the val set (79.2 on the train set). I'd better wait for the authors to release their code.
@PistonY Good to know Devin, thanks for sharing!
Hi, @lucidrains
I also built LambdaResNet50 using the LambdaLayer, and I ran into a situation where my loss looks like PistonY's: its order of magnitude is large.
I also found that the outputs after the LambdaLayer kept getting larger and larger and eventually became `nan`.
Here is my model code snippet for the forward pass (I just modified the torchvision ResNet and replaced conv2 in the Bottleneck class with a LambdaLayer).
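A rough sketch of what that modification might look like (not the exact snippet; the `LambdaLayer` arguments follow this repo's README and the values are illustrative):

```python
# Sketch: torchvision's Bottleneck with conv2 (the 3x3 conv) swapped for a
# LambdaLayer. torchvision's forward still applies bn2 + relu after conv2;
# average pooling stands in for the stride of the original 3x3 conv.
import torch
from torch import nn
from torchvision.models.resnet import ResNet, Bottleneck
from lambda_networks import LambdaLayer

class LambdaBottleneck(Bottleneck):
    def __init__(self, inplanes, planes, stride=1, *args, **kwargs):
        super().__init__(inplanes, planes, stride, *args, **kwargs)
        lam = LambdaLayer(dim=self.conv2.in_channels, dim_out=self.conv2.out_channels,
                          r=23, dim_k=16, heads=4, dim_u=1)
        self.conv2 = lam if stride == 1 else nn.Sequential(
            lam, nn.AvgPool2d(kernel_size=3, stride=stride, padding=1))

def lambda_resnet50(num_classes=1000):
    return ResNet(LambdaBottleneck, [3, 4, 6, 3], num_classes=num_classes)
```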
The result:
Could you give me some advice about this?
> @PistonY no I don't believe so, but feel free to correct me if I'm wrong
If I understand your implementation correctly, the positional lambda interaction is purely linear, which means it should be followed by an activation and BatchNorm? This is what I tried, at least, and it's working normally.
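For concreteness, a minimal sketch of that ordering (the `LambdaLayer` arguments follow this repo's README; channel counts are illustrative):

```python
# Sketch: LambdaLayer followed by BatchNorm and an activation, treated like
# any other linear layer. Channel counts and hyperparameters are illustrative.
import torch
from torch import nn
from lambda_networks import LambdaLayer

block = nn.Sequential(
    LambdaLayer(dim=256, dim_out=256, r=23, dim_k=16, heads=4, dim_u=1),
    nn.BatchNorm2d(256),    # normalize the purely linear lambda output
    nn.ReLU(inplace=True),  # add the non-linearity the lambda interaction lacks
)

x = torch.randn(2, 256, 14, 14)
out = block(x)  # (2, 256, 14, 14)
```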