lambda-networks
Does LambdaLayer need BatchNorm and activation after it?
Hello,
I'm trying to reproduce this.
I'm building LambdaResNet, and I have a small question: are BatchNorm and an activation needed after the LambdaLayer?
Thanks.
@PistonY no I don't believe so, but feel free to correct me if I'm wrong
Hi @lucidrains, thanks for the reply.
I tested them all, and I think you're right. When applying BN+ReLU, the val accuracy doesn't grow.
This is my final implementation.
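Roughly, the block ends up looking like this (a sketch, not my exact code; the `LambdaLayer` arguments follow this repo's README and the hyperparameters are illustrative):

```python
# Sketch: a ResNet-style bottleneck where the 3x3 conv is replaced by a
# LambdaLayer, with no BatchNorm/activation directly after it.
# Stride handling and other details of the full network are omitted.
import torch
from torch import nn
from lambda_networks import LambdaLayer

class LambdaBottleneck(nn.Module):
    expansion = 4

    def __init__(self, in_planes, planes, downsample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        # lambda layer in place of the 3x3 conv; nothing comes after it here
        self.lambda_layer = LambdaLayer(
            dim=planes, dim_out=planes, r=23, dim_k=16, heads=4, dim_u=1)
        self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.lambda_layer(out)   # no BatchNorm/activation applied here
        out = self.bn3(self.conv3(out))
        return self.relu(out + identity)
```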
Now I'm training LambdaResNet50, and it's looking good:
I use the same standard training setup in my project as for ResNet50, except the batch size is set to 64.
Epoch 28 result: Train Acc: 0.44, Loss: 2.54; Val Acc: 0.48, Loss: 3.1748e+08

Some observations are:

- `parameters` and `GFLOPs` are small, but training speed and GPU memory cost are still high.
- Mixed-precision (FP16) training with `torch.cuda.amp` makes the train loss `nan`, so I can only train with a batch size of 64 (the standard amp step is sketched after this list).
- The val loss is strange.
- Convergence is much slower than ResNet50.
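For reference, the mixed-precision step mentioned above is the standard `torch.cuda.amp` autocast + GradScaler pattern; a minimal sketch, with a tiny dummy model standing in for LambdaResNet:

```python
# Minimal sketch of the standard torch.cuda.amp step referred to above;
# the small model here is only a stand-in, not LambdaResNet.
import torch
from torch import nn

model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(8, 10)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

images = torch.randn(64, 3, 224, 224, device='cuda')
targets = torch.randint(0, 10, (64,), device='cuda')

optimizer.zero_grad()
with torch.cuda.amp.autocast():
    loss = criterion(model(images), targets)  # with LambdaResNet this loss became nan
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```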
@PistonY Awesome, thanks for sharing! Let's keep this issue open so others can build upon it
@lucidrains Unfortunately, I only got 76.1 best top-1 on the val set (79.2 on the train set). I'd better wait for the authors to release their code.
@PistonY Good to know Devin, thanks for sharing!
Hi, @lucidrains
I also built LambdaResNet50 using the LambdaLayer, and I ran into a situation where my loss looks like PistonY's: its order of magnitude is large.
I also found that the outputs after the LambdaLayer kept getting larger and larger and eventually became `nan`.
Here is my model code snippet for the forward pass (I just modified the torchvision ResNet and replaced conv2 in the Bottleneck class with a LambdaLayer).
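A rough sketch of what that modification might look like (not the exact snippet; the `LambdaLayer` arguments follow this repo's README and the values are illustrative):

```python
# Sketch: torchvision's Bottleneck with conv2 (the 3x3 conv) swapped for a
# LambdaLayer. torchvision's forward still applies bn2 + relu after conv2;
# average pooling stands in for the stride of the original 3x3 conv.
import torch
from torch import nn
from torchvision.models.resnet import ResNet, Bottleneck
from lambda_networks import LambdaLayer

class LambdaBottleneck(Bottleneck):
    def __init__(self, inplanes, planes, stride=1, *args, **kwargs):
        super().__init__(inplanes, planes, stride, *args, **kwargs)
        lam = LambdaLayer(dim=self.conv2.in_channels, dim_out=self.conv2.out_channels,
                          r=23, dim_k=16, heads=4, dim_u=1)
        self.conv2 = lam if stride == 1 else nn.Sequential(
            lam, nn.AvgPool2d(kernel_size=3, stride=stride, padding=1))

def lambda_resnet50(num_classes=1000):
    return ResNet(LambdaBottleneck, [3, 4, 6, 3], num_classes=num_classes)
```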
The result:
Could you give me some advice about this?
> @PistonY no I don't believe so, but feel free to correct me if I'm wrong
If I understand your implementation correctly, the positional lambda interaction is purely linear, which means it should be followed by an activation and BatchNorm? This is what I tried, at least, and it's working normally.
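For concreteness, a minimal sketch of that ordering (the `LambdaLayer` arguments follow this repo's README; channel counts are illustrative):

```python
# Sketch: LambdaLayer followed by BatchNorm and an activation, treated like
# any other linear layer. Channel counts and hyperparameters are illustrative.
import torch
from torch import nn
from lambda_networks import LambdaLayer

block = nn.Sequential(
    LambdaLayer(dim=256, dim_out=256, r=23, dim_k=16, heads=4, dim_u=1),
    nn.BatchNorm2d(256),    # normalize the purely linear lambda output
    nn.ReLU(inplace=True),  # add the non-linearity the lambda interaction lacks
)

x = torch.randn(2, 256, 14, 14)
out = block(x)  # (2, 256, 14, 14)
```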