
Does LambdaLayer need BatchNorm and activation after it?

Open PistonY opened this issue 5 years ago • 7 comments

Hello, I'm trying to reproduce this. I'm building a LambdaResnet, and I have a small question: are BatchNorm and an activation needed after the LambdaLayer? Thanks.

PistonY avatar Oct 20 '20 06:10 PistonY

@PistonY no I don't believe so, but feel free to correct me if I'm wrong

lucidrains avatar Oct 20 '20 21:10 lucidrains

Hi @lucidrains, thanks for the reply. I tested both variants, and I think you're right: when applying bn+relu after the LambdaLayer, the val accuracy doesn't grow. This is my final implementation.
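For reference, the bare usage being discussed looks roughly like the sketch below. This is a minimal sketch following the repository README, not the actual final implementation (which was linked externally); the channel counts and hyperparameters are illustrative only.

```python
# Minimal sketch: LambdaLayer used bare, with no BatchNorm or activation after it.
# Hyperparameters follow the repository README and are illustrative only.
import torch
from lambda_networks import LambdaLayer

layer = LambdaLayer(
    dim = 256,      # channels going in
    dim_out = 256,  # channels going out
    r = 23,         # local receptive field for the positional lambdas (23 x 23)
    dim_k = 16,     # key dimension
    heads = 4,      # number of heads
    dim_u = 1       # intra-depth dimension
)

x = torch.randn(1, 256, 56, 56)
out = layer(x)      # (1, 256, 56, 56), fed directly into the next layer, no bn/relu
```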

Now I'm training LambdaResnet50, and it's looking good. I use the same standard training setup in my project as for Resnet50, except batch_size is set to 64.

Epoch 28 result: Train Acc: 0.44, Loss: 2.54; Val Acc: 0.48, Loss: 3.1748e+08.

Some observations:

  1. The parameter count and GFLOPs are small, but training speed and GPU memory cost are still high.
  2. Mixed-precision (FP16) training with torch.cuda.amp makes the train loss NaN, which is why I can only train with batch_size 64 (see the amp sketch after this list).
  3. The val loss is strange.
  4. Convergence is much slower than with Resnet50.
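(For observation 2, the standard torch.cuda.amp recipe is sketched below for context. This is a generic sketch, not the actual training script; `loader`, `model`, `criterion`, and `optimizer` are assumed to exist.)

```python
# Generic torch.cuda.amp sketch for observation 2 -- not the actual training
# script. `loader`, `model`, `criterion`, and `optimizer` are assumed to exist.
import torch

scaler = torch.cuda.amp.GradScaler()

for images, targets in loader:
    images, targets = images.cuda(), targets.cuda()
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():    # forward + loss under mixed precision
        outputs = model(images)
        loss = criterion(outputs, targets)
    scaler.scale(loss).backward()      # scale loss against FP16 gradient underflow
    scaler.step(optimizer)             # unscales grads; skips the step on inf/nan
    scaler.update()
```

If the loss still turns NaN with this recipe, the overflow is likely happening inside the network (FP16 tops out around 65504) rather than in the loss scaling.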

PistonY avatar Oct 21 '20 07:10 PistonY

@PistonY Awesome, thanks for sharing! Let's keep this issue open so others can build upon it

lucidrains avatar Oct 21 '20 19:10 lucidrains

@lucidrains Unfortunately, I only got 76.1 best top-1 on the val set (79.2 on the train set). I'd better wait for the authors to release their code.

PistonY avatar Oct 26 '20 01:10 PistonY

@PistonY Good to know Devin, thanks for sharing!

lucidrains avatar Oct 26 '20 19:10 lucidrains

Hi, @lucidrains

I also built a lambdaresnet50 using the lambdalayer, and I ran into a situation where my loss looks like PistonY's: the order of magnitude is huge.

I also found that the outputs after the lambdalayer keep getting larger and larger until they become NaN.

Here is my model code snippet for forward (I just modified the torchvision resnet and replaced conv2 in the Bottleneck class with a lambdalayer):

[screenshot: the modified Bottleneck forward]
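The screenshot itself is not reproduced here; below is a hedged reconstruction of what such a forward typically looks like, i.e. torchvision's Bottleneck.forward with the 3x3 self.conv2 swapped for a lambda layer. The attribute name self.lambda_layer and the bn2/relu placement are assumptions, not necessarily the exact code in the screenshot.

```python
# Hypothetical reconstruction (the original was a screenshot): torchvision's
# Bottleneck.forward with the 3x3 self.conv2 replaced by a LambdaLayer.
def forward(self, x):
    identity = x

    out = self.conv1(x)            # 1x1 conv
    out = self.bn1(out)
    out = self.relu(out)

    out = self.lambda_layer(out)   # replaces the 3x3 self.conv2
    out = self.bn2(out)            # whether bn2 + relu belong here is exactly
    out = self.relu(out)           # what this thread is discussing

    out = self.conv3(out)          # 1x1 conv
    out = self.bn3(out)

    if self.downsample is not None:
        identity = self.downsample(x)

    out += identity
    out = self.relu(out)
    return out
```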

The result:

[screenshot: training output showing the very large loss]

Could you give me some advice on this?

Kirayue avatar Nov 02 '20 07:11 Kirayue

> @PistonY no I don't believe so, but feel free to correct me if I'm wrong

If I understand your implementation correctly, the positional lambda interaction is purely linear, which means it should be followed by an activation and batchnorm? This is what I tried at least, and it's working normally.
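A sketch of that arrangement, assuming a plain Sequential wrapper (illustrative only, not confirmed code from this comment):

```python
# Sketch of the variant described above: since the positional interaction is
# linear, follow the LambdaLayer with BatchNorm and an activation.
# Channel count and hyperparameters are placeholders.
import torch.nn as nn
from lambda_networks import LambdaLayer

lambda_block = nn.Sequential(
    LambdaLayer(dim = 256, dim_out = 256, r = 23, dim_k = 16, heads = 4, dim_u = 1),
    nn.BatchNorm2d(256),
    nn.ReLU(inplace = True),
)
```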

zengyi-li avatar Nov 08 '20 16:11 zengyi-li