Rene Bidart comments



Pathfinder task

Has anyone been able to reproduce the paper's results using Performer on Pathfinder? My accuracy is much worse (62% vs. 77%). I was able to approximately reproduce the results with Transformer and BigBird.

ListOps performance

I found that either lowering the learning rate or increasing the batch size helped on this task. I think their hyperparameters are tuned for a large effective batch size because they...

MobilenetV1 difference from original paper

CIFAR images are only 32x32 (compared to ImageNet's 224x224), so you need to reduce the stride of the first few layers. Otherwise the feature maps shrink too quickly and the model performs poorly.
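To make the stride point concrete, here is a rough sketch in plain Python of how the spatial size shrinks through the stride-2 layers. It assumes 3x3 convolutions with padding 1 and lists only the five stride-2 layers of a MobileNetV1-style network. The `cifar_strides` list is a hypothetical choice for illustration, not the setting used in any particular repo.

```python
def conv_out(size, stride, kernel=3, padding=1):
    """Spatial size after one convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def final_size(size, strides):
    """Apply conv_out for each stride in order and return the last size."""
    for s in strides:
        size = conv_out(size, s)
    return size


# Assumption: only the five stride-2 layers are listed; stride-1 layers
# with kernel 3 and padding 1 leave the spatial size unchanged.
imagenet_strides = [2, 2, 2, 2, 2]
cifar_strides = [1, 1, 2, 2, 2]  # hypothetical: first two strides reduced

print(final_size(224, imagenet_strides))  # 7
print(final_size(32, imagenet_strides))   # 1
print(final_size(32, cifar_strides))      # 4
```

With the unmodified strides a 32x32 input is already down to 1x1 by the last stride-2 layer, so the later layers have almost no spatial information to work with.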

MobileNet_v2 takes four times the GPU memory of the native PyTorch implementation mobilenet_v2!

This is because the stride is reduced in this model, so the feature maps are larger and the activations take up more memory.
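A back-of-the-envelope sketch of why a factor of about four is plausible: changing one early stride from 2 to 1 doubles the height and width of every later feature map, so each one holds four times as many values. The layer list below (channels, stride) is a made-up toy stack, not the real MobileNet_v2 configuration, and the count covers output activations for a single sample only (no weights or optimizer state).

```python
def conv_out(size, stride, kernel=3, padding=1):
    """Spatial size after one convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def activation_elements(input_size, layers):
    """Sum of channels * height * width over the output of every layer."""
    total = 0
    size = input_size
    for channels, stride in layers:
        size = conv_out(size, stride)
        total += channels * size * size
    return total


# Hypothetical toy stacks: identical except for the first layer's stride.
layers_stride2 = [(32, 2), (16, 1), (24, 2), (24, 1)]
layers_stride1 = [(32, 1), (16, 1), (24, 2), (24, 1)]

small = activation_elements(32, layers_stride2)
large = activation_elements(32, layers_stride1)
print(small, large, large / small)  # 15360 61440 4.0
```

Since activations usually dominate GPU memory during training, a 4x increase in activation count would show up as roughly 4x the memory use.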