torch-residual-networks
Loss function explodes under default settings
Hi all,
I'm training on CIFAR-10 following the instructions, but I found that in most cases the loss explodes during the first few iterations.
This is how it behaved on a run where it luckily didn't explode. I don't know whether this happens for others, but maybe the initial learning rate of 0.1 is too large?
12.243277549744
314.09014892578.............................................. 128/50000 ................] ETA: 0ms | Step: 0ms
684.17578125................................................... 256/50000 ..............] ETA: 1m26s | Step: 1ms
1731.8382568359................................................ 384/50000 ..............] ETA: 1m40s | Step: 2ms
1436.0552978516................................................ 512/50000 ..............] ETA: 1m42s | Step: 2ms
1810.4338378906................................................ 640/50000 ..............] ETA: 1m44s | Step: 2ms
2016.3845214844................................................ 768/50000 ..............] ETA: 1m42s | Step: 2ms
1415.6356201172................................................ 896/50000 ..............] ETA: 1m43s | Step: 2ms
980.73388671875................................................ 1024/50000 .............] ETA: 1m45s | Step: 2ms
404.52484130859................................................ 1152/50000 .............] ETA: 1m44s | Step: 2ms
235.41812133789................................................ 1280/50000 .............] ETA: 1m45s | Step: 2ms
162.14950561523................................................ 1408/50000 .............] ETA: 1m43s | Step: 2ms
203.14471435547................................................ 1536/50000 .............] ETA: 1m43s | Step: 2ms
157.7633972168................................................. 1664/50000 .............] ETA: 1m43s | Step: 2ms
153.45094299316................................................ 1792/50000 .............] ETA: 1m42s | Step: 2ms
127.98012542725................................................ 1920/50000 .............] ETA: 1m42s | Step: 2ms
81.274276733398................................................ 2048/50000 .............] ETA: 1m42s | Step: 2ms
52.629417419434................................................ 2176/50000 .............] ETA: 1m42s | Step: 2ms
28.258670806885................................................ 2304/50000 .............] ETA: 1m42s | Step: 2ms
12.342067718506................................................ 2432/50000 .............] ETA: 1m42s | Step: 2ms
6.292441368103................................................. 2560/50000 .............] ETA: 1m42s | Step: 2ms
3.0711505413055................................................ 2688/50000 .............] ETA: 1m42s | Step: 2ms
2.4665925502777................................................ 2816/50000 .............] ETA: 1m42s | Step: 2ms
2.3633861541748................................................ 2944/50000 .............] ETA: 1m42s | Step: 2ms
2.3024611473083................................................ 3072/50000 .............] ETA: 1m41s | Step: 2ms
2.3726959228516................................................ 3200/50000 .............] ETA: 1m41s | Step: 2ms
2.3351118564606................................................ 3328/50000 .............] ETA: 1m41s | Step: 2ms
2.3633522987366................................................ 3456/50000 .............] ETA: 1m41s | Step: 2ms
2.3602793216705................................................ 3584/50000 .............] ETA: 1m41s | Step: 2ms
2.3885579109192................................................ 3712/50000 .............] ETA: 1m40s | Step: 2ms
2.3737788200378................................................ 3840/50000 .............] ETA: 1m40s | Step: 2ms
I got the same issue.
Hm, interesting. You may need to mess with the learning rate; it certainly isn't supposed to explode like that at the start. It's normal for the loss to increase a little at first (from 2 to 3 or so), but it shouldn't explode. (Using RMSprop, for example, does cause the loss to explode.)
I posted my loss logs in the table on the front page if you're interested. Here's an example for the Nsize=3 (20-layer) network that eventually reaches 0.0829 error: https://mjw-xi8mledcnyry.s3.amazonaws.com/experiments/201601141709-AnY56THQt7/Training%20loss.csv
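If anyone wants to experiment with the learning rate, here is a minimal sketch of the kind of warmup I have in mind: start at a smaller rate and only switch to 0.1 once the loss has settled. The names below (`sgdState`, `nBatchesSeen`, `onBatchDone`) are placeholders for illustration, not the actual variables in train-cifar.lua.

```lua
-- Sketch only: warm up the learning rate instead of starting at 0.1.
-- Placeholder names; adapt to however the training loop tracks SGD state.
local sgdState = {
   learningRate = 0.01,   -- smaller warmup rate; 0.1 is what sometimes explodes
   momentum     = 0.9,
   weightDecay  = 1e-4,
}

local nBatchesSeen = 0
local function onBatchDone()
   nBatchesSeen = nBatchesSeen + 1
   -- after roughly one epoch at batch size 128 (~390 batches), switch to the normal rate
   if nBatchesSeen == 400 then
      sgdState.learningRate = 0.1
   end
end
```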
Problem solved: some conv and BN layers were not being initialized (in train-cifar.lua and residual-layers.lua). I spent a whole day debugging why the loss wasn't decreasing (the net just guessed randomly, and top-1 accuracy stayed at 0.1 through all epochs). Finally tracked down the issue. Sorry, I'm a rookie at Torch.
Oops, sorry! Glad you found the issue. Should we add some initialization code to keep others from being bitten? When I ran the experiments in January they worked; I wonder if Torch's default initialization has changed since then, requiring the code to be more explicit about it.
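Something along these lines might do it. This is only a sketch: `model` is assumed to be the `nn` container returned by the model builder, and it only covers plain `nn.SpatialConvolution` / `nn.SpatialBatchNormalization` (not cudnn variants). It applies MSR-style ("He") initialization to the convolutions and unit-gain / zero-shift to batch norm, so training no longer depends on Torch's defaults.

```lua
require 'nn'

-- Sketch: explicitly initialize conv and BN layers after building the model.
local function msrInit(model)
   -- He et al. initialization for convolution weights, zero biases
   for _, m in ipairs(model:findModules('nn.SpatialConvolution')) do
      local n = m.kW * m.kH * m.nOutputPlane
      m.weight:normal(0, math.sqrt(2 / n))
      if m.bias then m.bias:zero() end
   end
   -- Batch norm: unit gain, zero shift
   for _, m in ipairs(model:findModules('nn.SpatialBatchNormalization')) do
      if m.weight then m.weight:fill(1) end
      if m.bias then m.bias:zero() end
   end
end

msrInit(model)
```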