
General hyperparameters?

zizhaozhang opened this issue · 7 comments

Hi,

My question seems a bit unrelated, but I am really curious, so sorry for the interruption.

WRN uses a weight decay and learning rate schedule quite different from the one fb.resnet.torch uses. As the WRN paper mentions, pre-activation ResNet trained with the WRN learning schedule gets better results, so I think this hyperparameter setting is quite good and generalizes well.
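
For concreteness, this is roughly the setup I am using, a sketch based on my reading of the paper (the exact epochs and values are my assumptions and should be double-checked against this repo):

```lua
-- sketch of the WRN CIFAR settings as I understand them (assumptions:
-- 200 epochs total, LR 0.1 dropped by 0.2 at epochs 60/120/160)
local function learningRate(epoch)
   local lr = 0.1
   if epoch >= 60  then lr = lr * 0.2 end
   if epoch >= 120 then lr = lr * 0.2 end
   if epoch >= 160 then lr = lr * 0.2 end
   return lr
end

-- state table for optim.sgd
local optimState = {
   learningRate = learningRate(1),
   momentum     = 0.9,
   dampening    = 0,
   nesterov     = true,
   weightDecay  = 5e-4,  -- fb.resnet.torch uses 1e-4 here
}
```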

Recently, I have been using the WRN code to train a new method, Densely Connected Convolutional Networks (DenseNet, http://arxiv.org/pdf/1608.06993v1.pdf), but the error is larger than when training with the fb.resnet.torch code (5.2% vs. the 4.1% reported in the original paper).

I understand that hyperparameters may vary from model to model, but given how many tests WRN has gone through, I would not expect this setting to cause more than a 1.0% increase in error. The WRN paper does not discuss much about how the hyperparameters were selected.

I am not sure if you are familiar with this new method (DenseNet); could you comment on this situation? In addition, could you provide more details about how you selected the hyperparameters instead of using the fb.resnet.torch settings? It would be very helpful for training modified architectures based on WRN and digging into better hyperparameter settings.

Thanks a lot!

zizhaozhang · Sep 08 '16

@zizhaozhang that's probably related to https://github.com/szagoruyko/wide-residual-networks/issues/17: DenseNet doesn't use whitened data.
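
In case it helps, whitening here means something like ZCA. A minimal sketch (the flattened N x D layout and the eps value are assumptions):

```lua
require 'torch'

-- minimal ZCA whitening sketch; assumes X is an N x D DoubleTensor of
-- flattened images
local function zcaWhiten(X, eps)
   eps = eps or 1e-5
   local mean  = X:mean(1)
   local Xc    = X - mean:expandAs(X)          -- center the data
   local sigma = (Xc:t() * Xc):div(X:size(1))  -- covariance matrix
   local e, V  = torch.symeig(sigma, 'V')      -- eigendecomposition
   local W     = V * torch.diag(e:add(eps):pow(-0.5)) * V:t()  -- ZCA transform
   return Xc * W, mean, W                      -- whitened data + parameters
end
```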

szagoruyko · Sep 26 '16

OMFG! Thanks for that paper; it's exactly the same concept I recently figured out and have been testing. NNs grow up in days, not years, right now.

ibmua · Sep 26 '16

Yes, as far as I can see from their figures, it's nearly identical to my hoardNet, so you might as well just use my code and maybe modify it a little.

https://github.com/ibmua/Breaking-Cifar

Check out the "hoard" models. The main parameters there are "sequences" and "depth"; the "2-x"+ models I designed to be run with depth=2, while the earlier models are more generic. The whole thing should be easily tweakable and the code is clean. Mind that it uses 4-space tabs. I haven't read the article yet, though, so I'm guessing some details may be slightly different. Also mind that you'll want to clone the whole thing: just importing a model into Sergey's WRN won't work.

BTW, hoard-2-x is the model I referred to as possibly being comparable to WRN in terms of performance per parameter. IMHO HoardNet sounds like a more meaningful name for this thing =D The info is hoarded rather than discarded, as in usual architectures. "Accumulation" would be a less reasonable name, because it sounds more like something ResNets do. "DenseNet" doesn't seem like a very reasonable name to me.

https://github.com/ibmua/Breaking-Cifar/blob/master/logs/load_59251794/log.txt is a log from near the end of that training, where I got 19.5% on my [0..1]-scaled CIFAR-100+.

ibmua · Sep 26 '16

Their code is also available, at https://github.com/liuzhuang13/DenseNet/blob/master/densenet.lua . They use pre-activation, which is different from what I used. I thought that might be beneficial, but it needed more testing and I don't have that many resources. =) It's a lot more expensive, though, but the difference may well be worth it. Some of the things I designed differently may actually work better, I think. I took a bit from Inception and Inception-ResNet, while they took only from ResNet.
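
From a skim of densenet.lua, the core of the dense connectivity is just a channel-wise concat of the input with the new feature maps. A minimal pre-activation sketch (nChannels and growthRate are placeholders):

```lua
require 'nn'

-- one dense layer: output = concat(input, conv(BN-ReLU(input)))
local function denseLayer(nChannels, growthRate)
   local branch = nn.Sequential()
   branch:add(nn.SpatialBatchNormalization(nChannels))
   branch:add(nn.ReLU(true))
   branch:add(nn.SpatialConvolution(nChannels, growthRate, 3, 3, 1, 1, 1, 1))
   local concat = nn.Concat(2)   -- concatenate along the channel dimension
   concat:add(nn.Identity())     -- keep ("hoard") everything seen so far
   concat:add(branch)
   return concat                 -- output has nChannels + growthRate maps
end
```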

ibmua · Sep 26 '16

@szagoruyko I see, I will give that a try. It is really tricky. Thanks @ibmua for spotting that. I had run so many different tests using the WRN code to train DenseNet while ignoring this part.

One small difference is that @ibmua uses [0,1]-scaled data, while DenseNet uses mean/std normalization as fb.resnet.torch does. Which do you think is better?

zizhaozhang · Sep 29 '16

Mean+std is likely a little bit better.
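
For comparison, the two options look roughly like this. The CIFAR-10 mean/std numbers below are the commonly quoted ones and are an assumption here; recompute them from your own training set:

```lua
require 'torch'

-- assume `data` is an N x 3 x 32 x 32 FloatTensor of raw pixel values
local data = torch.FloatTensor(4, 3, 32, 32):random(0, 255)  -- placeholder batch

-- option 1: per-channel mean/std normalization, as fb.resnet.torch does
local meanstd = {
   mean = {125.3, 123.0, 113.9},
   std  = { 63.0,  62.1,  66.7},
}
for i = 1, 3 do
   data[{ {}, i, {}, {} }]:add(-meanstd.mean[i]):div(meanstd.std[i])
end

-- option 2: plain [0,1] scaling
-- data:div(255)
```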

ibmua · Sep 29 '16

Cool. I will check your code.

zizhaozhang · Sep 29 '16