pydeeplearn

Created new file with an example of hyperparameter optimization using hyperopt package

Warvito opened this issue 10 years ago · 12 comments

Hi!

In this fork I created an example of using the Hyperopt library for hyperparameter optimization of a Deep Belief Network. The example uses the MNIST dataset and optimizes the following parameters: number of hidden layers, size of the layers, unsupervised learning rate, supervised learning rate, maximum momentum, and visible dropout rate.

Currently two algorithms are implemented in hyperopt: Random Search and the Tree of Parzen Estimators (TPE, the one used in this example). More information about hyperopt is available at http://hyperopt.github.io/hyperopt/.
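As a minimal sketch (with a made-up toy objective and search space, not the DBN example itself), switching between the two algorithms only changes the algo argument of fmin:

# Toy illustration of choosing the search algorithm in hyperopt.
from hyperopt import fmin, hp, rand, tpe

def objective(x):
    # toy quadratic objective, used only for illustration
    return (x - 3.0) ** 2

space = hp.uniform('x', -10, 10)

# Random Search
bestRandom = fmin(objective, space, algo=rand.suggest, max_evals=50)
# Tree of Parzen Estimators (the algorithm used in the example)
bestTpe = fmin(objective, space, algo=tpe.suggest, max_evals=50)

print("Random Search best: {}".format(bestRandom))
print("TPE best: {}".format(bestTpe))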

I chose TPE over other methods (like Spearmint and SMAC) because it was the only library I knew. However, while searching the literature I found a study comparing these algorithms (http://www.cs.ubc.ca/~kevinlb/papers/2013-BayesOpt-Hyperparameters.pdf), which shows that the TPE algorithm performs better in some cases.

PS: Congratulations on the excellent library and report. PS2: Sorry for the last pull request, I'm still a newbie on GitHub. Please ignore and close the old pull request.

Warvito avatar Oct 10 '14 14:10 Warvito

Hey.

Firstly, thanks for explaining the choice of library. Since it is better than Spearmint in some cases, it is good to have it as an option.

Secondly, I just pulled your branch and I am trying to run the code. I am getting an error:

proabilities.shape (2000, 10)
testLabels.shape (2000, 10)
Traceback (most recent call last):
  File "hyperopt_exampleMNIST.py", line 173, in <module>
    main()
  File "hyperopt_exampleMNIST.py", line 162, in main
    trials=trials)
  File "/usr/local/lib/python2.7/dist-packages/hyperopt/fmin.py", line 334, in fmin
    rval.exhaust()
  File "/usr/local/lib/python2.7/dist-packages/hyperopt/fmin.py", line 294, in exhaust
    self.run(self.max_evals - n_done, block_until_done=self.async)
  File "/usr/local/lib/python2.7/dist-packages/hyperopt/fmin.py", line 268, in run
    self.serial_evaluate()
  File "/usr/local/lib/python2.7/dist-packages/hyperopt/fmin.py", line 187, in serial_evaluate
    result = self.domain.evaluate(spec, ctrl)
  File "/usr/local/lib/python2.7/dist-packages/hyperopt/fmin.py", line 114, in evaluate
    rval = self.fn(pyll_rval)
  File "hyperopt_exampleMNIST.py", line 105, in objective
    UACROC.append(roc_auc_score(testLabels, proabilities))
  File "/usr/lib/python2.7/dist-packages/sklearn/metrics/metrics.py", line 403, in roc_auc_score
    fpr, tpr, tresholds = roc_curve(y_true, y_score)
  File "/usr/lib/python2.7/dist-packages/sklearn/metrics/metrics.py", line 672, in roc_curve
    fps, tps, thresholds = _binary_clf_curve(y_true, y_score, pos_label)
  File "/usr/lib/python2.7/dist-packages/sklearn/metrics/metrics.py", line 505, in _binary_clf_curve
    y_true = column_or_1d(y_true)
  File "/usr/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 265, in column_or_1d
    raise ValueError("bad input shape {0}".format(shape))
ValueError: bad input shape (2000, 10)

Can you please run the script now and check if this is also happening to you?

If not, can you please tell me what version of sklearn you are using?

mihaelacr avatar Oct 10 '14 15:10 mihaelacr

I ran the script again and I didn't have this error. I am using scikit-learn 0.15.2.

Warvito avatar Oct 10 '14 17:10 Warvito

OK, after updating sklearn I did manage to get the script running. It finished, but then I get this:

{'visibleDropout': 0.1848519247637862, 'unsupervisedLearningRate': 0.009591486934627278, 'layerSizes': 0, 'nrLayers': 0, 'momentumMax': 0.7746038441248677, 'supervisedLearningRate': 0.0019464988278357644}

As you can see, layerSizes and nrLayers cannot be 0. Can you please check it out? The code crashes if nrLayers is 0 or if layerSizes is not a list (admittedly not in a nice way, but it does crash; I will add a proper error message tomorrow morning).

mihaelacr avatar Oct 10 '14 21:10 mihaelacr

Sorry, I forgot to explain this in the code. These values are actually indices into the lists declared in the search space:

space = (
    hp.choice('nrLayers', [2,3]),
    hp.choice('layerSizes', [500,750,1000,1250]),
    hp.loguniform( 'unsupervisedLearningRate', log( 1e-5 ), log( 1e-1)),
    hp.loguniform( 'supervisedLearningRate', log( 1e-5 ), log( 1e-1 )),
    hp.uniform( 'momentumMax', 0.5, 0.99 ),
    hp.uniform( 'visibleDropout', 0.1, 0.9 )
)

This only occurs when "hp.choice" is used to define part of the search space. In your case, the best hyperparameters (the ones that give the highest average AUC-ROC score across the 5-fold cross-validation) would be nrLayers = 2 and layerSizes = 500.
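As a minimal sketch, the indices can be mapped back to the actual values with hyperopt's space_eval, assuming the space tuple defined above is in scope (the best dictionary below is copied from your output):

from hyperopt import space_eval

best = {'visibleDropout': 0.1848519247637862,
        'unsupervisedLearningRate': 0.009591486934627278,
        'layerSizes': 0, 'nrLayers': 0,
        'momentumMax': 0.7746038441248677,
        'supervisedLearningRate': 0.0019464988278357644}

# hp.choice entries come back as indices; space_eval resolves them, so
# nrLayers index 0 becomes 2 and layerSizes index 0 becomes 500
print(space_eval(space, best))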

Warvito avatar Oct 10 '14 23:10 Warvito

OK, I get it now. Can you please fix it so that things are even clearer?

I also think that a dropout rate of 0.1848519247637862 is a bit off for the visible layer. In that case you drop almost all input, so what is the network learning?

You also have some problems with whitespace; please enable 'remove trailing whitespace on save' in your editor. A few other conventions to follow:

- Surround every equals sign with one space on each side. This does not apply to keyword arguments, where there should be no spaces (you currently have them).
- Do not put spaces inside brackets around arguments, as in 'objective( x )' or 'print "{}\n".format( x )'.
- Use the same variable casing that I use: camelCase for variables (so runCounter, not run_counter).
- Always put a space after commas in lists and tuples as well as in function arguments.

I know these things might seem annoying at this point, but it is very important to keep consistency inside a coding project, especially one of this size that includes multiple programmers.
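As a small before/after sketch of these conventions (the objective and runCounter names are only stand-ins, not your actual code):

# Before, with the style issues mentioned above:
#   def objective( x ):
#       print "{}\n".format( x )
#
# After, following the project conventions (camelCase, no spaces inside
# brackets, a space after each comma, keyword arguments without spaces):
runCounter = 0

def objective(x, verbose=True):
    global runCounter
    runCounter = runCounter + 1
    if verbose:
        print("run {}: x = {}".format(runCounter, x))
    return x ** 2

objective(3.0, verbose=True)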

Some other things: there is no need to import math for log; numpy has it, which avoids an extra import and adds consistency. As you can see in my code, I never import functions themselves from numpy, I just use import numpy as np and call functions from there. Consistency there is also helpful for users and for coders. Especially because this is a script that tries to familiarize people with hyperopt, do not import functions or variables unqualified from it; the reader gets confused (where does fmin come from? did we define it in this file or is it from a library?), so try to keep things as clear as possible.
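A minimal sketch of the import style I mean (the toy objective is made up just so the snippet runs):

import numpy as np
import hyperopt
import hyperopt.tpe
from hyperopt import hp

def objective(learningRate):
    # toy objective, only here to make the sketch runnable
    return (learningRate - 0.001) ** 2

# np.log instead of math.log, and fmin/tpe stay qualified so the reader can
# see at a glance that they come from hyperopt
space = hp.loguniform('learningRate', np.log(1e-5), np.log(1e-1))
best = hyperopt.fmin(objective, space, algo=hyperopt.tpe.suggest, max_evals=20)
print("Best parameters {}".format(best))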

Never print values without saying what they are. Currently the script ends with this output:

Best Parameters

{'visibleDropout': 0.1848519247637862, 'unsupervisedLearningRate': 0.009591486934627278, 'layerSizes': 0, 'nrLayers': 0, 'momentumMax': 0.7746038441248677, 'supervisedLearningRate': 0.0019464988278357644}

{'status': 'ok', 'loss': 0.030216999337328354, 'classifier_precision': {'type': <type 'float'>, 'value': 0.80654731946181246}, 'classifier_recall': {'type': <type 'float'>, 'value': 0.77065269408931203}, 'classifier_fscore': {'type': <type 'float'>, 'value': 0.75614674732420506}}

{'status': 'ok', 'loss': 0.02080707649439817, 'classifier_precision': {'type': <type 'float'>, 'value': 0.83300569148855175}, 'classifier_recall': {'type': <type 'float'>, 'value': 0.81651875369377647}, 'classifier_fscore': {'type': <type 'float'>, 'value': 0.81586420681908911}}

[0.030216999337328354, 0.02080707649439817]

You print what the best parameters are (also remove the newline there), but the rest are just printed variables without any indication of what they are. When you send a pull request, it is usually your duty to ensure that it will be easy for the reviewer to run the script and understand what is going on. I can also imagine that a potential user who just wants to figure out how to use hyperopt with pydeeplearn will be confused by this output and will need extra time to understand what the values are.
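For example, a small sketch (with the values from your run hard-coded) of how the end of the script could label its output:

# hard-coded values from the run above, only to illustrate labelled prints
bestParameters = {'nrLayers': 0, 'layerSizes': 0,
                  'visibleDropout': 0.1848519247637862}
foldLosses = [0.030216999337328354, 0.02080707649439817]

print("Best parameters: {}".format(bestParameters))
print("Loss per cross-validation fold: {}".format(foldLosses))
print("Mean loss: {}".format(sum(foldLosses) / len(foldLosses)))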

mihaelacr avatar Oct 11 '14 09:10 mihaelacr

Update on the small dropout: I let the algorithm run a bit more (by setting max_evals=10) and I get the following hyperparameters:

Best parameters {'visibleDropout': 0.7247231190339729, 'unsupervisedLearningRate': 0.00033464024326233555, 'layerSizes': 2, 'nrLayers': 1, 'momentumMax': 0.6566265323965443, 'supervisedLearningRate': 0.020632438859459007}

This actually looks reasonable, and I will try it myself to see what results I get. I will write the updates here.

This means that we should make it clearer in the code how important max_evals is and that it affects the quality of the obtained hyperparameters.
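A minimal sketch (with a made-up objective and space) of how max_evals bounds the search and how a Trials object keeps the full history:

from hyperopt import fmin, hp, tpe, Trials

def objective(x):
    # toy objective for illustration only
    return (x - 0.5) ** 2

space = hp.uniform('x', 0.0, 1.0)

trials = Trials()
# max_evals is the total number of hyperparameter settings TPE gets to try,
# so a very small value can easily return poor settings
best = fmin(objective, space, algo=tpe.suggest, max_evals=10, trials=trials)
print("Best after 10 evaluations: {}".format(best))
print("All losses: {}".format(trials.losses()))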

Here is a gist where you can find my version of your code. I changed the code to match some of the standards I described before: https://gist.github.com/mihaelacr/a15f110640fae9a40047 Maybe this will make things easier for you.

Did you try this code with theano running on the GPU? I see no reason for something to break, but it's worth a try.
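For reference, a sketch of how I force the GPU (the flags below are an assumption about the Theano setup of that time and must be in the environment before theano is imported):

import os
# device/floatX flags for the Theano of that era; adjust to your installation
os.environ.setdefault('THEANO_FLAGS', 'device=gpu,floatX=float32')

import theano
print(theano.config.device)  # should report the GPU device when one is available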

mihaelacr avatar Oct 11 '14 10:10 mihaelacr

Okay. I will follow the pattern in the forthcoming changes. I tried to make the code clearer with some comments: https://gist.github.com/Warvito/be4d588c3ff9410455ac . Yes, I already ran the code on the GPU and did not find any problems.

Warvito avatar Oct 11 '14 19:10 Warvito

I updated my gist with some small changes (including a fix for the index print confusion that we discussed before).

Also, I updated the number of layers to [4, 5] instead of [2, 3], because of the loop

for i in range(0, nrLayers - 2):
    hiddenLayers.append(hiddenNeuronsPerLayer)

if you set nrLayers to 2 then you do not actually add any hidden layers. I think this should be clearer now, because I added a print when you instantiate a deep belief network that shows the number of layers and the architecture.
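A minimal sketch of what that loop ends up building (the MNIST input/output sizes and the final concatenation are assumptions for illustration, not the exact pydeeplearn code):

nrLayers = 4                     # from the updated [4, 5] choice
hiddenNeuronsPerLayer = 500
inputSize, outputSize = 784, 10  # MNIST: 28 * 28 inputs, 10 classes

hiddenLayers = []
for i in range(0, nrLayers - 2):
    hiddenLayers.append(hiddenNeuronsPerLayer)

# with nrLayers = 2 the loop body never runs, so no hidden layers are added
architecture = [inputSize] + hiddenLayers + [outputSize]
print(architecture)  # [784, 500, 500, 10]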

Please update your code with the (small) changes in my gist and then we are good to go.

All I need to do is actually train a network with some example hyperparameters obtained by running your code and check that we get reasonable results. I would have done this by now, but the GPUs at my university are unavailable at the moment.

mihaelacr avatar Oct 13 '14 09:10 mihaelacr

I just ran the experiments with 10 training epochs, and now I will do the same with 100 epochs to get better results. After this I will run the training / testing on the entire data, and I am very much looking forward to seeing the results.

mihaelacr avatar Oct 13 '14 11:10 mihaelacr

Did you get a chance to update the code?

I ran some experiments and I cannot seem to get below 2% error with the optimized hyperparameters. Even with sigmoid as the activation function that seems a bit low (especially given dropout). Did you get any better results?

mihaelacr avatar Oct 17 '14 13:10 mihaelacr

Hello. I have not had a chance to check the performance on the MNIST database. However, using the data from my research, I found an improvement by changing the space of the learning rates: I started using a uniform distribution (from 0.01 to 0.0001) to define the learning rate space.

Another option would be to use a normal distribution, as Bergstra does in https://github.com/hyperopt/hyperopt-nnet/blob/766811d1128cda02d41c95ae118fd938e023c605/hpnnet/nips2011_dbn.py (line 111).
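As a sketch of the two options (the bounds and distribution parameters below are only illustrative assumptions, not the exact values from either script):

import numpy as np
from hyperopt import hp

# Option 1: uniform distribution over the learning rate itself
supervisedLearningRate = hp.uniform('supervisedLearningRate', 0.0001, 0.01)

# Option 2: a normal-family distribution in log space, in the spirit of the
# Bergstra example linked above (lognormal samples exp(N(mu, sigma)), so the
# values stay positive and span several orders of magnitude)
unsupervisedLearningRate = hp.lognormal('unsupervisedLearningRate',
                                        np.log(0.001), 1.0)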

In addition, is the configuration that achieved 1.1% error in your report the one defined in deepbeliefMNISTGaussian()? Soon I will run the optimization for the MNIST database with the Gaussian and rectified activation functions.

Regards, Walter

Warvito avatar Oct 27 '14 17:10 Warvito

Thanks for explaining the possible options we could use for the hyperparameter search and backing them up with examples!

Yes, the code used is deepbeliefMNISTGaussian(). The configuration in the code might have changed (for other experiments), but my results are completely reproducible from the report; I made sure that you have everything there. If you look at page 53, you can see the details of training with momentum (note that I used the (1 - momentum) learning rate factor), and the rest is in the table on page 55.
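As a sketch of what the (1 - momentum) learning rate factor refers to (this is the common formulation; the exact details are in the report):

def momentumUpdate(velocity, gradient, learningRate, momentum):
    # v <- momentum * v - (1 - momentum) * learningRate * gradient
    # scaling by (1 - momentum) keeps the effective step size comparable
    # while the momentum is ramped up during training
    return momentum * velocity - (1.0 - momentum) * learningRate * gradient

velocity = 0.0
velocity = momentumUpdate(velocity, gradient=2.0, learningRate=0.05, momentum=0.9)
print(velocity)  # -0.01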

mihaelacr avatar Oct 29 '14 10:10 mihaelacr