
Memory explosion

Open haoyangz opened this issue 8 years ago • 9 comments

I was tuning a mid-size network and observed that memory usage kept rising; with enough trials (around 50) the server eventually killed the process for using too much memory. Since the README says the data is loaded only once, I wonder whether hyperas saves the parameters (I mean the actual weights) of all the trials. If so, I think it would make more sense to release them, since we only care about the hyperparameters.

haoyangz avatar Apr 20 '16 14:04 haoyangz

I know what you mean, maybe I should update the README accordingly. What you want is in fact already possible: instead of returning

return {'loss': -acc, 'status': STATUS_OK, 'model': model}

in model(), just drop the model, i.e. use

return {'loss': -acc, 'status': STATUS_OK}

Many examples return the model, since lots of people requested it. My view is that you should pick the hyperparameters first and then do a proper training run once more, so I'm personally not that interested in keeping the trained models myself.
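For completeness, a minimal sketch of the slim setup (the data/model function names and the TPE settings follow the usual hyperas examples; max_evals is arbitrary):

from hyperas import optim
from hyperopt import Trials, tpe

### model() returns {'loss': -acc, 'status': STATUS_OK} as above
best_run, _ = optim.minimize(model=model, data=data, algo=tpe.suggest,
                             max_evals=50, trials=Trials())
print(best_run)  ### dictionary of the winning hyperparameter values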

maxpumperla avatar Apr 20 '16 15:04 maxpumperla

I totally agree with you! Thanks and I am closing it.

haoyangz avatar Apr 20 '16 15:04 haoyangz

But if I drop the model, all I have at the end of the search is best_run, which is a dictionary mapping hyperparameter names to their best values. Is there a convenient way to use it to construct an (untrained) model, or do I have to plug the values in manually?

haoyangz avatar Apr 20 '16 15:04 haoyangz

That is the money question! I can add that fairly quickly. For hyperopt users that one is easy, but hyperas hides the internals, so there needs to be a convenience wrapper.

P.S.: Note that instead of leaving model out above, you can add anything you want to the returned dictionary and it will show up in trials.
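For example (a sketch; the extra keys here are arbitrary names, not a hyperas API — hyperopt simply stores whatever the returned dictionary contains in the trials results):

history = model.fit(....)
return {'loss': -acc,
        'status': STATUS_OK,
        'n_params': model.count_params(),  ### any extra info you like
        'history': history.history}        ### retrievable from trials later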

maxpumperla avatar Apr 20 '16 16:04 maxpumperla

Thank you for the P.S. In that spirit, one workaround is the following.

In model():

### Save the optimizer, as it is not part of the architecture
from keras.optimizers import Adadelta
myoptimizer = Adadelta(lr=1.0, rho={{choice([0.95, 0.99])}}, epsilon={{choice([1e-04, 1e-05, 1e-06])}})
model.compile(loss='binary_crossentropy', optimizer=myoptimizer)

### Train as usual
model.fit(....)

### Save the architecture as a JSON string
json_string = model.to_json()

### Return the optimizer and the architecture instead of the trained model
return {'loss': -acc, 'status': STATUS_OK, 'model': (json_string, myoptimizer)}

Then the second value returned from optim.minimize() will be the (json_string, myoptimizer) tuple of the best trial, from which you can create a fresh, untrained model:

best_run, best_params = optim.minimize(....)
best_archit, best_optim = best_params

### Construct a new model with the best architecture and optimizer
from keras.models import model_from_json
newmodel = model_from_json(best_archit)
newmodel.compile(loss='binary_crossentropy', optimizer=best_optim)

haoyangz avatar Apr 20 '16 19:04 haoyangz

@maxpumperla I've done what you described above: I dropped the model, i.e.

return {'loss': -acc, 'status': STATUS_OK}

but the behavior is the same: memory keeps growing with each hyperas evaluation until the process is killed by the operating system. I even tried to free memory manually inside the model function:

K.clear_session
gc.collect()

But the problem remains. Are you sure that just dropping the model from the return {'loss': -acc, 'status': STATUS_OK} line should free the memory?

pciazynski avatar Oct 08 '17 10:10 pciazynski

It was my mistake, I had just omitted the parentheses in K.clear_session. Now, after the correction, when I have

K.clear_session()

inside the create_model function, the memory problem disappears.

But unfortunately, just dropping the model from the return {'loss': -acc, 'status': STATUS_OK} line is not enough on its own.
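For anyone hitting the same issue, here is a complete sketch of where the call goes (assuming the TensorFlow backend; the network, layer sizes, and data shapes are just placeholders):

from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense
from hyperas.distributions import choice
from hyperopt import STATUS_OK

def create_model(X_train, Y_train, X_test, Y_test):
    ### Drop the previous trial's graph before building the next model;
    ### without this, every trial's graph accumulates in memory
    K.clear_session()
    model = Sequential()
    model.add(Dense({{choice([64, 128])}}, input_shape=(784,), activation='relu'))
    model.add(Dense(10, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    model.fit(X_train, Y_train, batch_size=128, epochs=1, verbose=0)
    loss, acc = model.evaluate(X_test, Y_test, verbose=0)
    return {'loss': -acc, 'status': STATUS_OK}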

pciazynski avatar Oct 08 '17 11:10 pciazynski

Thanks a lot @Ethefor, that also solved my problem! Great help!

ecemlago avatar Aug 13 '18 12:08 ecemlago

None of this really worked for me, but once I reduced the batch size, the problem was completely fixed. :-)

https://stackoverflow.com/questions/42495930/tensorflow-oom-on-gpu
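For what it's worth, the batch size can even be made part of the search so hyperas picks one that fits in memory (a sketch using the same template syntax as above, replacing the fit call inside model()):

model.fit(X_train, Y_train,
          batch_size={{choice([16, 32, 64])}},  ### smaller batches use less GPU memory
          epochs=1, verbose=0)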

yeamusic21 avatar Dec 01 '18 05:12 yeamusic21