Layerwise GPU memory use
Hi, I have a feeling that the layerwise optimizer, by creating numerous networks, is not freeing the earlier ones and is using more GPU memory than it should. I'm having a heck of a time doing layerwise training.
With this network:
inputs = 4096*2
win_size = 2048
swin_size = win_size / 2 + 1
output_size = swin_size
hidlayersize = win_size
exp = theanets.Experiment(theanets.Regressor,
                          layers=[inputs, inputs, inputs/2, inputs/3, inputs/4, output_size, output_size])
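(For readers following along: under Python 2 integer division, as in the script above, that layer list resolves as below; the 2730 and 1025 match the hid3 and lwout sizes in the log further down. This is my arithmetic, not output from the script.)

# Reusing the definitions above (Python 2 integer division):
print [inputs, inputs, inputs/2, inputs/3, inputs/4, output_size, output_size]
# -> [8192, 8192, 4096, 2730, 2048, 1025, 1025]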
With the following pretraining:
logging.info("Pretraining")
net.train([ttrain[0:1*trains/4], toutputs[0:1*trains/4]],
          [vtrain[0:1*trains/4], voutputs[0:1*trains/4]],
          algo='layerwise',
          learning_rate=1e-3,
          save_every=25,
          batch_size=32,  # this is small!
          patience=6,
          min_improvement=0.1,
          save_progress="current_pre_brain.pkl",
          momentum=0.9)
I get the following error after training on layers hid1 and hid2; once it tries to train on hid3, it borks at validation.
I 2015-09-08 12:26:42 downhill.base:402 patience elapsed!
I 2015-09-08 12:26:42 theanets.layers.base:303 layer Feedforward "lwout": (hid3:out)2730 -> 1025, linear, 2799275 parameters
I 2015-09-08 12:26:42 theanets.trainer:250 layerwise: training in -> hid1 -> hid2 -> hid3 -> lwout
I 2015-09-08 12:26:43 downhill.base:378 -- patience = 6
I 2015-09-08 12:26:43 downhill.base:379 -- validate_every = 10
I 2015-09-08 12:26:43 downhill.base:380 -- min_improvement = 0.1
I 2015-09-08 12:26:43 downhill.base:381 -- max_gradient_norm = 0
I 2015-09-08 12:26:43 downhill.base:382 -- max_gradient_elem = 0
I 2015-09-08 12:26:43 downhill.base:383 -- learning_rate = 0.001
I 2015-09-08 12:26:43 downhill.base:384 -- momentum = 0.9
I 2015-09-08 12:26:43 downhill.base:385 -- nesterov = False
I 2015-09-08 12:26:43 downhill.adaptive:220 -- rms_halflife = 14
I 2015-09-08 12:26:43 downhill.adaptive:221 -- rms_regularizer = 1e-08
I 2015-09-08 12:26:43 downhill.base:112 compiling evaluation function
I 2015-09-08 12:26:43 downhill.base:118 compiling RMSProp function
Error allocating 11193000 bytes of device memory (out of memory). Driver report 966656 bytes free and 4294246400 bytes total
Traceback (most recent call last):
File "stft-theanet.py", line 62, in <module>
momentum=0.9)
File "build/bdist.linux-x86_64/egg/theanets/graph.py", line 400, in train
File "build/bdist.linux-x86_64/egg/theanets/graph.py", line 376, in itertrain
File "build/bdist.linux-x86_64/egg/theanets/trainer.py", line 253, in itertrain
File "build/bdist.linux-x86_64/egg/theanets/trainer.py", line 66, in itertrain
File "/usr/local/lib/python2.7/dist-packages/downhill/base.py", line 388, in iterate
self._compile()
File "/usr/local/lib/python2.7/dist-packages/downhill/base.py", line 119, in _compile
updates = list(self._updates) + list(self._get_updates())
File "/usr/local/lib/python2.7/dist-packages/downhill/base.py", line 134, in _get_updates
for var, expr in self._get_updates_for(param, grad):
File "/usr/local/lib/python2.7/dist-packages/downhill/adaptive.py", line 226, in _get_updates_for
g2_tm1 = shared_like(param, 'g2_ewma')
File "/usr/local/lib/python2.7/dist-packages/downhill/util.py", line 45, in shared_like
File "/usr/local/lib/python2.7/dist-packages/theano/compile/sharedvalue.py", line 208, in shared
allow_downcast=allow_downcast, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/var.py", line 203, in float32_shared_constructor
deviceval = type_support_filter(value, type.broadcastable, False, None)
MemoryError: ('Error allocating 11193000 bytes of device memory (out of memory).', "you might consider using 'theano.shared(..., borrow=True)'")
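If my arithmetic is right, the allocation that fails is exactly the float32 RMSProp accumulator (the g2_ewma shared variable in the traceback), which has the same shape as the weight matrix of the freshly created lwout layer:

# 2730 x 1025 float32 weights for hid3 -> lwout, 4 bytes each:
print 2730 * 1025 * 4  # -> 11193000, the byte count in the MemoryError
# (the 2799275 parameters logged above are those weights plus 1025 biases)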
Yet if I just do regular (non-layerwise) training, it works fine. It does use a lot of GPU memory, but it's a big network and I have a lot of training examples.
batch_size = 4096  # way bigger!
logging.info("Finetune Training")
net.train([ttrain, toutputs],
          [vtrain, voutputs],
          algo='rmsprop',
          learning_rate=1e-4,
          save_every=25,
          batch_size=batch_size,
          patience=100,
          min_improvement=0.001,
          save_progress="current_brain.pkl",
          momentum=0.9)
My theory is that shared variables and the like are not being freed appropriately. I was looking at the code: new layers are being created, but I cannot tell how much sharing or copying is being done.
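To illustrate the mechanism my theory depends on (a minimal sketch, not from my script, and the array size is made up): a float32 shared variable is copied to the device as soon as it is created, and, as far as I understand the old CUDA backend, the device buffer only comes back once the Python object itself is collected.

import gc
import numpy as np
import theano

# Allocates ~64 MB on the GPU immediately, whether or not any compiled
# function ever uses it.
w = theano.shared(np.zeros((4096, 4096), dtype='float32'), name='w')

# The device buffer is only released after the last Python reference is
# gone and the object is actually collected; until then it counts against
# whatever the next layerwise sub-network tries to allocate.
del w
gc.collect()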
Yes, I wouldn't be surprised. theanets doesn't try to do any memory management at all, so it's up to Python/Theano to clean up things that have disappeared from the active set. There's probably a bunch that could be done within theanets to help with this, though.
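In the meantime, one thing you could try (untested, and it only helps between calls, not between the per-layer phases inside a single layerwise run, so it may not fix the hid3 failure by itself): force a collection pass between pretraining and fine-tuning, reusing the variables from your snippets above.

import gc

net.train([ttrain[0:trains/4], toutputs[0:trains/4]],
          [vtrain[0:trains/4], voutputs[0:trains/4]],
          algo='layerwise', learning_rate=1e-3, batch_size=32)

# Give Python/Theano a chance to reclaim the layerwise sub-networks and
# their shared variables before the full-network RMSProp state is built.
gc.collect()

net.train([ttrain, toutputs], [vtrain, voutputs],
          algo='rmsprop', learning_rate=1e-4, batch_size=4096)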