Merge container and BiRNN layers memory errors
Hello, I have been trying to use MergeBroadcast and BiRNN layers and faced some issues.
- The BiRNN implementation does not expect the layer to be the first one in the network. So, code like the following:
```python
layers = [BiRNN(1, init=Xavier(), activation=Logistic(shortcut=True))]
model = Model(layers=layers)
cost = GeneralizedCost(costfunc=CrossEntropyMulti())
```
fails during initialization with the following error:
```
File "<...>/neon/models/model.py", line 175, in fit
  self.initialize(dataset, cost)
File "<...>/neon/models/model.py", line 130, in initialize
  self.layers.allocate_deltas()
File "<...>/neon/layers/container.py", line 369, in allocate_deltas
  self.set_deltas(self.global_deltas)
File "<...>/neon/layers/container.py", line 245, in set_deltas
  l.set_deltas(global_deltas)
File "<...>/neon/layers/recurrent.py", line 1154, in set_deltas
  self.out_deltas_buffer_f_v = self.out_deltas_buffer_f.reshape(nin, -1)
AttributeError: 'NoneType' object has no attribute 'reshape'
```
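The failure is consistent with `set_deltas` assuming an upstream deltas buffer that is never allocated when the layer is first in the network. As an illustration only (a simplified mock, not neon's actual implementation), a defensive version would skip the reshape when the buffer is `None`:

```python
# Illustrative sketch (not neon's code): set_deltas reshapes
# out_deltas_buffer_f, which stays None when the layer is first in the
# network, since there is no upstream layer to receive deltas.
class BiRNNSketch:
    def __init__(self):
        self.out_deltas_buffer_f = None  # never allocated for the first layer

    def set_deltas(self, nin):
        buf = self.out_deltas_buffer_f
        if buf is None:
            # First layer: nothing to propagate back, so skip the reshape
            # instead of raising AttributeError on None.
            return None
        return buf.reshape(nin, -1)

layer = BiRNNSketch()
print(layer.set_deltas(4))  # None instead of AttributeError
```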
- The BiRNN layer's allocation is implemented incorrectly. More precisely, the `allocate` method may be called with a `shared_outputs` parameter containing a preallocated buffer for the outputs (of shape `out_shape`), but `allocate` instead allocates the internal `_buffer` (of shape `hidden_shape`) there, which is larger than the output. Thus, code that preallocates memory for this layer fails:
```python
self.out_shape = (2 * self.nout, self.nsteps)
self.hidden_shape = (2 * self.nout, self.nsteps + 2)
```
A quick fix is to allocate the output buffers in the passed memory and the internal buffers in separate memory, but this doubles the memory footprint and also requires an extra copy into the output buffer during fprop.
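To make the size mismatch concrete, here is a small sketch comparing the two shapes (the values of `nout`, `nsteps`, and the batch size are illustrative assumptions, not taken from neon):

```python
import numpy as np

nout, nsteps, batch = 2, 4, 128  # illustrative sizes

out_shape = (2 * nout, nsteps)         # what shared_outputs is sized for
hidden_shape = (2 * nout, nsteps + 2)  # what allocate() actually needs

shared_elems = int(np.prod(out_shape)) * batch   # preallocated buffer size
needed_elems = int(np.prod(hidden_shape)) * batch

# The internal buffer needs two extra steps, so it cannot fit inside the
# preallocated output buffer:
print(needed_elems > shared_elems)  # True
```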
- The `Sequential` container makes an extra allocation call, breaking `MergeBroadcast` integration. In the `allocate` method of the `Sequential` container, an extra `allocate` call can occur for the last layer that owns its outputs (see the `else` branches). The code calls `allocate` for this last layer and then again for all layers. Moreover, the second call does not pass the shared memory parameter that may have been given to the sequential container. In simple cases this can lead to increased memory consumption, leaks, and the last layer's output allocation being overridden (one might argue otherwise based on these lines, but that argument is shaky because `allocate` can be overridden).

  Now consider the `MergeBroadcast` class, which consists of a list of `Sequential` containers and a shared output buffer. The issue causes this output buffer to be overridden in the last layer (ignoring the shared buffer), so the broadcast branch outputs are not merged, as they are never written to the shared buffer. This becomes a problem when a BiRNN layer is the last output-owning layer in a branch: that layer overrides the allocation method, so the repeated allocation occurs, and the second allocation is not aware of the shared memory.
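A minimal mock of the pattern described above (class and method names mirror neon's, but the bodies are simplified sketches, not neon's actual code) shows how the second, parameter-less `allocate` call discards the shared buffer:

```python
# Simplified mock of the double-allocation pattern: the container first
# allocates the last output-owning layer with the shared buffer, then
# loops over all layers and allocates again WITHOUT forwarding it.
class MockLayer:
    def __init__(self):
        self.outputs = None

    def allocate(self, shared_outputs=None):
        # Unconditionally (re)allocates, as an overriding layer might do
        self.outputs = shared_outputs if shared_outputs is not None else object()

class MockSequential:
    def __init__(self, layers):
        self.layers = layers

    def allocate(self, shared_outputs=None):
        self.layers[-1].allocate(shared_outputs)  # last output-owning layer
        for l in self.layers:
            l.allocate()  # extra call: shared_outputs is not forwarded

shared = object()
seq = MockSequential([MockLayer(), MockLayer()])
seq.allocate(shared)
print(seq.layers[-1].outputs is shared)  # False: the shared buffer was lost
```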
- `MergeBroadcast` and `BiRNN` interaction during `bprop`. Consider the following model:
```python
# input = (3, 128, 128), batch size = 128
layers = [
    Sequential([
        Conv(fshape=(1, 1, 128), padding=0, strides=64, dilation=1, init=Xavier()),
        BiRNN(2, init=Xavier(), activation=Tanh()),
        # Reshape((2 * 2, -1))  # - fix to the issue
    ]),
    Sequential([
        Conv(fshape=(1, 1, 128), padding=0, strides=64, dilation=1, init=Xavier()),
        BiRNN(2, init=Xavier(), activation=Tanh()),  # init=Constant(0) - broken
        # Reshape((2 * 2, -1))  # - fix to the issue
    ])
]
layers = [
    MergeBroadcast(layers, 'stack'),
    Affine(nout=2, init=Constant(0), activation=Logistic(shortcut=True))
]
```
Trying to run this gives an error indicating that the shape of the error tensor is wrong:
```
File "<...>/neon/layers/container.py", line 920, in bprop
  self.deltas, self.out_shape, alpha, beta, self.alphas, self.betas)
File "<...>/neon/backends/nervanagpu.py", line 3248, in bprop_mergebroadcast
  l.bprop(e, alpha=a * alpha, beta=b)
File "<...>/neon/layers/container.py", line 427, in bprop
  error = l.bprop(error)
File "<...>/neon/layers/recurrent.py", line 1336, in bprop
  self.activation, True)
File "<...>/neon/backends/nervanagpu.py", line 2831, in compound_rnn_unroll_bprop
  in_deltas[:] = activation.bprop(hs) * in_deltas
File "<...>/neon/backends/nervanagpu.py", line 190, in __setitem__
  self.__getitem__(index)._assign(value)
File "<...>/neon/backends/nervanagpu.py", line 373, in _assign
  OpTreeNode.build("assign", self, value)
File "<...>/neon/backends/backend.py", line 1843, in build
  return node.execute()
File "<...>/neon/backends/backend.py", line 1864, in execute
  return backend.execute(self)
File "<...>/neon/backends/nervanagpu.py", line 1269, in execute
  return call_compound_kernel(self._get_rand_state_dev(), self.compute_capability, *stack)
File "<...>/neon/backends/float_ew.py", line 823, in call_compound_kernel
  "Input shape:%s not compatible" % (shape,))
TypeError: Input shape:[2, 128] not compatible
```
I believe this happens because the shape of the propagated error does not match what the `bprop` method of BiRNN expects. Adding a `Reshape` layer, which (when correctly implemented) should restore and propagate back its input shape, solved the issue. Nevertheless, I think the `MergeBroadcast` layer itself should restore the branch output shapes during the backward pass.
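As a sketch of why the `Reshape` workaround helps (illustrative code, not neon's implementation): a correctly implemented reshape layer records its input shape during `fprop` and restores it during `bprop`, undoing the flattening so the BiRNN receives an error with the shape it produced:

```python
import numpy as np

class ReshapeSketch:
    """Illustrative reshape layer: fprop flattens, bprop restores."""
    def __init__(self, shape):
        self.shape = shape

    def fprop(self, x):
        self.in_shape = x.shape  # remember the incoming shape
        return x.reshape(self.shape)

    def bprop(self, error):
        return error.reshape(self.in_shape)  # restore it on the way back

r = ReshapeSketch((4, -1))
x = np.arange(2 * 2 * 6).reshape(2, 2, 6)
y = r.fprop(x)                      # flattened to shape (4, 6)
print(r.bprop(y).shape == x.shape)  # True
```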
Environment: Python 3.5.2, neon 2.6.0 (f9d771b), CUDA 8.0, GPU K40s.
@zhiltsov-max Thanks for your findings and recommendations. Appreciated!