nlstm
Reduction in memory requirements: Add SplitInitializer for separate initialization
This dramatically reduces memory requirements, since an extra copy of the concatenated weight tensor is no longer kept for each timestep during backprop.
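Roughly, the idea is something like the following sketch (illustrative only, written against the Keras initializer API; the actual SplitInitializer in this PR may differ in names and details):

```python
import tensorflow as tf

class SplitInitializer(tf.keras.initializers.Initializer):
    """Fills one tensor by stacking several sub-initializers along an axis.

    The cell can then own a single concatenated kernel variable instead of
    concatenating separate kernel variables inside the step graph.
    """

    def __init__(self, sub_initializers, sizes, axis=0):
        self.sub_initializers = sub_initializers  # one initializer per slice
        self.sizes = sizes                        # slice sizes along `axis`
        self.axis = axis

    def __call__(self, shape, dtype=None):
        parts = []
        for init, size in zip(self.sub_initializers, self.sizes):
            part_shape = list(shape)
            part_shape[self.axis] = size
            parts.append(init(part_shape, dtype=dtype))
        # The concatenation happens exactly once, at variable-creation time,
        # instead of once per timestep inside the RNN cell's call().
        return tf.concat(parts, axis=self.axis)

# Example (illustrative numbers): one kernel whose input and recurrent slices
# are initialized differently, with no concat op left in the step graph.
split_init = SplitInitializer(
    [tf.keras.initializers.GlorotUniform(), tf.keras.initializers.Orthogonal()],
    sizes=[32, 64],  # input depth, num_units
    axis=0,
)
kernel = tf.Variable(split_init(shape=[32 + 64, 4 * 64], dtype=tf.float32))
```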
Hi @marhlder, would you mind explaining a bit more how this works? Just from the code, I do not quite understand how it reduces the memory requirement, since in the original code the kernels are also concatenated. Specifically, why does memory consumption relate to the initializer? My understanding is that using dynamic_rnn prevents the copying from happening.
@hannw Thx for your response. The problem is that, in the original code, the default backpropagation code in TensorFlow saves a copy of the concatenated weight tensor (kernel) for each timestep, because the concatenation op becomes part of the graph and runs at every timestep. You are correct that dynamic_rnn won't make extra copies of the individual kernels, only of the concatenated results. With the provided custom initializer, the concatenation happens only once, so the backpropagation code no longer needs to keep this extra intermediate value for each timestep. This is not noticeable for networks with few units per layer and short sequences, but it becomes very noticeable once you turn up the heat, e.g. sequences of length 200+, nesting level 3, and 512 units per layer. Try, for instance, comparing the memory consumption of the original implementation at nesting level 3 with 3 layers of regular stacked LSTM.
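To make the cost concrete, here is a hypothetical toy cell (not the nlstm code) that rebuilds its kernel with tf.concat inside call(); the concat output is a fresh intermediate tensor that the backward pass keeps once per timestep, which is exactly what concatenating once at initialization avoids:

```python
import tensorflow as tf

class ConcatEveryStepCell(tf.keras.layers.Layer):
    """Toy RNN cell that re-concatenates its kernel at every step."""

    def __init__(self, num_units, **kwargs):
        super().__init__(**kwargs)
        self.num_units = num_units
        self.state_size = num_units
        self.output_size = num_units

    def build(self, input_shape):
        input_depth = int(input_shape[-1])
        # Two separately created weight matrices ...
        self.w_in = self.add_weight(
            name="w_in", shape=[input_depth, self.num_units],
            initializer="glorot_uniform")
        self.w_rec = self.add_weight(
            name="w_rec", shape=[self.num_units, self.num_units],
            initializer="orthogonal")

    def call(self, inputs, states):
        h = states[0]
        # ... re-concatenated at every timestep. This concat result is an
        # intermediate tensor saved for backprop once per step, so its memory
        # cost grows with the sequence length.
        kernel = tf.concat([self.w_in, self.w_rec], axis=0)
        new_h = tf.tanh(tf.matmul(tf.concat([inputs, h], axis=1), kernel))
        return new_h, [new_h]

# Wrapping the cell in tf.keras.layers.RNN (the dynamic_rnn counterpart) still
# executes the kernel concat once per timestep of the computation.
layer = tf.keras.layers.RNN(ConcatEveryStepCell(64))
outputs = layer(tf.random.normal([8, 200, 32]))
```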