char-rnn
Memory issue when scaling sequence length
It seems there is a weird issue where you can max out the GPU memory when you scale up the sequence length. It happens during the clone stage. For some reason, this doesn't happen with similar models in the Wojciech codebase, even though he seems to do the same thing (cloning the net based on the sequence length).
I wonder if any Lua ninjas out there have ideas about how the two codebases' clone methods differ in their implementation.
(PS. this code is amazing...really awesome stuff)
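For context, here is a rough, untested sketch of what a clone step of this kind typically looks like in Torch; the function name clone_T_times and all variable names are illustrative, not the actual char-rnn code. The prototype is serialized once and deserialized T times, and each deserialized copy carries its own output/gradInput buffers, which is presumably where the memory goes as the sequence length grows.

require 'torch'
require 'nn'

-- Sketch: clone a prototype module T times (one clone per time step).
-- 'net' is assumed to be an nn module (possibly already :cuda()'d).
local function clone_T_times(net, T)
    local clones = {}
    local params, gradParams = net:parameters()

    -- serialize the prototype once into an in-memory buffer
    local mem = torch.MemoryFile('w'):binary()
    mem:writeObject(net)

    for t = 1, T do
        -- deserialize a fresh copy; this is the step that eats memory,
        -- since every copy gets its own activation/gradient buffers
        local reader = torch.MemoryFile(mem:storage(), 'r'):binary()
        local clone = reader:readObject()
        reader:close()

        -- re-point the clone's parameters at the prototype's tensors
        local cloneParams, cloneGradParams = clone:parameters()
        for i = 1, #params do
            cloneParams[i]:set(params[i])
            cloneGradParams[i]:set(gradParams[i])
        end

        clones[t] = clone
        collectgarbage()
    end
    mem:close()
    return clones
end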
UPDATE: Actually, I think there is similar behavior in the Wojciech codebase, but it happens at a higher sequence length.
With both of these, I wonder if there is some way in the code to share the memory of the parameters that are constrained to be identical across the clones, instead of creating redundant copies.
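One possible direction (just a sketch of nn's built-in sharing mechanism, not something the current code necessarily does): build the clones with nn.Module:clone('weight', ...), which clones the module and then shares the named tensors with the original, so no extra parameter storage is allocated.

require 'nn'

-- Sketch: clones that share weights/biases (and their gradients) with the prototype.
local proto = nn.Linear(128, 128)
local clones = {}
for t = 1, 50 do
    clones[t] = proto:clone('weight', 'bias', 'gradWeight', 'gradBias')
end

-- verify the sharing: mutating the prototype shows up in every clone
proto.weight[1][1] = 0.12345
print(clones[10].weight[1][1])  -- 0.12345

Note that even with shared parameters, each clone still keeps its own output and gradInput buffers, so activation memory would still grow with sequence length.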
I second that question. I'm running out of GPU memory, e.g. with sequence length 217.
UPDATE: I think I verified that the parameters are indeed shared between the clones. To be exact, the clones' parameters point to the prototype's. If you use this snippet, you will see that changing the prototype's parameters also changes the clone's parameters:
if net.parameters then
    -- re-point the clone's parameters at the prototype's tensors
    local cloneParams, cloneGradParams = clone:parameters()
    local cloneParamsNoGrad
    for i = 1, #params do
        cloneParams[i]:set(params[i])
        cloneGradParams[i]:set(gradParams[i])
    end
    if paramsNoGrad then
        cloneParamsNoGrad = clone:parametersNoGrad()
        for i = 1, #paramsNoGrad do
            cloneParamsNoGrad[i]:set(paramsNoGrad[i])
        end
    end
    -- sanity check: mutate the prototype and read the value back through the clone
    params[1][1][1] = 0.12345
    io.write(params[1][1][1])
    io.write(cloneParams[1][1][1])
    io.write("\n")
end
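An alternative check that doesn't mutate the parameters (a small sketch reusing params and cloneParams from the snippet above): two tensors share memory exactly when they sit on the same storage, which torch.pointer makes easy to compare.

-- Sketch: compare the underlying storages instead of writing a test value.
local shared = torch.pointer(params[1]:storage()) ==
               torch.pointer(cloneParams[1]:storage())
print('parameters shared with prototype: ' .. tostring(shared))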
This doesn't solve the problem, though. If the parameters really are shared, there shouldn't be that much memory consumption.
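To pin down where the memory actually goes, one could measure free GPU memory before and after cloning. A rough sketch, assuming cutorch is available and reusing the illustrative clone_T_times/net from the earlier sketch:

require 'cutorch'

-- Sketch: measure how much GPU memory the clones themselves consume.
local free_before = cutorch.getMemoryUsage(cutorch.getDevice())
local clones = clone_T_times(net, 200)   -- e.g. sequence length 200
cutorch.synchronize()
local free_after = cutorch.getMemoryUsage(cutorch.getDevice())
print(string.format('clones use about %.1f MB',
                    (free_before - free_after) / 2^20))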
Maybe first clone the models and only then ship them to the GPU? Would that destroy the parameter references?
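For what it's worth, a quick (hedged) check suggests the answer is yes: converting the prototype and the clones separately with :cuda() allocates new CudaTensors with new storages, so sharing set up on the CPU would have to be re-established afterwards. A sketch of the check:

require 'nn'
require 'cunn'

-- Sketch: does parameter sharing survive separate :cuda() conversions?
local proto = nn.Linear(4, 4)
local clone = proto:clone('weight', 'bias', 'gradWeight', 'gradBias')
print(torch.pointer(proto.weight:storage()) ==
      torch.pointer(clone.weight:storage()))   -- true: shared on the CPU

proto:cuda()
clone:cuda()
print(torch.pointer(proto.weight:storage()) ==
      torch.pointer(clone.weight:storage()))   -- false: new storages were allocated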
UPDATE: This question about the official rnn module seems to hint that it supports super long sequences (1000+): https://github.com/Element-Research/rnn/issues/5
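For anyone who goes down that route, a minimal sketch of the rnn package's documented API (untested here; sizes are made up): nn.Sequencer wraps an nn.LSTM and applies it across a table of inputs, one entry per time step.

require 'rnn'

-- Sketch: feed a 1000-step sequence through nn.Sequencer + nn.LSTM.
local lstm = nn.Sequencer(nn.LSTM(128, 256))

local inputs = {}
for t = 1, 1000 do
    inputs[t] = torch.randn(1, 128)   -- batch of 1, feature size 128
end
local outputs = lstm:forward(inputs)
print(#outputs)   -- 1000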
Thanks!