
Memory issue when scaling the sequence length

Open mtanana opened this issue 9 years ago • 2 comments

It seems there is a weird issue where you can max out the GPU memory when you scale the length of the sequence. It happens during the clone stage. And for some reason, this doesn't happen with similar models in the Wojciech codebase, even though he seems to do the same thing (cloning the net based on the sequence length).

I wonder if any Lua ninjas out there have ideas on the differences in how the two clone methods are implemented.

(PS. this code is amazing...really awesome stuff)

UPDATE: Actually, I think there is similar behavior in the Wojciech codebase, but it happens at a higher sequence length.

With both of these, I wonder if there is some way in the code to share the memory for the parameters that are tied to be identical across clones, instead of creating redundant copies.
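
For concreteness, here is a minimal sketch of the clone-per-timestep pattern (proto and seq_length are illustrative names, not the exact char-rnn code):

local clones = {}
for t = 1, seq_length do
    -- :clone() deep-copies the prototype, so each clone starts out with its own
    -- parameter tensors as well as its own output/gradInput buffers
    clones[t] = proto:clone()
    -- the parameter copies can then be tied back to the prototype with :set()
    -- on each parameter tensor, so only the buffers should really be duplicated
end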

mtanana • Oct 04 '15 00:10

Second that question. Running out of GPU memory e.g. with sequence length 217.

UPDATE: I think I verified that the parameters are indeed shared between the clones. To be exact, the clone parameters point to the prototype's parameters. If you run this snippet, you will see that changing the prototype's parameters changes the clone's parameters as well:

-- Context: this is essentially the parameter-tying loop from clone_many_times in
-- util/model_utils.lua. `net` is the prototype, `clone` is one of its clones, and
-- `params`, `gradParams`, `paramsNoGrad` come from net:parameters() / net:parametersNoGrad().
if net.parameters then
    local cloneParams, cloneGradParams = clone:parameters()
    local cloneParamsNoGrad
    for i = 1, #params do
        -- :set() makes the clone's tensors views onto the prototype's storage
        cloneParams[i]:set(params[i])
        cloneGradParams[i]:set(gradParams[i])
    end
    if paramsNoGrad then
        cloneParamsNoGrad = clone:parametersNoGrad()
        for i = 1, #paramsNoGrad do
            cloneParamsNoGrad[i]:set(paramsNoGrad[i])
        end
    end

    -- sanity check: mutate the prototype and confirm the clone sees the change
    params[1][1][1] = 0.12345
    io.write(params[1][1][1], " ", cloneParams[1][1][1], "\n")
end

This doesn't solve the problem, though: if the parameters really are shared, there shouldn't be this much memory consumption.
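
My guess for where the memory actually goes: even with tied parameters, every clone keeps its own output and gradInput buffers for every module, and those are duplicated once per timestep. A rough way to measure this for a single clone after one forward/backward pass (clone is an illustrative name; assumes 4 bytes per element):

local bytes = 0
for _, m in ipairs(clone:listModules()) do
    -- output/gradInput only reach their working sizes after a forward/backward pass
    if torch.isTensor(m.output) then
        bytes = bytes + m.output:nElement() * 4
    end
    if torch.isTensor(m.gradInput) then
        bytes = bytes + m.gradInput:nElement() * 4
    end
end
print(string.format('roughly %.1f MB of activation buffers in this clone', bytes / 2^20))

Multiplied by the sequence length, that could easily dwarf the (shared) parameter memory.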

Maybe first clone the models, and only then ship them to the GPU? Would that destroy the parameter references?
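
I would expect it to: :cuda() allocates a fresh CudaTensor for every module tensor, so ties made with :set() on the CPU would not survive the conversion and the clones would have to be re-tied afterwards. A hedged sketch (proto, clones and T are again illustrative names):

require 'cunn'

proto = proto:cuda()
local protoParams, protoGradParams = proto:parameters()
for t = 1, T do
    clones[t] = clones[t]:cuda()        -- conversion allocates new GPU storage per clone
    local cp, cg = clones[t]:parameters()
    for i = 1, #protoParams do
        cp[i]:set(protoParams[i])       -- re-point the clone's params at the prototype's GPU storage
        cg[i]:set(protoGradParams[i])
    end
end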

UPDATE: This question about the official rnn module seems to hint that it supports very long sequences (1000+): https://github.com/Element-Research/rnn/issues/5

ghost • Dec 07 '15 22:12

Thanks!

mtanana • Feb 25 '16 16:02