
Memory leakage in a custom RNN


Hi,

I was trying to implement an RNN function myself in Knet and I have run into a memory problem. I am running a GTX 1050 Ti GPU on a PC; the card admittedly has a fairly small memory of 4 GB, but my problem is not really about that.

I have provided the code below if anybody would like to replicate my issue. I define an embedding layer, a dense layer, and a chain object, along with a sequence minibatcher, all of which are fairly standard and in line with the tutorials. I have also written a custom RNN function called mRNN. When I try to train a model which involves an embedding layer, an mRNN layer, and a dense layer, I get an out-of-GPU-memory error long into the training. My model does fit into memory (I have also done back-of-the-envelope calculations), so I think I have some 'memory leakage' in my model.
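Roughly, that back-of-the-envelope calculation for the model in question (parameter memory only, Float32, with the layer sizes used in the code below; activations and gradients come on top of this) looks like:

embed = 120 * 10_000                       # Embed(10000, 120): W
mrnn  = 120*120 + 120*120 + 120 + 120      # mRNN(120, 120): Whx, Whh, bh, bx
dense = 10_000*120 + 10_000                # Dense(120, 10000): w, b
nparams = embed + mrnn + dense             # ≈ 2.44M parameters
println(nparams * 4 / 2^20, " MB")         # ≈ 9.3 MB, tiny compared to 4 GB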

I have also included the Knet.gc() function in my training routine, calling it every 100 iterations, and I print the GPU memory info every 100 iterations, only to see that my GPU memory usage increases linearly with each iteration until the machine runs out of memory. So if you run the code on a GPU with more memory you might not get the same error, but you will see the linear memory increase.
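If you want to double-check the memory numbers independently of Knet, here is a minimal sketch of one way to do it, shelling out to nvidia-smi (this helper is not part of the script below):

# Sketch only: print driver-side memory usage; requires nvidia-smi on the PATH.
report_gpu_mem() = run(`nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader`)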

For comparison I have included two other models: model0 and model1. model0 is a feedforward network and model1 is a recurrent network using the RNN struct built into Knet. Both models work without running into any problems; that is, the memory usage stays constant between training iterations.

I can't really tell if I am doing something I am not allowed to do, or whether this is a problem with KnetArrays. Thanks. Below is the code if anyone would like to replicate the problem:

using Pkg; for p in ("Knet","Distributions"); haskey(Pkg.installed(),p) || Pkg.add(p); end

using Knet, Distributions
using Statistics: mean # used by the Chain loss over a whole dataset

include(Knet.dir("data/mikolovptb.jl"))

(trn,dev,tst,vocab) = mikolovptb()

function seqbatch(x,B,T)
    # Split the id sequence x into B parallel streams and return an iterator of
    # (x, y) minibatches of size 1×B×T, where y is x shifted one step ahead.
    N = (length(x) - 1) ÷ B
    xo = permutedims(reshape(x[1:N*B],1,N,B), [1, 3, 2])
    yo = permutedims(reshape(x[2:N*B+1],1,N,B), [1, 3, 2])
    minibatch(xo, yo, T)
end
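For reference (not part of the script): with B = T = 20 as used below, each (x, y) pair produced by this iterator should be a 1×20×20 array of word ids.

(x, y) = first(seqbatch(vcat(trn...), 20, 20))
@show size(x) size(y)   # expect (1, 20, 20) for both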

struct Embed; W; end
Embed(input::Int,output::Int;init=Uniform(-0.1,0.1), atype=Knet.atype()) = Embed(Param(atype(rand(init, output, input))))
(e::Embed)(x) = e.W[:,dropdims(x, dims = 1)]

struct Chain
    layers
    Chain(layers...) = new(layers)
end
(c::Chain)(x) = (for l in c.layers; x = l(x); end; x)
(c::Chain)(x,y) = nll(c(x),mat(y,dims=1))
(c::Chain)(data::Knet.Data) = mean(c(x, y) for (x, y) in data)
flush!(c::Chain) = (for l in c.layers; (typeof(l) == mRNN) && flush!(l); end)

struct Dense; w; b; f; end
Dense(input::Int,output::Int,f=identity; init=Uniform(-0.1,0.1), atype=Knet.atype()) = Dense(Param(atype(rand(init, output, input))), Param(atype(zeros(output, 1))), f)
(d::Dense)(x) = d.f.(d.w * mat(x,dims=1) .+ d.b)


mutable struct mRNN; Whx; Whh; bh; bx; f; h; end
mRNN(input::Int, hidden::Int, f=tanh; init=Uniform(-0.1,0.1), atype=Knet.atype()) = mRNN(Param(atype(rand(init, hidden, input))), Param(atype(rand(init, hidden, hidden))),
                                                                                          Param(atype(zeros(hidden, 1))), Param(atype(zeros(hidden, 1))),f, nothing)
rnnstep!(r::mRNN, x) = (r.h = r.f.(r.Whx * x .+ r.Whh * r.h .+ r.bh .+ r.bx))

function (r::mRNN)(x; atype=KnetArray{Float32}) # could also default to Knet.atype()
    @assert (ndims(x) == 2) | (ndims(x) == 3) "Input `x` must be either 2 or 3 dimensional of the form [nx, bs, [sl]]."
    if isnothing(r.h)
        r.h = atype(zeros(size(r.Whh, 1), size(x, 2)))
    end
    if ndims(x) == 2
        rnnstep!(r, x)
        return r.h
    elseif ndims(x) == 3
        # Collect the hidden state at every time step and concatenate at the end.
        # (The commented-out lines are an alternative that writes into a preallocated output array.)
        #output = atype(zeros(size(r.Whh, 1), size(x, 2), size(x, 3)))
        output = Any[]
        for t in 1:size(x, 3)
            rnnstep!(r, x[:,:,t])
            push!(output, r.h)
            #output[:,:,t] = r.h
        end
        return reshape(cat(output..., dims = 2), size(r.Whh, 1), size(x, 2), size(x, 3))
        #return output
    end
end
flush!(r::mRNN) = (r.h = nothing)
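As a quick sanity check of mRNN in isolation (again just a sketch with arbitrary data, not part of the script):

r = mRNN(120, 120, atype = KnetArray{Float32})
x = KnetArray(rand(Float32, 120, 20, 20))   # [nx, bs, sl]
@show size(r(x))                            # expect (120, 20, 20)
flush!(r)                                   # reset the hidden state before a new sequence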

function sgdupdate!(func, args; lr = 0.1)
    fval = @diff func(args...)
    for param in Knet.params(fval)
        ∇param = grad(fval, param)
        param .-= lr * ∇param
    end
    return value(fval)
end

function sgd(func, data; lr=0.1)
    n = length(data)
    for (i, args) in enumerate(data)
        sgdupdate!(func, args; lr=lr)
        # Every 100 iterations: report progress, force a GPU garbage collection, print memory info.
        i % 100 == 0 && (println("$(round(100i/n, digits=2))% is done!"); Knet.gc(); Knet.memdbg())
    end
end

B = 20
T = 20
dtrn = seqbatch(vcat(trn...), B, T)
ddev = seqbatch(vcat(dev...), B, T)
dtst = seqbatch(vcat(tst...), B, T)

#The feedforward network.
model0 = Chain(Embed(10000, 120, atype = KnetArray{Float32}), Dense(120, 120, relu, atype = KnetArray{Float32}), Dense(120, 10000, identity, atype = KnetArray{Float32}))
#Knet RNN, works with no problem.
model1 = Chain(Embed(10000, 120, atype = KnetArray{Float32}), RNN(120, 120, rnnType = :tanh, usegpu = true), Dense(120, 10000, identity, atype = KnetArray{Float32}))
#My custom RNN, has memory leakage.
model2 = Chain(Embed(10000, 120, atype = KnetArray{Float32}), mRNN(120, 120, atype = KnetArray{Float32}), Dense(120, 10000, identity, atype = KnetArray{Float32}))
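A single forward pass and loss evaluation on one minibatch can serve as a smoke test before training (sketch only, not part of the script):

(x, y) = first(dtrn)
@show size(model2(x))   # expect (10000, 400): vocabulary scores for all 20*20 positions
@show model2(x, y)      # per-token negative log likelihood on this minibatch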

println("model0 is being trained.")
sgd(model0, dtrn)
println("model1 is being trained.")
sgd(model1, dtrn)
println("model2 is being trained.")
sgd(model2, dtrn)

ahmetumutdurmus · Jun 10 '19 08:06