
Feature request: LayerNormLSTM kernel

Open ngphuoc opened this issue 6 years ago • 3 comments

I implemented naive layer normalization with a "manual" LSTM and it runs extremely slowly, 10 times slower than using the cudnn RNN interface (which has no layer normalization feature).

Could we implement a LayerNorm kernel for RNNs similar to PyTorch's? I did a quick search in the PyTorch source code for some references; I hope these give some clues:

pytorch/caffe2/python/rnn_cell.py
pytorch/aten/src/ATen/native/cpu/layer_norm_kernel.cpp
pytorch/aten/src/ATen/native/layer_norm.cpp

ngphuoc avatar Sep 22 '19 14:09 ngphuoc

I have implemented LayerNorm like this before:

using Knet: param
using Statistics: mean, std

struct LayerNorm; a; b; ϵ; end            # gain, bias, and numerical-stability constant

function LayerNorm(dmodel; eps=1e-6)
    a = param(dmodel; init=ones)          # learnable gain, initialized to 1
    b = param(dmodel; init=zeros)         # learnable bias, initialized to 0
    LayerNorm(a, b, eps)
end

function (l::LayerNorm)(x, o...)
    μ = mean(x, dims=1)                   # mean over the feature dimension
    σ = std(x, mean=μ, dims=1)            # std over the feature dimension, reusing μ
    l.a .* (x .- μ) ./ (σ .+ l.ϵ) .+ l.b
end
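
For illustration, a minimal usage sketch of the struct above (the model dimension and batch size are made-up values, and plain CPU Float32 arrays are assumed for simplicity):

ln = LayerNorm(512)               # assumed model dimension of 512
h  = randn(Float32, 512, 32)      # assumed batch of 32 column vectors
y  = ln(h)                        # each column normalized, then scaled by a and shifted by b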

There may be a few tricks to make it faster, e.g. not computing (x .- μ) twice (once inside std and once explicitly in the return expression). But ultimately we need a GPU kernel.

I do not understand why you have to do a manual LSTM and cannot use the cudnn interface. Is it a multilayer RNN with the LayerNorm in between the RNN layers? In that case a GPU kernel is not going to help; we either need to wait for cudnn to catch up, or you need to separate your layers and stick LayerNorms between them.

denizyuret avatar Sep 26 '19 17:09 denizyuret

Yes, I have a similar LayerNorm struct to yours. For LSTM we need to apply it to the LSTMCell internals as follows (copied from the Layer Normalization paper):

[Screenshot: layer-normalized LSTM equations from the Layer Normalization paper]
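
Since only a screenshot was attached, here is a rough Julia sketch of what such a layer-normalized LSTM cell could look like, reusing the LayerNorm struct above. The struct name, field names, and gate ordering are assumptions for illustration based on the paper, not an existing Knet API:

using Knet: param, param0, sigm

# Hypothetical layer-normalized LSTM cell: LN is applied separately to the
# input and recurrent projections, and to the cell state before the output gate.
struct LNLSTMCell; Wx; Wh; b; lnx; lnh; lnc; end

function LNLSTMCell(xsize, hsize)
    LNLSTMCell(param(4hsize, xsize), param(4hsize, hsize), param0(4hsize),
               LayerNorm(4hsize), LayerNorm(4hsize), LayerNorm(hsize))
end

function (c::LNLSTMCell)(x, h, cell)
    H = size(h, 1)
    g = c.lnx(c.Wx * x) .+ c.lnh(c.Wh * h) .+ c.b    # normalized pre-activations
    i = sigm.(g[1:H, :]);      f = sigm.(g[H+1:2H, :])
    o = sigm.(g[2H+1:3H, :]);  ĝ = tanh.(g[3H+1:4H, :])
    cell = f .* cell .+ i .* ĝ
    h = o .* tanh.(c.lnc(cell))                      # normalize the cell state
    return h, cell
end

Stepping through many small broadcast and indexing operations like this is exactly what makes the "manual" version so much slower than the fused cudnn kernel.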

ngphuoc avatar Oct 08 '19 01:10 ngphuoc

Until CuDNN supports this in its API, one workaround I can think of is to create N separate RNNs for N layers and insert LayerNorm layers between them manually. One could write a struct that hides most of the dirty details from the user.
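
For illustration, a rough sketch of such a wrapper around Knet's cudnn-backed RNN (the struct name and the exact constructor keywords are assumptions and should be checked against the Knet docs):

using Knet: RNN

# Hypothetical stack of N single-layer cudnn LSTMs with LayerNorm inserted between them.
struct LNStackedLSTM; layers; norms; end

function LNStackedLSTM(input, hidden, nlayers)
    layers = [RNN(l == 1 ? input : hidden, hidden; rnnType=:lstm) for l in 1:nlayers]
    norms  = [LayerNorm(hidden) for _ in 1:nlayers-1]    # no norm after the last layer
    LNStackedLSTM(layers, norms)
end

function (m::LNStackedLSTM)(x)
    for (i, r) in enumerate(m.layers)
        x = r(x)                                      # fast cudnn LSTM for this layer
        i < length(m.layers) && (x = m.norms[i](x))   # LayerNorm between layers
    end
    return x
end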

denizyuret avatar Oct 08 '19 11:10 denizyuret