Flux.jl
Allowing for function replacement in LSTM and GRU layers
This makes the behaviour of the LSTM and GRU recurrent layers consistent with other layers.
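To make the intent concrete, here is a minimal standalone sketch, not the code in this PR; the MyLSTMCell name, the act field name, and the weight initialisation are illustrative assumptions. The cell stores its gate and state nonlinearities as fields with the usual defaults, so a caller can swap in alternatives the same way Dense accepts an arbitrary activation.

# Illustrative sketch only; field and constructor names are assumptions.
sigm(x) = 1 / (1 + exp(-x))

struct MyLSTMCell{F,G,A,V}
  σ::F      # gate activation (input/forget/output), defaults to sigm
  act::G    # cell/state activation, defaults to tanh
  Wi::A; Wh::A; b::V
end

MyLSTMCell(in::Integer, out::Integer, σ = sigm, act = tanh) =
  MyLSTMCell(σ, act, randn(4out, in), randn(4out, out), zeros(4out))

function (m::MyLSTMCell)((h, c), x)
  o = size(m.Wh, 2)
  g = m.Wi * x .+ m.Wh * h .+ m.b
  input, forget, output = m.σ.(g[1:o]), m.σ.(g[o+1:2o]), m.σ.(g[3o+1:4o])
  cell = m.act.(g[2o+1:3o])
  c′ = forget .* c .+ input .* cell
  h′ = output .* m.act.(c′)
  return (h′, c′), h′
end

# Usage: state = (zeros(2), zeros(2)); cell = MyLSTMCell(3, 2); cell(state, rand(3))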
This seems like a good thing to have, although perhaps with more informative names than sigmRepl etc.; is there a better name for these gates?
Also, the CUDA bindings will have to be updated, since only layers with the default setup will be able to use CUDNN.
Ah, I hadn't seen the CUDA bindings. I'll get on that.
I'll take a look to see if the papers call the gates something in particular. I was going to go with σ and tanh for the names, to stay in sync with the other layers which just use σ, but thought that might be extra confusing, hah.
For the CUDA bindings, without going into CUDNN itself, I don't think there is much I can do except emit a message saying that the selected functions aren't GPU compatible. I looked at the CUDNN docs (https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html) and it looks like the implementation is hardcoded? I am not sure...
We would probably have to restrict a custom-function LSTM/GRU from being submitted to the GPU here, though?
curnn.jl:10
function CUDNN.RNNDesc(m::CuRNNs{T}) where T
  h, i = length(m.h), size(m.Wi, 2)
  # Pick the CUDNN cell type; for vanilla RNNs it depends on the activation,
  # for GRU/LSTM it is currently fixed.
  mode = m isa CuRNN ?
    (m.σ == tanh ? CUDNN.CUDNN_RNN_TANH : CUDNN.CUDNN_RNN_RELU) :
    m isa CuGRU ? CUDNN.CUDNN_GRU : CUDNN.CUDNN_LSTM
  r = CUDNN.RNNDesc{T}(mode, i, h)
  return r
end
For the replacement names, we would need to call them the input, output, and forget gates. But the input and output gate functions would each be a pair (sigm, tanh), as seen here...
I would rather keep the current setup with just the two functions, because I want to use it for replacing sigmoid with better sigmoids and tanh with better tanh variants.
But I know that others are interested in controlling the actual model, where they may want to vary the input and output gates independently.
I am happy either way though, so let me know what you think would be best.
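To make the two options concrete, here is a hedged sketch reusing the illustrative MyLSTMCell above; hard_sigmoid and the per-gate signature are hypothetical, not part of this PR.

# Two-function setup (what this PR does): the replacements are used by every
# gate of the corresponding kind.
hard_sigmoid(x) = clamp(0.2x + 0.5, 0, 1)
cell = MyLSTMCell(10, 5, hard_sigmoid, tanh)

# Per-gate alternative (hypothetical): each named gate gets its own function,
# at the cost of a longer and less uniform constructor, e.g.
# MyLSTMCell(10, 5; input = (hard_sigmoid, tanh), forget = hard_sigmoid,
#                   output = (hard_sigmoid, tanh))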
RE CUDA, you can look here. The situation is the same for vanilla RNNs: we allow the activation function to be changed, but only two are supported by CUDNN. Changing the CuRNN signature prevents incompatible RNNs from ever dispatching to CUDNN; they just fall through to the Julia implementation instead. The same should be possible for LSTMs.
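For reference, the restriction amounts to something like the following sketch, in the spirit of Flux's src/cuda/curnn.jl at the time; the exact type parameters and the CuArrays-era packages are assumptions.

using Flux, CuArrays

# Only cells whose activation is one of the CUDNN-supported functions match
# these aliases, so anything else never dispatches to the CUDNN methods and
# falls through to the plain Julia implementation.
CuRNN{T}  = Flux.RNNCell{<:Union{typeof(tanh),typeof(relu)},<:CuArray{T,2},<:CuArray{T,1}}
CuGRU{T}  = Flux.GRUCell{<:CuArray{T,2},<:CuArray{T,1}}
CuLSTM{T} = Flux.LSTMCell{<:CuArray{T,2},<:CuArray{T,1}}  # would gain activation
                                                          # parameters under this PR
CuRNNs{T} = Union{CuRNN{T},CuGRU{T},CuLSTM{T}}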
Hi, I have made the changes required for CUDA. Are there any performance benchmarks/tests I could run to ensure that my changes haven't caused any performance regression?
Updated the repo with the CUDA support. Added tests to the CUDA test files checking that explicitly stating the sigmoid and tanh functions works, as well as a test with a random function.
I am limited to a laptop with a fairly modest NVIDIA chip at the moment, so I couldn't check the performance for any regressions, but I can when I return after Christmas.
Is anything else needed for this to be merged?
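I don't think there's an official benchmark suite for this; below is a rough hedged sketch of the kind of check that would catch a regression. BenchmarkTools, the CuArrays-era gpu setup, and the constructor argument order are assumptions.

using Flux, CuArrays, BenchmarkTools

# Same maths as σ but a different function type, so with the dispatch
# restriction above it should take the Julia fallback rather than CUDNN.
mysigm(x) = σ(x)

x = gpu(rand(Float32, 256))
m_default = gpu(LSTM(256, 256))                # default gates: CUDNN path
m_custom  = gpu(LSTM(256, 256, mysigm, tanh))  # custom gate: fallback path

@btime $m_default($x)   # the baseline we don't want to regress
@btime $m_custom($x)    # expected to be slower, but should still run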
bors try
bors try
Damn, I think the normal checks are broken. They seem to be failing on downloading data... Local testing all passes.
This looks like a network error. Can we rerun the test and merge this?
@NTimmons could you rebase? Then we can merge if tests pass
The docs currently say LSTM(in::Integer, out::Integer, σ = tanh). But this fails:
julia> LSTM(1,1,σ=tanh)
ERROR: MethodError: no method matching Flux.LSTMCell(::Int64, ::Int64; σ=tanh)
I presume the docs were updated prematurely, before this was merged?
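For what it's worth, the σ = tanh in that docstring describes a positional default, not a keyword, so the keyword call can never match regardless of the Flux version. A standalone illustration of the general Julia behaviour (f is just a stand-in function):

# A positional argument with a default (declared before any `;`) cannot be
# passed by keyword; that alone produces a MethodError like the one above.
f(in::Integer, out::Integer, σ = tanh) = σ
f(1, 1, tanh)       # works: positional
f(1, 1)             # works: uses the default
# f(1, 1, σ = tanh) # fails with a MethodError like the one above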
I don't believe this has been merged. I need to check it and retest it. I've been busy on other projects and missed the update. I will get on this ASAP.
Those docs are also from v0.5. The latest reference looks correct.