cudnn.torch
BLSTM with dropout ~= 0 produces non-deterministic results
Hi there,
I am very excited to try out the new LSTM (and bidirectional LSTM) models in cudnn.torch, and they are faster than my own implementation.
However, when I set the dropout rate of the BLSTM to 0.2 (rnn_dropout = 0.2), it produces different results from run to run even though I use a fixed random seed. When I disable dropout in the BLSTM (rnn_dropout = 0), the results are deterministic. My code is attached below.
I can think of two possible reasons for this:
- the random seed I pass to BLSTM is not taking effect, and cudnn is falling back to its default random generator;
- dropout in cudnn is inherently non-deterministic, although I did not find any discussion of the determinism of dropout in the cuDNN User Guide.
require 'cudnn'
require 'nn'
require 'cutorch'
require 'cunn'
require 'optim'

local function createDataset(n_in, n_y, seqlen)
  local N, batchSize = 10, 5
  local xs = {}
  local ys = {}
  for i = 1, N do
    xs[i] = torch.rand(seqlen, batchSize, n_in):cuda()
    -- random class labels in [1, n_y]
    ys[i] = (torch.rand(seqlen, batchSize) * n_y + 1):int():double():cuda()
  end
  return xs, ys, N
end

local function createModel(opts)
  -- create lstm
  local lstm = cudnn.BLSTM(opts.n_in, opts.n_hid, opts.n_layers)
  if opts.rnn_dropout > 0 then
    lstm.dropout = opts.rnn_dropout
    lstm.seed = opts.seed
    lstm:resetDropoutDescriptor()
  end
  -- create softmax
  local softmax = nn.Sequential()
  if opts.dropout > 0 then
    softmax:add( nn.Dropout(opts.dropout, false, true) )
  end
  softmax:add( nn.Linear(opts.n_hid * 2, opts.n_y) )
  softmax:add( nn.LogSoftMax() )
  softmax = softmax:cuda()
  -- create criterion
  local criterion = nn.ClassNLLCriterion():cuda()
  -- container used only to flatten the parameters of both sub-modules;
  -- forward/backward are called on lstm and softmax directly
  local m = nn.Parallel()
  m:add(lstm)
  m:add(softmax)
  local params, grads = m:getParameters()
  return {lstm = lstm, softmax = softmax, criterion = criterion,
          params = params, grads = grads}
end

local function main()
  print 'create dataset'
  local opts = {seed = 1, n_in = 50,
                n_hid = 100,
                seqlen = 50,
                n_layers = 2,
                n_y = 10,
                rnn_dropout = 0.2,
                dropout = 0.2}
  torch.manualSeed(opts.seed)
  cutorch.manualSeed(opts.seed)
  local xs, ys, size = createDataset(opts.n_in, opts.n_y, opts.seqlen)
  local model = createModel(opts)
  local sgdParam = {learningRate = 0.001}
  for epoch = 1, 10 do
    for i = 1, size do
      local x, y = xs[i], ys[i]
      local function feval(params_)
        if model.params ~= params_ then
          model.params:copy(params_)
        end
        model.grads:zero()
        -- forward pass
        local hids_ = model.lstm:forward(x)
        local hids = hids_:view(hids_:size(1) * hids_:size(2), hids_:size(3))
        local y_preds = model.softmax:forward(hids)
        local loss = model.criterion:forward(y_preds, y:view(-1))
        print(string.format('epoch = %d, i = %d, loss = %f', epoch, i, loss))
        -- backward pass
        local df_y_preds = model.criterion:backward(y_preds, y:view(-1))
        local df_hids = model.softmax:backward(hids, df_y_preds)
        local df_hids_ = df_hids:view(hids_:size(1), hids_:size(2), hids_:size(3))
        model.lstm:backward(x, df_hids_)
        return loss, model.grads
      end
      local _, loss_ = optim.adam(feval, model.params, sgdParam)
    end
  end
end

main()
Update:
The results of the unidirectional LSTM are deterministic! It is likely that in the bidirectional LSTM, the forward and backward LSTMs are computed concurrently and the dropout masks are applied in a non-deterministic order!
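A minimal sketch of the kind of determinism check behind this observation (sizes mirror the opts above; it builds the BLSTM twice with the same seed and compares a single forward pass on the same input):

require 'cudnn'
require 'cutorch'

local function seededForward(seed, rnn_dropout)
  torch.manualSeed(seed)
  cutorch.manualSeed(seed)
  local x = torch.rand(50, 5, 50):cuda()   -- seqlen x batchSize x n_in
  local lstm = cudnn.BLSTM(50, 100, 2)
  if rnn_dropout > 0 then
    lstm.dropout = rnn_dropout
    lstm.seed = seed
    lstm:resetDropoutDescriptor()
  end
  return lstm:forward(x):clone()
end

-- with rnn_dropout = 0 the two outputs match exactly; with rnn_dropout = 0.2
-- they should also match, but (per this issue) they do not
local out1 = seededForward(1, 0.2)
local out2 = seededForward(1, 0.2)
print('max abs difference:', (out1 - out2):abs():max())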
Thanks for reporting the issue! We have discovered that there are problems with dropout application in cudnn (the non-determinism you've discovered, and issues in the weight update), and are looking into it. As a workaround, you can create your network layer by layer without dropout and apply nn.Dropout between the layers.
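A rough sketch of that layer-by-layer workaround (sizes taken from the example above; the exact values and layer count are illustrative, the nn.Dropout placement between the layers is the point):

require 'cudnn'
require 'nn'
require 'cunn'

local n_in, n_hid, p = 50, 100, 0.2
local net = nn.Sequential()
net:add(cudnn.BLSTM(n_in, n_hid, 1))        -- first bidirectional layer, no internal dropout
net:add(nn.Dropout(p))                      -- dropout applied between the layers instead
net:add(cudnn.BLSTM(2 * n_hid, n_hid, 1))   -- second layer consumes the 2*n_hid bidirectional output
net = net:cuda()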
@ngimel thanks for the feedback and the suggestion.
I have a follow-up question: how is dropout for the LSTM designed/implemented in cudnn v5? Are you using the strategy from https://arxiv.org/abs/1409.2329 (applying dropout to the input of each layer)?
Yes, we are applying dropout to the input of each layer.
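For reference, that placement (per the cited paper) corresponds schematically to h_t^(l) = LSTM( D(h_t^(l-1)), h_t-1^(l) ): the dropout operator D is applied only to the activation arriving from the layer below at each timestep, never to the recurrent connections within a layer.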
Sorry for pushing this again, but has the non-deterministic behaviour been addressed in newer PyTorch / cuDNN versions?
Any progress on this issue?