
Error while training LSTM

Open ririya opened this issue 8 years ago • 2 comments

I've been frequently getting this error while training the net. Can someone tell me what the problem might be?

```
Index exceeds matrix dimensions.

Error in varObj>@(C)full(C(:,r)) (line 45)
v = cellfun(@(C) full(C(:,r)), obj.v, 'UniformOutput', false);

Error in varObj/getmb (line 45)
v = cellfun(@(C) full(C(:,r)), obj.v, 'UniformOutput', false);

Error in nnCostFunctionLSTM (line 13)
Y = varObj(nn.Y.getmb(r), nn.defs, nn.defs.TYPES.OUTPUT);

Error in testSumNumbersGenerator>@(nn,r,newRandGen)nnCostFunctionLSTM(nn,r,newRandGen)

Error in gradientDescentAdaDelta (line 69)
[J, dJdW, dJdB] = feval(f, nn, r, true);

Error in testSumNumbersGenerator (line 137)
nn = gradientDescentAdaDelta(costFunc, nn, defs, [], [], [], [], 'Training Entire Network');
```
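As far as I can tell, the failing line just slices minibatch columns out of each cell, so the same error can be reproduced in isolation whenever an index in `r` points past the last stored sequence. A toy illustration (the sizes and values here are made up, not Cortexsys internals):

```matlab
% Toy reproduction of the failure mode (hypothetical sizes):
% varObj/getmb essentially does  v = cellfun(@(C) full(C(:,r)), obj.v, ...)
C = speye(5);     % pretend one time step's sparse data has 5 sequence columns
r = [2 4 6];      % a minibatch index (6) past the last column
full(C(:,r))      % -> "Index exceeds matrix dimensions."
```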

ririya avatar Nov 15 '16 22:11 ririya

I've been trying to train a network that predicts the sum of the past two numbers, but I keep running into this kind of problem. If I increase the length of the sequence the training completes, but I don't get the expected output when I try to rebuild the sequence. Am I doing something wrong? This should be a very simple sequence to train. Here is my code:

```matlab
clear; close all force;
addpath('../../nn_gui');
addpath('../../nn_core');
addpath('../../nn_core/cuda');
addpath('../../nn_core/mmx');
addpath('../../nn_core/Optimizers');
addpath('../../nn_core/Activations');
addpath('../../nn_core/Activations');
addpath('../../nn_core/Wrappers');
addpath('../../nn_core/ConvNet');
addpath('Text');

PRECISION = 'double';

 % definitions(PRECISION, useGPU, whichThreads, plotOn) 

defs = definitions(PRECISION, true, [1], true);

% % Load the Shakespeares training set
% Nchars = 50;    % How many characters long for each sequence
% offset = 0;     % How much to shift the labels from the input data
% txtpath = 'shakespeare_subset.txt';
% [X, vmap] = streamText2mat(txtpath, Nchars, offset);
% offset = 1;
% [Y, ~] = streamText2mat(txtpath, Nchars, offset);

seqLen = 2; N = 100000; offset = 0;

Xlong = 1:2;

for i=1:N

    Xlong(end+1) = Xlong(end) + Xlong(end-1);
    if abs(Xlong(end)) > 2
        Xlong(end) = -Xlong(end);
    end

    % Xlong(end+1) = Xlong(end) + 1;
    % if (Xlong(end) > 10)
    %     Xlong(end) = 1;
    % end
end

% x123 = findstr(Xlong, [0 -1]);
% Xlong(x123(1):x123(1)+5)
% length(x123)

[X, vmap] = getSparseMatrix(Xlong, seqLen, offset);
offset = 1;
[Y, ~] = getSparseMatrix(Xlong, seqLen, offset);
% end

[BinCount] = hist(Xlong,vmap);

input_size = size(X{1},1);
output_size = input_size;
T = numel(X);   % Length of all time sequences

% Both X and Y must include T=0 and T=Tf+1 'boundary conditions' filled
% with zeros for convenience
X(2:end+1) = X(:); X{1} = 0*X{1}; X(end+1) = X(1);

Y(2:end+1) = Y(:); Y{1} = 0*Y{1}; Y(end+1) = Y(1);

%%%%%%%%%%%%%%%%%%%%% Fine tuning Parameters %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
params = struct();
params.maxIter = precision(3000,defs);
params.momentum = precision(0.9,defs);
params.maxnorm = precision(0,defs);
params.lambda = precision(0,defs);
params.alphaTau = precision(0.25*params.maxIter,defs); % alpha_i = alpha_tau/(tau+i) (see "A Stochastic Quasi-Newton Method for Online Convex Optimization", Eqn. 7)
params.denoise = precision(0,defs);        % set to 0 to disable
params.dropout = precision(0.6,defs);      % set to 1 to disable
params.miniBatchSize = precision(50,defs); % set to zero to disable mini-batches
params.tieWeights = false;
params.T = T;
params.Tos = 0; % This is the "offset" time before the cost starts accumulating for the LSTM output.
                % The idea here is that the LSTM can be fed with inputs for n time steps, and won't be
                % penalized for predictions until t >= Tos. This helps by giving the LSTM context.

% Optimization routine parameters:
params.alpha = precision(.001,defs); % If this is non-zero, use this learning rate for the entire network
params.rho = precision(0.95, defs);  % AdaDelta hyperparameter (don't generally need to modify)
params.eps = precision(1e-6, defs);  % AdaDelta hyperparameter (don't generally need to modify)
params.cg.N = 10;         % Max CG iterations before reset
params.cg.sigma0 = 0.01;  % CG Secant step-method parameter
params.cg.jmax = 10;      % Maximum CG Secant iterations
params.cg.eps = 1e-4;     % Update threshold for CG
params.cg.mbIters = 10;   % How many CG iterations per minibatch?

%%%%%%%%%%%%%%%%%%%%%%%%% Layer Setup %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
layers.af{1} = [];
layers.sz{1} = [input_size 1 1];
layers.typ{1} = defs.TYPES.INPUT;

layers.af{end+1} = tanh_af(defs, []);
layers.sz{end+1} = [128 1 1];
layers.typ{end+1} = defs.TYPES.LSTM;

layers.af{end+1} = softmax(defs, defs.COSTS.CROSS_ENTROPY);
layers.sz{end+1} = [output_size 1 1];
layers.typ{end+1} = defs.TYPES.FULLY_CONNECTED;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

if defs.plotOn
    nnShow(423, layers, defs);
end

% Process Y such that first time sequence is stripped off and replaced by
% a null at the end. This will cause LSTM to predict next character in the
% sequence. The final prediction for a sequence should be null (zero).

X = varObj(X,defs,defs.TYPES.INPUT);
Y = varObj(Y,defs,defs.TYPES.OUTPUT);

modelName = 'modelSumNumbers13.mat';
if ~exist(modelName)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% TRAINING %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

nn = nnLayers(params, layers, X, Y, {}, {}, defs);
nn.initWeightsBiases();

costFunc = @(nn,r,newRandGen) nnCostFunctionLSTM(nn,r,newRandGen);
nn = gradientDescentAdaDelta(costFunc, nn, defs, [], [], [], [], 'Training Entire Network');
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

save(modelName, 'nn','vmap');

else
    load(modelName);

end

% outputText = [];
outputSeq = [];

%% Generate some text by 'sampling' from LSTM
% T_samp_temp = 1/35; % "Temperature" of random sampling process (higher temperatures lead to more randomness)
T_samp_temp = 1;
T_samp = 5000;  % Length of sequence to generate
% seedText = 'ROMEO:';
% seedText = 'Rafael Iriya';

% seedMatrix = unidrnd(3,3,6);
seqLen = 2;
seedMatrix = Xlong(1:seqLen);
outputSeq = seedMatrix;

% Initial text (to provide some context)
% vsize = numel(vmap);
vsize = size(vmap,2);
% Xs = full(ascii2onehot(seedText, vmap));
Xs = full(vec2map(seedMatrix, vmap));
% Xs = [zeros(vsize,1) Xs];
% Xtmp = zeros(vsize,1,size(seedMatrix,2)+1);
Xtmp = zeros(vsize,1,size(seedMatrix,2));
Xtmp(:,1,:) = Xs;
Xs = Xtmp;

nn.disableCuda();
nn.A{1} = varObj(Xs, nn.defs);
preallocateMemory(nn, 1, size(seedMatrix,2)+1);

% % Load up the LSTM with the context
% for t=2:size(seedMatrix,2)+1
%     feedforwardLSTM(nn, 1, t, false, true);
%     [~,cout] = max(nn.A{end}.v(:,1,t));
%     % outputText = [outputText vmap(cout)]
%     outputSeq = [outputSeq vmap(:,cout)];
%     %fprintf('%s', vmap(cout));
% end

for t=1:size(seedMatrix,2)+100
    feedforwardLSTM(nn, 1, seqLen, false, true);
    [value,cout] = max(nn.A{end}.v(:,1,seqLen));

 outputSeq = [outputSeq vmap(:,cout)];

tmp = zeros(vsize,1);
tmp(cout) = 1;

for tt = 1:seqLen-1
    nn.A{1}.v(:,1,tt) = nn.A{1}.v(:,1,tt+1);
end

 nn.A{1}.v(:,1,end) = tmp;         

end

% Start sampling characters by feeding output back into the input for the
% next time step
% for t=size(seedMatrix,2)+2:size(seedMatrix,2)+T_samp
%     % Generate a random sample from the softmax probability distribution
%     % First, adjust/scale the distribution by a "temperature" that controls
%     % how likely we are to pick the maximum likelihood prediction
%     % P_next_char = exp(1/T_samp_temp*nn.A{end}.v(:,1,t-1));
%     % P_next_char = P_next_char./sum(P_next_char); % normalize distribution
%     % cin = randsample(vsize,1,true,P_next_char);
%     % [value,cin] = max(nn.A{end}.v(:,1,t-1));
%
%     % fprintf('%s', vmap(cin));
%     % outputText = [outputText vmap(cin)]
%     outputSeq = [outputSeq vmap(:,cin)];
%
%     % Plot the distribution over characters
%     %{
%     figure(777);
%     plot(P_next_char);
%     set(gca, 'XTick',1:numel(P_next_char), 'XTickLabel',vmap)
%     waitforbuttonpress;
%     %}
%
%     % Feed back the output to the input
%     % Generate the input for the next time step
%     tmp = zeros(vsize,1);
%     tmp(cin) = 1;
%     nn.A{1}.v(:,1,t) = tmp;
%
%     % Step the RNN forward
%     feedforwardLSTM(nn, 1, t, false, true);
%
% end
% disp('');
```
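As a sanity check on the generator (this just reruns the same update rule outside the script), the sequence after the seed [1 2] starts 1, 2, -3, -1, 4, -3, ..., which is where my expected outputs below come from:

```matlab
% Stand-alone check of the rule used in the script above:
% next value = sum of the previous two, sign-flipped whenever |value| > 2.
x = [1 2];
for i = 1:10
    x(end+1) = x(end) + x(end-1);
    if abs(x(end)) > 2
        x(end) = -x(end);
    end
end
disp(x)   % 1  2  -3  -1  4  -3  1  -2  -1  3  0  -3
```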

```matlab
function [X, vmap] = getSparseMatrix(Xlong, seqLen, offset)

Xlong = Xlong(:,1+offset:end);
N = size(Xlong,2);

m = floor(N/seqLen);

Xlong = Xlong(1:m*seqLen);

dimX = size(Xlong,1);    

Npad = size(Xlong,2);

vmap = unique(Xlong', 'rows')';
vocabSize = size(vmap,2);

Xmap = zeros(1,Npad);

% Map all ASCII values to the reduced set
for i=1:vocabSize
    indx = find(all(bsxfun(@eq, Xlong', vmap(:,i)'), 2));
    % [~,indx] = ismember(vmap(:,i)', Xlong', 'rows')
    Xmap(indx) = i;
end

 Xmap = reshape(Xmap,seqLen,[]);

% Xmap = hankel(Xmap, 1:seqLen);
% Xmap = Xmap';

% xreal = vmap(Xmap);

Xca = cell(seqLen,1);
I = speye(vocabSize);

for t=1:seqLen
    Xt = Xmap(t,:);
    Xca{t} = I(Xt(:),:)';
end

Xfull1 = full(Xca{1});
Xfull2 = full(Xca{2});

 X = Xca;
```
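For what it's worth, this is how I understand the output of getSparseMatrix: each cell X{t} holds sparse one-hot columns (one per training sequence) for time step t, indexed against vmap. A quick inspection on a toy input (the values here are just for illustration):

```matlab
% Toy inspection of the encoding produced by getSparseMatrix:
Xlong = [1 2 -3 -1 4 -3];    % illustrative sequence
[X, vmap] = getSparseMatrix(Xlong, 2, 0);
% vmap -> sorted unique symbols, here [-3 -1 1 2 4]
size(X{1})         % [numel(vmap)  numel(Xlong)/2], one column per sequence
full(X{1}(:,1))    % one-hot column for the first sequence's first symbol (1)
full(X{2}(:,1))    % one-hot column for the first sequence's second symbol (2)
```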

ririya avatar Nov 15 '16 23:11 ririya

I realized the problem is that the trained network is not taking the memory into account. If I enter [1 2] the output should be -3, but it's giving me something else, and if I change it to [3 2], [-1 2] or anything else it gives me the same result, which means it's only taking the 2 into account and not what comes before it. How do I change the network so that it takes the memory into account?
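For reference, the comparison boils down to something like this (a rough sketch reusing the calls from my script above, so treat the exact calls as approximate):

```matlab
% Sketch of the check described above: different seeds, same prediction.
seeds = {[1 2], [3 2], [-1 2]};
for k = 1:numel(seeds)
    Xs = full(vec2map(seeds{k}, vmap));   % one-hot encode the seed
    Xtmp = zeros(size(vmap,2), 1, seqLen);
    Xtmp(:,1,:) = Xs;
    nn.A{1} = varObj(Xtmp, nn.defs);
    feedforwardLSTM(nn, 1, seqLen, false, true);
    [~, cout] = max(nn.A{end}.v(:,1,seqLen));
    fprintf('seed [%d %d] -> prediction %d\n', seeds{k}(1), seeds{k}(2), vmap(cout));
end
% All three seeds print the same prediction, so only the last symbol seems to matter.
```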

ririya avatar Nov 16 '16 03:11 ririya