Cortexsys
Error while training LSTM
I've been frequently getting this error while training the net. Can someone tell me what the problem might be?
Index exceeds matrix dimensions.

Error in varObj>@(C)full(C(:,r)) (line 45)
v = cellfun(@(C) full(C(:,r)), obj.v, 'UniformOutput', false);

Error in varObj/getmb (line 45)
v = cellfun(@(C) full(C(:,r)), obj.v, 'UniformOutput', false);

Error in nnCostFunctionLSTM (line 13)
Y = varObj(nn.Y.getmb(r), nn.defs, nn.defs.TYPES.OUTPUT);

Error in testSumNumbersGenerator>@(nn,r,newRandGen)nnCostFunctionLSTM(nn,r,newRandGen)

Error in gradientDescentAdaDelta (line 69)
[J, dJdW, dJdB] = feval(f, nn, r, true);

Error in testSumNumbersGenerator (line 137)
nn = gradientDescentAdaDelta(costFunc, nn, defs, [], [], [], [], 'Training Entire Network');
I've been trying to train a network that predicts the sum of the past two numbers, but I keep running into this kind of problem. If I increase the length of the sequence the training completes, but then I don't get the expected output when I try to rebuild the sequence. Am I doing something wrong? This should be a very simple sequence to train. Here is my code:
```matlab
clear; close all force;
addpath('../../nn_gui');
addpath('../../nn_core');
addpath('../../nn_core/cuda');
addpath('../../nn_core/mmx');
addpath('../../nn_core/Optimizers');
addpath('../../nn_core/Activations');
addpath('../../nn_core/Wrappers');
addpath('../../nn_core/ConvNet');
addpath('Text');
PRECISION = 'double';
% definitions(PRECISION, useGPU, whichThreads, plotOn)
defs = definitions(PRECISION, true, [1], true);
% % Load the Shakespeares training set
% Nchars = 50;   % How many characters long for each sequence
% offset = 0;    % How much to shift the labels from the input data
% txtpath = 'shakespeare_subset.txt';
% [X, vmap] = streamText2mat(txtpath, Nchars, offset);
% offset = 1;
% [Y, ~] = streamText2mat(txtpath, Nchars, offset);
seqLen = 2; N = 100000; offset = 0;
Xlong = 1:2;
for i=1:N
    Xlong(end+1) = Xlong(end) + Xlong(end-1);
    if abs(Xlong(end)) > 2
        Xlong(end) = -Xlong(end);
    end
%     Xlong(end+1) = Xlong(end) + 1;
%     if (Xlong(end) > 10)
%         Xlong(end) = 1;
%     end
end
% x123 = findstr(Xlong, [0 -1]);
% Xlong(x123(1):x123(1)+5)
% length(x123)
[X, vmap] = getSparseMatrix(Xlong, seqLen, offset);
offset = 1;
[Y, ~] = getSparseMatrix(Xlong, seqLen, offset);
% end
[BinCount] = hist(Xlong,vmap);
input_size = size(X{1},1);
output_size = input_size;
T = numel(X);  % Length of all time sequences

% Both X and Y must include T=0 and T=Tf+1 'boundary conditions' filled
% with zeros for convenience
X(2:end+1) = X(:); X{1} = 0*X{1}; X(end+1) = X(1);
Y(2:end+1) = Y(:); Y{1} = 0*Y{1}; Y(end+1) = Y(1);
%%%%%%%%%%%%%%%%%%%%% Fine tuning Parameters %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
params = struct();
params.maxIter = precision(3000,defs);
params.momentum = precision(0.9,defs);
params.maxnorm = precision(0,defs);
params.lambda = precision(0,defs);
params.alphaTau = precision(0.25*params.maxIter,defs); % alpha_i = alpha_tau/(tau+i) (see "A Stochastic Quasi-Newton Method for Online Convex Optimization", Eqn. 7)
params.denoise = precision(0,defs); % set to 0 to disable
params.dropout = precision(0.6,defs); % set to 1 to disable
params.miniBatchSize = precision(50,defs); % set to zero to disable mini-batches
params.tieWeights = false;
params.T = T;
params.Tos = 0; % This is the "offset" time before the cost starts accumulating for the LSTM output
% The idea here is that the LSTM can be fed with inputs
% for n time steps, and won't be penalized for predictions
% until t>=Tos. This helps by giving the LSTM context.
% Optimization routine parameters:
params.alpha = precision(.001,defs); % If this is non-zero, use this learning rate for the entire network
params.rho = precision(0.95, defs); % AdaDelta hyperparameter (don't generally need to modify)
params.eps = precision(1e-6, defs); % AdaDelta hyperparameter (don't generally need to modify)
params.cg.N = 10; % Max CG iterations before reset
params.cg.sigma0 = 0.01; % CG Secant step-method parameter
params.cg.jmax = 10; % Maximum CG Secant iterations
params.cg.eps = 1e-4; % Update threshold for CG
params.cg.mbIters = 10; % How many CG iterations per minibatch?
%%%%%%%%%%%%%%%%%%%%%%%%% Layer Setup %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
layers.af{1} = []; layers.sz{1} = [input_size 1 1]; layers.typ{1} = defs.TYPES.INPUT;
layers.af{end+1} = tanh_af(defs, []); layers.sz{end+1} = [128 1 1]; layers.typ{end+1} = defs.TYPES.LSTM;
layers.af{end+1} = softmax(defs, defs.COSTS.CROSS_ENTROPY); layers.sz{end+1} = [output_size 1 1]; layers.typ{end+1} = defs.TYPES.FULLY_CONNECTED;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

if defs.plotOn
    nnShow(423, layers, defs);
end

% Process Y such that first time sequence is stripped off and replaced by
% a null at the end. This will cause LSTM to predict next character in the
% sequence. The final prediction for a sequence should be null (zero).
X = varObj(X,defs,defs.TYPES.INPUT); Y = varObj(Y,defs,defs.TYPES.OUTPUT);
modelName = 'modelSumNumbers13.mat';
if ~exist(modelName)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% TRAINING %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
nn = nnLayers(params, layers, X, Y, {}, {}, defs);
nn.initWeightsBiases();
costFunc = @(nn,r,newRandGen) nnCostFunctionLSTM(nn,r,newRandGen);
nn = gradientDescentAdaDelta(costFunc, nn, defs, [], [], [], [], 'Training Entire Network');
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
save(modelName, 'nn','vmap');
else
    load(modelName);
end
% outputText = []; outputSeq = [];
%% Generate some text by 'sampling' from LSTM
% T_samp_temp = 1/35; % "Temperature" of random sampling process (higher temperatures lead to more randomness)
T_samp_temp = 1;
T_samp = 5000; % Length of sequence to generate
% seedText = 'ROMEO:';
% seedText = 'Rafael Iriya';
% seedMatrix = unidrnd(3,3,6);
seqLen = 2;
seedMatrix = Xlong(1:seqLen);
outputSeq = seedMatrix;

% Initial text (to provide some context)
% vsize = numel(vmap);
vsize = size(vmap,2);
% Xs = full(ascii2onehot(seedText, vmap));
Xs = full(vec2map(seedMatrix, vmap));
% Xs = [zeros(vsize,1) Xs];
% Xtmp = zeros(vsize,1,size(seedMatrix,2)+1);
Xtmp = zeros(vsize,1,size(seedMatrix,2));
Xtmp(:,1,:) = Xs;
Xs = Xtmp;

nn.disableCuda();
nn.A{1} = varObj(Xs, nn.defs);
preallocateMemory(nn, 1, size(seedMatrix,2)+1);

% % Load up the LSTM with the context
% for t=2:size(seedMatrix,2)+1
%     feedforwardLSTM(nn, 1, t, false, true);
%     [~,cout] = max(nn.A{end}.v(:,1,t));
%     % outputText = [outputText vmap(cout)]
%     outputSeq = [outputSeq vmap(:,cout)];
%     %fprintf('%s', vmap(cout));
% end

for t=1:size(seedMatrix,2)+100
    feedforwardLSTM(nn, 1, seqLen, false, true);
    [value,cout] = max(nn.A{end}.v(:,1,seqLen));
    outputSeq = [outputSeq vmap(:,cout)];
    tmp = zeros(vsize,1);
    tmp(cout) = 1;
    for tt = 1:seqLen-1
        nn.A{1}.v(:,1,tt) = nn.A{1}.v(:,1,tt+1);
    end
    nn.A{1}.v(:,1,end) = tmp;
end
% Start sampling characters by feeding output back into the input for the
% next time step
% for t=size(seedMatrix,2)+2:size(seedMatrix,2)+T_samp
% % Generate a random sample from the softmax probability distribution
% % First, adjust/scale the distribution by a "temperature" that controls
% % how likely we are to pick the maximum likelihood prediction
% % P_next_char = exp(1/T_samp_temp*nn.A{end}.v(:,1,t-1));
% % P_next_char = P_next_char./sum(P_next_char); % normalize distribution
% % cin = randsample(vsize,1,true,P_next_char);
%
% [value,cin] = max(nn.A{end}.v(:,1,t-1));
%
% % fprintf('%s', vmap(cin));
% % outputText = [outputText vmap(cin)]
% outputSeq = [outputSeq vmap(:,cin)];
%
% % Plot the distribution over characters
% %{
% figure(777);
% plot(P_next_char);
% set(gca, 'XTick',1:numel(P_next_char), 'XTickLabel',vmap)
% waitforbuttonpress;
% %}
%
% % Feed back the output to the input
% % Generate the input for the next time step
% tmp = zeros(vsize,1);
% tmp(cin) = 1;
% nn.A{1}.v(:,1,t) = tmp;
%
% % Step the RNN forward
% feedforwardLSTM(nn, 1, t, false, true);
%
% end
% disp('');
```
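One thing I noticed while poking at the error: because the labels are built with `offset = 1`, `getSparseMatrix` (below) trims one element off `Xlong`, so `Y` can come back with one fewer sequence column than `X`. If the random minibatch index `r` then lands on the last column of `X`, `nn.Y.getmb(r)` would index past the end of `Y`, which looks a lot like the "Index exceeds matrix dimensions" error above. Here is a quick sanity check (just a sketch against my own script variables, meant to run right after the two `getSparseMatrix` calls and before the boundary-condition padding):

```matlab
% Sanity check (sketch): with offset = 1 the label set can lose one
% sequence, so make sure X and Y have the same number of columns.
nX = cellfun(@(C) size(C,2), X);   % sequences per time step in X
nY = cellfun(@(C) size(C,2), Y);   % sequences per time step in Y
fprintf('sequences in X: %d, in Y: %d\n', min(nX), min(nY));
if min(nX) ~= min(nY)
    % Truncate both to the common number of sequences so a minibatch
    % index can never point past the smaller set.
    m = min([nX(:); nY(:)]);
    X = cellfun(@(C) C(:,1:m), X, 'UniformOutput', false);
    Y = cellfun(@(C) C(:,1:m), Y, 'UniformOutput', false);
end
```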
```matlab
function [X, vmap] = getSparseMatrix(Xlong, seqLen, offset)
Xlong = Xlong(:,1+offset:end); N = size(Xlong,2);
m = floor(N/seqLen);
Xlong = Xlong(1:m*seqLen);
dimX = size(Xlong,1);
Npad = size(Xlong,2);
vmap = unique(Xlong', 'rows')'; vocabSize = size(vmap,2);
Xmap = zeros(1,Npad);
% Map all ASCII values to the reduced set
for i=1:vocabSize
    indx = find(all(bsxfun(@eq, Xlong', vmap(:,i)'), 2));
    % [~,indx]=ismember(vmap(:,i)',Xlong','rows')
    Xmap(indx) = i;
end
Xmap = reshape(Xmap,seqLen,[]);
%
% Xmap= hankel(Xmap, 1:seqLen);
% Xmap = Xmap';%
% xreal = vmap(Xmap);
Xca = cell(seqLen,1);
I = speye(vocabSize);
for t=1:seqLen
Xt = Xmap(t,:);
Xca{t} = I(Xt(:),:)';
end
Xfull1 = full(Xca{1});
Xfull2 = full(Xca{2});
X = Xca;
```
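For reference, this is roughly how the helper above behaves on a toy input (hypothetical values, just to show the shapes it returns): each cell `X{t}` holds the one-hot column for time step `t` of every training sequence, and `vmap` lists the distinct symbols.

```matlab
% Toy example (sketch): encode a short stream with seqLen = 2, offset = 0.
XlongToy = [1 2 -3 -1 2 1];               % hypothetical symbol stream
[Xtoy, vmapToy] = getSparseMatrix(XlongToy, 2, 0);
% vmapToy is the sorted set of distinct symbols, here [-3 -1 1 2].
% Xtoy{1} and Xtoy{2} are vocabSize-by-3 sparse matrices: column k of
% Xtoy{t} is the one-hot code of the t-th symbol of the k-th sequence.
full(Xtoy{1})
full(Xtoy{2})
```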
I realized the problem is that the trained network is not taking the memory into account. If I enter [1 2] the output should be -3, but it's giving me something else, and if I change the input to [3 2], [-1 2], or anything else, it gives me the same result, which means it's only taking the 2 into account and not what comes before it. How do I change the network so that it takes the memory into account?
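In case it matters: in my generation loop above I only ever call `feedforwardLSTM(nn, 1, seqLen, ...)`, so I suspect the earlier elements of the window are never actually pushed through the recurrent state before I read a prediction. This is roughly what I think the inner step should look like instead (a sketch reusing the calls from my own script, along the lines of the commented-out "load up the LSTM with the context" loop):

```matlab
% Sketch: step the LSTM through every time step of the current window so
% the cell state sees both numbers (e.g. 1 then 2), not just the last one.
for t = 1:seqLen
    feedforwardLSTM(nn, 1, t, false, true);   % advance the LSTM in time
end
[~, cout] = max(nn.A{end}.v(:,1,seqLen));     % prediction after the last step
predicted = vmap(:, cout)                     % map back to the actual symbol
```

Is that the right way to drive the network, or does it need to be set up differently to use its memory?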