nntools
Should we include sequence prediction ability using a single input for LSTM (or RNNs generally)?
Normally when we train an LSTM, we unfold it for a certain number of time steps and train it with data of the same length. However, after training, we would like to give just the initial data and let the LSTM network predict the whole sequence.
Do we have this functionality in this library? If not, I think I can implement it.
Thanks.
I think this should be possible by setting learn_init=True, training, and then at test time simply feeding in an input of all zeros, right?
Sorry, maybe I didn't make it clear. It is more like this graph.
For example, suppose we would like to use an LSTM to generate a sequence of data, trained on a set of sequences of length 201.
First, we unfold the LSTM for 200 time steps, and for each sequence we use D[0:200] as input and D[1:201] as the target for training.
After training, we feed the initial data D[0] into the network, the network outputs the predicted value \hat{D}[1], we then take \hat{D}[1] as input and generate \hat{D}[2], and so on.
Then we can get the whole generated sequence using only the initial condition D[0]. Can this be easily implemented with this library?
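Concretely, the data layout I have in mind is just (a toy numpy illustration, names made up):
import numpy as np

D = np.random.randn(201, 1).astype('float32')   # one toy sequence of length 201
X = D[0:200]    # network input:   D[0] ... D[199]
Y = D[1:201]    # training target: D[1] ... D[200], i.e. the input shifted by one step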
I noticed that if the LSTM is unfolded for 200 time steps, then the input must always be of length 200. A single data point as input (D[0] in the previous case) is not possible.
Ah, yeah, I see what you mean. Yeah, I'm not thinking of a simple way to do this with the code as it is, because it treats the input sequence as fixed and known when computing the output sequence. I think you'd basically have to compute the output of the network timestep-by-timestep and feed it back in manually, which would probably be inefficient. If you can think of a better way to do it, feel free to submit a PR.
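Roughly, I mean something along these lines (just a toy numpy sketch, completely independent of nntools; all weights are random placeholders): one LSTM step is applied repeatedly, its prediction is fed back in as the next input, and both the hidden and the cell state are carried along.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_in, n_hid = 1, 8
rng = np.random.RandomState(0)
W_x = rng.normal(0, 0.1, (n_in, 4 * n_hid))     # input-to-gates weights
W_h = rng.normal(0, 0.1, (n_hid, 4 * n_hid))    # hidden-to-gates weights
b = np.zeros(4 * n_hid)
W_out = rng.normal(0, 0.1, (n_hid, n_in))       # maps hidden state back to input space

def lstm_step(x, h, c):
    z = x.dot(W_x) + h.dot(W_h) + b
    i = sigmoid(z[:, 0 * n_hid:1 * n_hid])   # input gate
    f = sigmoid(z[:, 1 * n_hid:2 * n_hid])   # forget gate
    g = np.tanh(z[:, 2 * n_hid:3 * n_hid])   # candidate cell update
    o = sigmoid(z[:, 3 * n_hid:4 * n_hid])   # output gate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# generate a sequence from the single initial value D[0]
x = np.zeros((1, n_in))     # stands in for the initial input D[0]
h = np.zeros((1, n_hid))
c = np.zeros((1, n_hid))
generated = []
for t in range(200):
    h, c = lstm_step(x, h, c)
    x = h.dot(W_out)        # \hat{D}[t+1], fed straight back in as the next input
    generated.append(x.copy())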
The timestep-by-timestep way is also what I am thinking of for now. I'll submit a PR when I am done with it.
In that case, I think this would make most sense as an example script, as I don't think any functionality needs to be added to the LSTM/recurrent layer classes themselves.
@gaoyuankidult Did you manage to get a working example? I'm looking into this as well.
As far as I can see, the timestep-by-timestep strategy is not possible with LSTM layers, because there is no way to get the cell state and manually carry it to the next time iteration. Changing the API to allow getting the cell state seems like a rather big refactor, since the API in nntools/lasagne is restricted to just one output.
I may be terribly wrong here, I'm fairly new to lasagne.
As far as I can see, the timestep-by-timestep strategy is not possible with LSTM layers, because there is no way to get the cell state and manually carry it to the next time iteration.
I'd think you would just get the cell attribute of your LSTMLayer instance. If you can think of another way to do this which is not timestep-by-timestep, I could see it being a separate layer.
I'd think you would just get the cell attribute of your LSTMLayer instance.
That is not an existing feature, right?
If you can think of another way to do this which is not timestep-by-timestep, I could see it being a separate layer.
Well, in my personal framework I don't create a theano.scan in each layer, but have a scan method on each layer object which just does one iteration; there is then a consuming class which does the theano.scan.
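To make that concrete, here is a rough sketch of the design (not the nntools API; the class names StepRNN and Recurrence are made up):
import numpy as np
import theano
import theano.tensor as T

floatX = theano.config.floatX

class StepRNN(object):
    # a vanilla RNN layer whose scan method does exactly one timestep
    def __init__(self, n_in, n_hid, rng=np.random):
        self.W_in = theano.shared(rng.normal(0, 0.1, (n_in, n_hid)).astype(floatX))
        self.W_hid = theano.shared(rng.normal(0, 0.1, (n_hid, n_hid)).astype(floatX))

    def scan(self, x_t, h_tm1):
        # one recurrence step: h_t = tanh(x_t W_in + h_{t-1} W_hid)
        return T.tanh(T.dot(x_t, self.W_in) + T.dot(h_tm1, self.W_hid))

class Recurrence(object):
    # the consuming class: owns the single theano.scan over the full sequence
    def __init__(self, layer):
        self.layer = layer

    def apply(self, x, h0):
        # x: (seq_len, batch, n_in), h0: (batch, n_hid)
        h, _ = theano.scan(fn=self.layer.scan, sequences=x, outputs_info=h0)
        return h  # (seq_len, batch, n_hid)

# the same one-step scan method can also be called directly at generation
# time, which is what makes timestep-by-timestep feedback straightforward
x = T.tensor3('x')
h0 = T.matrix('h0')
layer = StepRNN(n_in=3, n_hid=5)
f = theano.function([x, h0], Recurrence(layer).apply(x, h0))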
That is not an existing feature, right?
Sure it is, cell is a public attribute of every LSTMLayer instance.
Well, in my personal framework I don't create a theano.scan in each layer, but have a scan method on each layer object which just does one iteration; there is then a consuming class which does the theano.scan.
Ah, that makes sense.
Sure it is, cell is a public attribute of every LSTMLayer instance.
Could you give me a link to where it is defined? I don't see that attribute: https://github.com/craffel/nntools/search?q=self.cell&type=Code
Oops, sorry, you're right! Currently working on other stuff so not thinking about this straight. Although I'm a little confused why you want to get the cell state instead of just the LSTM unit's output.
Oops, sorry, you're right!
All is forgiven :) Do you have something in mind for how this could be implemented? The only thing I can come up with is exposing it as a self.cell attribute in the get_output_for call, but that seems really hacky.
Although I'm a little confused why you want to get the cell state instead of just the LSTM unit's output.
I don't know a precise answer to that question, but it makes sense to me, as the cell state would otherwise always be set to the initial value, which kind of makes the entire LSTM complexity redundant. In the normal LSTM setting, the cell state is also carried to the next time iteration. The fact that we use a timestep-by-timestep strategy in the implementation doesn't really change that.
See for instance http://arxiv.org/abs/1409.3215 where such a model is used, though to be fair it is not very detailed about the model. @skaae can perhaps provide more details on the model and why.
I did implement something similar to the encoder decoder structure in http://arxiv.org/abs/1409.3215.
I added two flags to the LSTMLayer:
- return_sequence: if false, only the last hidden state is returned
- return_cell: if true, return both cell and hid as a list
In slightly ugly python it is:
output_hid = output_scan[1]
# optionally only return last hidden state
if self.return_sequence:
    output_hid = output_hid.dimshuffle(1, 0, 2)
else:
    output_hid = output_hid[-1]

# if return_cell is true we return both cell and hidden states.
# This is needed for the encoder/decoder framework.
if self.return_cell:
    output_cell = output_scan[0]
    if self.return_sequence:
        output_cell = output_cell.dimshuffle(1, 0, 2)
    else:
        output_cell = output_cell[-1]  # return last position
    return output_cell, output_hid
else:
    return output_hid
For the encoder/decoder framework I set return_sequence=False and return_cell=True.
I then created a DecoderLSTMLayer which takes the cell and hid as input and uses them as initial states when decoding. I think an alternative approach is to create a repeat layer and then just repeat the output. Then you can just use a normal LSTM layer. (Inspiration for repeat layer: http://keras.io/layers/core/#repeatvector).
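Such a repeat helper could be sketched in plain Theano like this (just a sketch, not an nntools layer; repeat_hidden is a made-up name):
import theano.tensor as T

def repeat_hidden(h_last, n_steps):
    # h_last: (batch, n_hid) -> (batch, n_steps, n_hid); the repeated encoder
    # state can then be used as the input sequence of an ordinary LSTM decoder
    return T.repeat(h_last.dimshuffle(0, 'x', 1), n_steps, axis=1)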
For the decoder it is fairly standard to use the previously predicted class probabilities as input in the next decoding step. I haven't found an elegant solution to that problem. Currently I let my decoder take a class network as input. The class network takes the output from the decoder and produces class probabilities, which you can return in scan and feed back into the LSTM.
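Roughly, the feedback idea looks like this, sketched with a plain tanh step for brevity (a real decoder would be an LSTM that also carries the cell state; all names and weights here are placeholders):
import numpy as np
import theano
import theano.tensor as T

floatX = theano.config.floatX
n_classes, n_hid = 10, 32
rng = np.random.RandomState(0)
W_in = theano.shared(rng.normal(0, 0.1, (n_classes, n_hid)).astype(floatX))
W_hid = theano.shared(rng.normal(0, 0.1, (n_hid, n_hid)).astype(floatX))
W_out = theano.shared(rng.normal(0, 0.1, (n_hid, n_classes)).astype(floatX))

def decode_step(h_tm1, y_tm1):
    # the previous step's class probabilities act as this step's input
    h_t = T.tanh(T.dot(y_tm1, W_in) + T.dot(h_tm1, W_hid))
    y_t = T.nnet.softmax(T.dot(h_t, W_out))
    return h_t, y_t

h0 = T.matrix('h0')   # e.g. the encoder's last hidden state
y0 = T.matrix('y0')   # e.g. a one-hot "start" token
(h_seq, y_seq), _ = theano.scan(decode_step,
                                outputs_info=[h0, y0],
                                n_steps=20)
decode = theano.function([h0, y0], y_seq)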
@AndreasMadsen I am really sorry. I had to put more time into my master's thesis, so the implementation was suspended. If you have any ideas, please go ahead. I may join your discussion later.
I am new to lasagne/nntools (not to mention neural networks in general), so do pardon my slightly noobish questions. I am trying to implement an encoder-decoder framework and threads like this are very useful to me! @skaae I had a question about training an encoder-decoder. Is the decoder trained like a standard language model? After reading in the entire sequence-1 (then using a repeat-vector or cell state), for each time step the decoder is given a sample (a word, in a machine translation task) and predicts the next word. Is this correct? As you seem to be familiar with Keras, does this correspond to a 'graph' architecture (rather than sequential, in Keras terminology), where there are 2 inputs (seq-1 and seq-2) and 1 output?
Thanks in advance!
No. The encoder will encode your input into some hidden representation. The decoder will then somehow use that representation to produce the output. In the simplest case you'll use an RNN as the encoder and use its last hidden state to initialize the decoder, which is also an RNN. Other variants use softmax attention, etc.
In the language model you try to predict the next word given the previous words. This can easily be done in Lasagne. Take a look at the language model pull request in Lasagne Recipes.
You can also have a look at http://karpathy.github.io/2015/05/21/rnn-effectiveness/
I see. So at each time step the decoder predicts the next word using the encoded vector and the previous hidden state (which implicitly contains the history of previously predicted words). Why does Sutskever say the decoder is basically an LSTM-LM? (He uses that phrase.) At test time we need to feed in the previous prediction as input. I didn't get what you meant by a 'class network'?
Just to be more clear. What is the input to the decoder at each time step (during training)?
Maybe because it predicts the next word given the context?
Yes, it is a problem that you need to feed in the previous prediction during test time. It's on my todo list for the language model. Hopefully I'll have time to finish it during the weekend.
If you have an encoder-decoder, you either initialize the decoder with the last hidden state from the encoder, or you use a repeat layer to repeat the last hidden state for the number of decode steps and use that as the input.
Lasagne does not have a RepeatLayer right now, but there's an issue where you can find an example.
By the way, if you have more questions, can you post them on the mailing list?
Thank you for the reply. I will definitely have more questions. See you on the mailing list :)