nnx example
Hi, I am very new to nnx and I was wondering if there is a simple example of training and testing an RNN with the nnx library, like the examples that exist for a feed-forward neural network with the nn library?
Hi @manamir, you can use the partial example included in the docs: https://github.com/clementfarabet/lua---nnx#nnx.Recurrent .
There is also a complete working example in https://github.com/nicholas-leonard/dp/blob/master/examples/recurrentlanguagemodel.lua .
If you have any questions, don't hesitate to ask.
Hi @nicholas-leonard, and thanks again for the very useful links. I ran the recurrentlanguagemodel.lua script and it works fine. What I want to do now is apply it to my own data. As I understand it, the data has to be represented in a specific format (word_freq, word_map, word_tree, etc.). So I was wondering: starting from raw text data, is there a script that generates all the needed files (word_freq, word_map, word_tree, etc.), or should I create each file myself separately?
@manamir What is your dataset? How many unique words are in the vocabulary?
The most important thing is to create the train_data.th7, valid_data.th7 and test_data.th7 files as they contain the actual words. Each of these is a tensor encapsulated by a SentenceSet.
Hi @nicholas-leonard,
My dataset is the Penn Treebank. I guess I should create a script called PenTreeBank.lua, in the same way as billionwords.lua, and modify the recurrentlanguagemodel.lua script so that it takes the Penn Treebank data into account, right?
Is train_data.th7 a binary file of a given train_data which is raw text? I am a bit confused: I have raw training data and am still wondering how to convert it into a .th7 file.
OK, as I understand it, I have to parse the raw data, assign a corresponding id to each sentence and word, and then create train.th7, valid.th7, test.th7 and word_map.th7 from those sentence and word ids (see the sketch below).
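(A minimal sketch of that parsing step, assuming one sentence per line in a plain-text file; the file name train.txt and the helper wordToId are illustrative, not part of dp:)

```lua
-- Build a word->id map and convert whitespace-tokenized text
-- into tables of word ids, one table per sentence.
local word_map = {}   -- id -> word (for word_map.th7)
local word_ids = {}   -- word -> id
local function wordToId(word)
   if not word_ids[word] then
      table.insert(word_map, word)
      word_ids[word] = #word_map
   end
   return word_ids[word]
end

local sentences = {}
for line in io.lines('train.txt') do      -- one sentence per line (assumption)
   local sentence = {}
   for word in line:gmatch('%S+') do
      table.insert(sentence, wordToId(word))
   end
   table.insert(sentences, sentence)
end
```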
Yeah. I am actually looking to add this dataset to dp. Do you intend to submit a Pull Request for the Penn Treebank?
yes i think so
Awesome, if you need help finishing it. Just send a Pull Request with what you have. I can help you with the rest.
I couldn't figure out how the raw data is stored in the .th7 file.
First, load the data into a torch Tensor of size N words x 2, i.e. a torch.Tensor with 2 columns. The first column stores the start index of each sentence. The second column stores the sequence of words as shuffled sentences. Sentences are separated only by the sentence_end delimiter, which is a unique integer of your choosing. So, for example, sentence A = {1,3,5,6} and sentence B = {9,3,1,3}, using sentence delimiter 77, would be stored as:
torch.Tensor({{1,1},{1,3},{1,5},{1,6},{1,77},{6,9},{6,3},{6,1},{6,3},{6,77}})
1 1
1 3
1 5
1 6
1 77
6 9
6 3
6 1
6 3
6 77
[torch.DoubleTensor of dimension 10x2]
Then you save that tensor to disk using torch.save("path/to/file.th7", mytensor).
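(Putting the two steps together, a hedged sketch that packs sentences of word ids into this two-column format and saves them; it reproduces the example above, and the variable names are illustrative:)

```lua
require 'torch'

-- Sentence A = {1,3,5,6}, sentence B = {9,3,1,3}, delimiter 77,
-- as in the example above.
local sentences = {{1,3,5,6}, {9,3,1,3}}
local sentence_end = 77

local n = 0
for _, s in ipairs(sentences) do n = n + #s + 1 end  -- +1 delimiter per sentence

local data = torch.Tensor(n, 2)
local row = 1
for _, s in ipairs(sentences) do
   local start = row                  -- column 1: index of the sentence's first word
   for _, id in ipairs(s) do
      data[row][1] = start
      data[row][2] = id               -- column 2: the word id
      row = row + 1
   end
   data[row][1] = start
   data[row][2] = sentence_end        -- close the sentence with the delimiter
   row = row + 1
end

print(data)                           -- matches the 10x2 tensor above
torch.save('path/to/train_data.th7', data)
```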
Thanks a lot, it works perfectly :)
Now I am facing a memory size problem... the tensor size is (number of sentences x number of words x 2), which is huge. Maybe I should concatenate different tensors?
It should be a tensor of size (total number of words) x 2. Not to be confused with (number of unique words) x 2.
Yes, right; now it works fine. I am currently dealing with the SoftMaxTree.
The SoftMaxTree is optional. I don't think you would need it for the Penn Treebank since the vocabulary is so small. Unless you have a word hierarchy for the Penn Treebank lying around, you can just use SoftMax.
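(For reference, a plain-nn sketch of a flat softmax output layer, outside dp's wrappers; the layer sizes are placeholders. Note that in plain nn, ClassNLLCriterion expects log-probabilities, hence LogSoftMax:)

```lua
require 'nn'

local hiddenSize, vocabSize = 200, 10000  -- placeholder sizes

-- Flat softmax over the whole vocabulary, instead of nn.SoftMaxTree:
local outputLayer = nn.Sequential()
outputLayer:add(nn.Linear(hiddenSize, vocabSize))
outputLayer:add(nn.LogSoftMax())          -- log-probabilities over words

local criterion = nn.ClassNLLCriterion()  -- NLL over word indices
```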
right, SoftMax is fine.
Is it possible to define a loss function other than NLL?
Oh, sorry for asking when the answer is pretty clear; I guess I should create a new class, or add a new loss function to the NLL.lua file :)
Yes :)
Hi Nicholas, do you have any clue about using MSECriterion or KLDivergenceCriterion? For both I get this error: opt/tools/torch7/install/share/lua/5.1/dp/loss/loss.lua:82: inconsistent tensor size
Can you push your code into a GitHub repo so I can take a look at the issue?
OK, I'll do that. The thing is, the only difference from the recurrentlanguagemodel.lua script is that I replaced "loss = opt.softmaxtree and dp.TreeNLL() or dp.NLL()," with "loss = dp.KLDivergence(),".
If you are using SoftMaxTree, you need dp.TreeNLL. If you are using SoftMax, you need dp.NLL. dp.KLDivergence is for fitting probability distributions (i.e. your targets are a continuous probability distribution : something like a 2D torch.DoubleTensor.), but your targets right now are class indices (i.e. a 1D torch.IntTensor).
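(A toy illustration of that shape difference; the sizes are arbitrary:)

```lua
require 'torch'

local batchSize, vocabSize = 4, 10

-- NLL-style targets: one class (word) index per example -> 1D IntTensor
local classTargets = torch.IntTensor(batchSize):random(1, vocabSize)

-- KLDivergence-style targets: one probability distribution per example
-- -> 2D DoubleTensor whose rows sum to 1
local distTargets = torch.rand(batchSize, vocabSize)
distTargets:cdiv(distTargets:sum(2):expandAs(distTargets))
```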
Did you manage to wrap the Penn Treebank dataset? If so, could you submit a Pull Request or just put your code online? Otherwise, that is alright; I can build my own wrapper.
I just added the RNN.lua and penntreebank.lua files here: https://gist.github.com/manamir/ac31cedb32ff24db1796
Awesome! Thanks. If you are using dp.Neural with nn.SoftMax for your output layer, you probably need dp.NLL. Not sure what dp.CE is.
Oh, CE stands for CrossEntropy; I'll add it to the gist.