
How to use this code to model a word-level RNN.

ankitp94 opened this issue 8 years ago · 12 comments

I was curious as to how to use this existing beautiful code to model sentences as sequences of words, i.e. with words as the unit instead of characters.

Can anyone help/guide me in achieving this?

If there is any other library (or existing code) that does the same, that would also be helpful.

ankitp94 avatar Oct 10 '15 12:10 ankitp94

From what I can see, char-rnn learns how characters are grouped into "words" from a relatively small amount of sample text (about 1 MB), which is one reason I think it is great. It gives us the option to learn / achieve everything from scratch, from a training file.

Shame it can't load its own brain files and write its children.

wrapperband avatar Oct 10 '15 15:10 wrapperband

Just wrote a mod of this that works at the word level:

https://github.com/mtanana/char-rnn/

Let me know if you have any trouble with it. I'll be improving this over the coming months.

I think the literature right now shows that the word level works better if your problem is learning a language model, but the char level is better for input that isn't just words (e.g. HTML). This could definitely change, but it is nice to be able to test your model both ways easily.
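To make the word-level idea concrete, here is a minimal sketch (not mtanana's actual loader code) of the first step such a mod needs: building a vocabulary that maps each distinct word to an integer id. The whitespace tokenization and the file path are assumptions for illustration.

```lua
-- Hypothetical vocabulary builder: one integer id per distinct
-- whitespace-separated word in the training file.
local function build_vocab(path)
  local vocab, ivocab, count = {}, {}, 0
  for line in io.lines(path) do
    for word in line:gmatch("%S+") do
      if vocab[word] == nil then
        count = count + 1
        vocab[word] = count   -- word -> id, for encoding input
        ivocab[count] = word  -- id -> word, for decoding samples
      end
    end
  end
  return vocab, ivocab, count
end

local vocab, ivocab, vocab_size = build_vocab('data/input.txt')
print('vocabulary size: ' .. vocab_size)
```

With characters this table rarely exceeds a hundred or so entries; with words it can easily reach tens of thousands, which is what motivates the input-layer speedups discussed below.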

mtanana avatar Oct 25 '15 18:10 mtanana

That's a neat word pre-processing function.
Can you train on characters, then switch to words? Wouldn't it be possible to "train" the character net to do the word processing? Would that be a good way forward, to make a "generic system" that "deep"-learns itself, i.e. instead of preprocessing, have pre-training...?

Also, single-character processing could be treated as a special case of multi-character processing, with (say) -init_char_Proc 1 (just so that later there could be an option to automatically self-align the optimum word training length). That way there would be just one SplitLMMinibatchLoader to maintain.

wrapperband avatar Oct 26 '15 09:10 wrapperband

I think you have some great ideas there, but the mod I wrote is probably a bit simpler. It just builds a vocabulary and treats words as discrete features. It's the traditional word-based LSTM model, but we get all of the nice additions of this library (saving, sampling). The mod adds some speedups that are necessary with the new, larger vocabulary (using nn.LookupTable as the input).

Unfortunately, you can't switch between words and characters; they are really different models. But this makes it easy to try both of them on your problem (using -wordlevel 1).
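An illustrative sketch of the nn.LookupTable speedup mentioned above (not the mod's actual code, and the sizes here are made up): with a large vocabulary, multiplying a huge one-hot vector into the first layer is expensive, so the word id is used to index an embedding row directly.

```lua
require 'nn'

-- Hypothetical sizes for illustration.
local vocab_size, embed_dim = 10000, 128

-- One embedding row per vocabulary word; indexing a row replaces
-- the one-hot matrix multiply, which is where the speedup comes from.
local lookup = nn.LookupTable(vocab_size, embed_dim)

-- A minibatch of four word ids; forward returns a 4 x 128 tensor
-- of embeddings that would feed the LSTM input.
local ids = torch.LongTensor{12, 7, 301, 9984}
local embeddings = lookup:forward(ids)
print(embeddings:size())  -- 4 x 128
```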

mtanana avatar Oct 27 '15 06:10 mtanana

Just want to say thanks, will test it on my problems / experiments.

Note: your docs say to use -usewords 1 (not -wordlevel 1).

wrapperband avatar Oct 28 '15 13:10 wrapperband

Thanks for noticing that! I fixed it...

Drop me a line if you notice any bugs. I've been committing in a haphazard fashion....

mtanana avatar Oct 29 '15 15:10 mtanana

See https://github.com/karpathy/char-rnn/issues/16

Atcold avatar Dec 04 '15 21:12 Atcold

@mtanana I get an error when specifying 'primetext'

/home/ubuntu/torch-distro/install/bin/luajit: bad argument #1 to '?' (empty tensor at /home/ubuntu/torch-distro/pkg/torch/generic/Tensor.c:851)
stack traceback:
  [C]: at 0x7f17c4c8e1b0
  [C]: in function '__index'
  sample.lua:121: in main chunk
  [C]: in function 'dofile'
  ...rch-distro/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
  [C]: at 0x00406670

Also, have you considered incorporating http://nlp.stanford.edu/projects/glove/ similar to https://github.com/larspars/word-rnn?

jroakes avatar Dec 06 '15 17:12 jroakes

@jroakes I think I've had this error before. In the version I have up on GitHub, I don't think I checked for out-of-vocabulary words... there are a couple of other things it might be... let me check into this and see what I find. I'll try and drop you a line in the next week. (It would be helpful to know whether any primetext works at all, or whether it's just certain ones.) I might do a commit soon and see if that helps as well.
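A speculative guard, not the actual fix: if a primetext word is missing from the training vocabulary, the lookup gets a nil id and the priming tensor can end up empty, which would match the "empty tensor" error above. Mapping unknown words to a reserved unknown-word id is one common workaround; the function name and the unk-id convention here are assumptions.

```lua
require 'torch'

-- Hypothetical helper: convert primetext into word ids, falling
-- back to a reserved unknown-word id for out-of-vocabulary words.
local function prime_ids(primetext, vocab, unk_id)
  local ids = {}
  for word in primetext:gmatch("%S+") do
    table.insert(ids, vocab[word] or unk_id)  -- nil would otherwise drop the entry
  end
  return torch.LongTensor(ids)
end
```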

On the GloVe stuff: yeah, there is definitely potential for pre-initializing with GloVe. This would just be a bit of a mechanical step (i.e. someone would have to write the import step); you'd also have to make sure you had one more linear layer that enforced the dimensionality of the word vectors. With this model you are essentially learning a form of word vectors in the nn.LookupTable, but with a very different objective function and model. Still, I would suspect those word vectors would help a bit, especially on smaller corpora. Another option is to pre-train your language model on the same corpus used by GloVe; then you would be pre-training with the objective function/model that you are currently using. It would just take a bit of time... but compute time, not programming time!
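A speculative sketch of that "mechanical" import step, not existing code: copy each pretrained GloVe vector into the matching row of the nn.LookupTable weights before training. This assumes the standard GloVe text format (one "word v1 v2 ... vN" per line); the function name is made up, and dim must match the table's embedding size.

```lua
require 'nn'

-- Hypothetical GloVe import: overwrite embedding rows for every
-- vocabulary word that GloVe covers; uncovered rows keep their
-- random initialization.
local function init_from_glove(lookup, vocab, glove_path, dim)
  for line in io.lines(glove_path) do
    local parts = {}
    for tok in line:gmatch("%S+") do table.insert(parts, tok) end
    local id = vocab[parts[1]]  -- first token is the word itself
    if id ~= nil then
      for i = 1, dim do
        lookup.weight[id][i] = tonumber(parts[i + 1])
      end
    end
  end
end
```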

mtanana avatar Dec 06 '15 18:12 mtanana

@mtanana It worked with -primetext a, but not with -primetext the (perhaps a character/word issue).

jroakes avatar Dec 06 '15 20:12 jroakes

@jroakes

Yes, it is coming back to me now: the primetext did not work in that implementation. I just did a commit (watch out: I moved a ton of things around, but I was out of sync when I did it, so it looks like things were deleted and created). You might just want to do a clean pull. I have been using it for some dialogue-based things. This file will sample interactively: https://github.com/mtanana/char-rnn/blob/master/src/samplegeneral.lua

You might want to go through and delete the places where I insert the (end of speaker) tags if you are not using it for dialogue.

A lot of the code has started to get tailored to my specific use case, so sorry if you find any gremlins. =) Let me know and I'll try and clean them up.

mtanana avatar Dec 07 '15 05:12 mtanana

Thanks. I just reviewed your code and it was very helpful and easy to trace.

jroakes avatar Dec 08 '15 02:12 jroakes