torch-rnn
torch-rnn copied to clipboard
support UTF-8 start_text
I was trying to run stuff like:
th sample.lua -checkpoint $(ls -t cv/check*.t7 | head -1) -length 200 -temperature 0.75 -start_text "El país necesita"
And kept getting a Got invalid idx
error. After digging a while I found that lua doesn't work correctly with utf-8 strings :angry:, so I had to add a package to solve this.
Please review. This PR might be controversial, as it adds a dependency, but I think that given that the main use-case for this lib is character-based learning, it should properly handle UTF-8 inputs both in the input files (https://github.com/jcjohnson/torch-rnn/pull/7) and CLI arguments (this PR).
Oh, that solves #47 issue I have started! Many thanks! Hope this gets merged into the main project, but for now I'm cloning your fork.
Thanks for figuring this one out! It seems like a relatively innocent dependency, so I think I'm onboard. Does this require any change to the decode_string
method in the LanguageModel? Will it still work properly if we sample a non-ASCII character?
It does! Actually, that worked even without this PR
On Thu, Mar 31, 2016, 14:22 Justin [email protected] wrote:
Thanks for figuring this one out! It seems like a relatively innocent dependency, so I think I'm onboard. Does this require any change to the decode_string method in the LanguageModel? Will it still work properly if we sample a non-ASCII character?
— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/jcjohnson/torch-rnn/pull/52#issuecomment-204064364
Why this comment has not been merged?