torch-rnn icon indicating copy to clipboard operation
torch-rnn copied to clipboard

support UTF-8 start_text

Open maraoz opened this issue 8 years ago • 4 comments

I was trying to run stuff like:

th sample.lua -checkpoint $(ls -t cv/check*.t7 | head -1) -length 200 -temperature 0.75 -start_text "El país necesita"

And kept getting a Got invalid idx error. After digging a while I found that lua doesn't work correctly with utf-8 strings :angry:, so I had to add a package to solve this.

Please review. This PR might be controversial, as it adds a dependency, but I think that given that the main use-case for this lib is character-based learning, it should properly handle UTF-8 inputs both in the input files (https://github.com/jcjohnson/torch-rnn/pull/7) and CLI arguments (this PR).

maraoz avatar Mar 31 '16 07:03 maraoz

Oh, that solves #47 issue I have started! Many thanks! Hope this gets merged into the main project, but for now I'm cloning your fork.

ostrosablin avatar Mar 31 '16 14:03 ostrosablin

Thanks for figuring this one out! It seems like a relatively innocent dependency, so I think I'm onboard. Does this require any change to the decode_string method in the LanguageModel? Will it still work properly if we sample a non-ASCII character?

jcjohnson avatar Mar 31 '16 18:03 jcjohnson

It does! Actually, that worked even without this PR

On Thu, Mar 31, 2016, 14:22 Justin [email protected] wrote:

Thanks for figuring this one out! It seems like a relatively innocent dependency, so I think I'm onboard. Does this require any change to the decode_string method in the LanguageModel? Will it still work properly if we sample a non-ASCII character?

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/jcjohnson/torch-rnn/pull/52#issuecomment-204064364

maraoz avatar Apr 01 '16 00:04 maraoz

Why this comment has not been merged?

drcege avatar Oct 29 '16 14:10 drcege