crnn
Using UTF-8 Character encoding instead of ASCII
@bgshih The text I want to recognize contains currency symbols such as the dollar sign, euro sign, and Indian rupee sign. Most of these are not in the ASCII character set, but I found that they can be represented in UTF-8. How can I use UTF-8 encoding instead of ASCII? Which parts of the code do I need to modify? Please help.
The model does not care what a class means, so each character is simply represented by a number. To feed a text string into the model, the string has to be translated into a number sequence. For output, the reverse has to be done. The translation between text strings and number sequences can be found in https://github.com/bgshih/crnn/blob/master/src/utilities.lua
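To illustrate the idea, here is a minimal Python sketch of such a string-to-label translation. It is not the code from utilities.lua. The alphabet, the names `ALPHABET`, `encode`, and `decode`, and the choice to reserve label 0 are all assumptions made for this example.

```python
# Hypothetical alphabet: digits, lowercase letters, and a few currency symbols.
# Any Unicode character can be listed here; the model only ever sees its index.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz$€₹£¥"

# Assumption for this sketch: label 0 is reserved (for example for a blank
# class), so real characters are numbered starting from 1.
CHAR_TO_LABEL = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
LABEL_TO_CHAR = {label: ch for ch, label in CHAR_TO_LABEL.items()}


def encode(text):
    """Translate a text string into a list of integer class labels."""
    # Python iterates over Unicode code points, so "€" is a single item.
    return [CHAR_TO_LABEL[ch] for ch in text]


def decode(labels):
    """Translate a list of class labels back into a text string."""
    # Label 0 has no character assigned, so it is skipped.
    return "".join(LABEL_TO_CHAR[label] for label in labels if label != 0)


labels = encode("€25")
print(labels)
print(decode(labels))
```

The point to watch is that the string is split into characters, not bytes: in UTF-8 the euro sign takes three bytes, so byte-wise indexing would turn one symbol into three labels. The number of output classes of the model would presumably also have to match the size of the extended alphabet.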