Using UTF-8 Character encoding instead of ASCII

Open • rayush7 opened this issue 7 years ago • 1 comment

@bgshih The text I want to recognize includes currency symbols such as the dollar sign, euro sign, and Indian rupee sign. Most of these are not in the ASCII character set, but they can be encoded in UTF-8. How can I use UTF-8 encoding instead of ASCII? Which parts of the code do I need to modify for that? Please help.

rayush7 • May 18 '17 09:05

The model does not care what a class means, so each character is simply represented by a number. To feed a text string into the model, the string has to be translated into a number sequence; for output, the reverse translation has to be applied. The translation between text string <-> number sequence can be found in https://github.com/bgshih/crnn/blob/master/src/utilities.lua
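
To illustrate the idea, here is a minimal Python sketch of such a string <-> number translation with currency symbols in the alphabet. It is not the code from utilities.lua. The alphabet, the names `ALPHABET`, `encode` and `decode`, and the choice of reserving label 0 for a blank class are all assumptions made for illustration. The point is that labels are assigned per character (Unicode code point), not per UTF-8 byte, so the euro sign is one class even though it takes three bytes in UTF-8.

```python
# Hypothetical alphabet: digits, lowercase letters, and a few currency symbols.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz$€₹£¥"

# Assumption: label 0 is reserved for a blank class, so characters start at 1.
CHAR_TO_LABEL = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
LABEL_TO_CHAR = {label: ch for ch, label in CHAR_TO_LABEL.items()}


def encode(text):
    """Translate a text string into a list of class labels."""
    # Iterating over a Python str yields Unicode characters, not bytes.
    return [CHAR_TO_LABEL[ch] for ch in text]


def decode(labels):
    """Translate a list of class labels back into a text string."""
    # Label 0 (the assumed blank) has no character and is skipped.
    return "".join(LABEL_TO_CHAR[label] for label in labels if label != 0)


if __name__ == "__main__":
    labels = encode("₹100$")
    print(labels)
    print(decode(labels))
```

With a mapping like this, supporting a new symbol means adding it to the alphabet and making sure the number of output classes of the model matches the alphabet size (plus the blank, if one is used). Characters missing from the alphabet raise a `KeyError` in this sketch, which makes gaps in the alphabet visible early.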

githubharald • May 27 '17 16:05