CharCnn_Keras
Problem with str_to_indexes in data utils
Shouldn't it be c = s[i-1], not c = s[-i]? -i is the negative of i, not the index you want.
He's parsing the string backwards:
s = "hey"
# c in loop
c = "y"
c = "e"
c = "h"
and then c is mapped with self.dict giving
str2idx = [25 5 8 0 0 0 0 ...]
# 25 : y
# 5 : e
# 8 : h
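To make the indexing concrete, here is a minimal standalone sketch of such a loop. It is not the repository's actual code: the lowercase-only alphabet, the `CHAR_TO_INDEX` name, and the default `length` are assumptions chosen to reproduce the numbers above. The point is that in Python `s[-i]` with `i` starting at 1 counts from the end of the string, so it is a valid index.

```python
import string

# Assumed alphabet for illustration: lowercase letters only, indexed from 1.
# Index 0 is reserved for padding and unknown characters.
ALPHABET = string.ascii_lowercase
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def str_to_indexes(s, length=10):
    """Map a string to a fixed-length list of indexes, reading it backwards."""
    s = s.lower()
    max_length = min(len(s), length)
    str2idx = [0] * length
    for i in range(1, max_length + 1):
        c = s[-i]  # s[-1] is the last character, s[-2] the one before it, ...
        if c in CHAR_TO_INDEX:
            str2idx[i - 1] = CHAR_TO_INDEX[c]
    return str2idx


print(str_to_indexes("hey"))  # [25, 5, 8, 0, 0, 0, 0, 0, 0, 0]
```

With `s[i-1]` instead, the same loop would read the string front to back and give `[8, 5, 25, 0, ...]`.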
I guess the order of the sequence doesn't matter because it's a CNN processing all at once and not an RNN or other architectures in which the order has meaning...
Actually, reading the paper (https://arxiv.org/pdf/1509.01626.pdf), the reason for the backwards mapping is given on page 2:
The character quantization order is backward so that the latest reading on characters is always placed near the begin of the output, making it easy for fully connected layers to associate weights with the latest reading.
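A small sketch of what that means in practice, under my own assumptions (the `encode` helper, the "_" padding marker, and the frame length of 6 are made up for illustration and are not from the paper or the repo): with a fixed-length input, backward order puts the last character of every text in slot 0, whatever the text length, while forward order leaves it at a varying position or truncates it away.

```python
def encode(s, length, backward):
    """Place the characters of s into a fixed-length frame, padded with '_'."""
    chars = s[::-1] if backward else s
    chars = chars[:length]  # truncate to the frame length
    return list(chars) + ["_"] * (length - len(chars))


for text in ["hey", "hello there"]:
    print("forward: ", encode(text, 6, backward=False))
    print("backward:", encode(text, 6, backward=True))
```

In the backward frames the final character ("y" and "e") always sits at position 0. In the forward frames the final "y" of "hey" sits at position 2, and the end of "hello there" is cut off entirely.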