CharCnn_Keras

Problem with str_to_indexes in data utils

Open Shane-Neeley opened this issue 5 years ago • 2 comments

Shouldn't it be c = s[i-1], not c = s[-i]? -i is the negative of i, not the index you want.

Shane-Neeley avatar Oct 02 '19 18:10 Shane-Neeley

> Shouldn't c = s[i-1] not c = s[-i]? -i is the negative of i, not the index you want.

He's parsing the string backwards:

s = "hey"
# value of c on each loop iteration (i = 1, 2, 3)
  c = s[-1]  # "y"
  c = s[-2]  # "e"
  c = s[-3]  # "h"

and then c is mapped with self.dict giving

str2idx = [25 5 8 0 0 0 0 ...]
# 25 : y
# 5 : e
# 8 : h
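
For anyone who wants to check this locally, here is a minimal sketch of the backward loop. It is not the repository's code: the a-z alphabet indexed from 1, the `CHAR_TO_INDEX` name, and the default length of 10 are assumptions made only for this illustration.

```python
import string

# Assumption: lowercase letters only, indexed from 1; 0 is kept for padding
# and for characters outside the alphabet.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(string.ascii_lowercase)}


def str_to_indexes(s, length=10):
    """Encode s from its last character to its first, zero-padded to `length`."""
    s = s.lower()
    max_length = min(len(s), length)
    str2idx = [0] * length
    for i in range(1, max_length + 1):
        c = s[-i]  # s[-1] is the last character, s[-2] the one before it, ...
        if c in CHAR_TO_INDEX:
            str2idx[i - 1] = CHAR_TO_INDEX[c]
    return str2idx


print(str_to_indexes("hey"))  # [25, 5, 8, 0, 0, 0, 0, 0, 0, 0]
```

So s[-i] is ordinary Python negative indexing (counting from the end of the string), and the write position i - 1 still runs forward from 0.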

I guess the order of the sequence doesn't matter because it's a CNN processing all at once and not an RNN or other architectures in which the order has meaning...

fratambot avatar Apr 11 '22 14:04 fratambot

Actually, the paper (https://arxiv.org/pdf/1509.01626.pdf) gives the reason for the backward mapping on page 2:

The character quantization order is backward so that the latest reading on characters is always placed near the begin of the output, making it easy for fully connected layers to associate weights with the latest reading.
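
A small illustration of what that sentence implies, as I read it. This is a sketch, not the paper's or the repo's code: the `encode` helper, the a-z alphabet, and the fixed length of 8 are hypothetical. With backward order the last character of the text always lands in slot 0, whatever the text length; with forward order it moves around, and is cut off entirely when the text is longer than the input size.

```python
import string

# Assumption: lowercase letters indexed from 1; anything else maps to 0.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(string.ascii_lowercase)}


def encode(s, length, backward):
    """Fixed-length index list for s, read backward or forward."""
    chars = s.lower()
    # Backward: reverse first, then truncate, so the end of the text is kept.
    chars = chars[::-1][:length] if backward else chars[:length]
    ids = [CHAR_TO_INDEX.get(c, 0) for c in chars]
    return ids + [0] * (length - len(ids))


for text in ["hey", "hello", "hello there"]:
    print(repr(text))
    print("  backward:", encode(text, 8, backward=True))
    print("  forward: ", encode(text, 8, backward=False))
```

In the backward rows the first entry is always the index of the text's final character (y, o, e); in the forward rows that character sits at position 2, then 4, and for "hello there" it is truncated away.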

fratambot avatar Apr 12 '22 09:04 fratambot