DeepCTR

Question about embedding + hash + mask

Open dulm opened this issue 5 years ago • 2 comments

num_buckets = embedd.input_dim = feat.dimension
hash_x = tf.strings.to_hash_bucket_fast(x, self.num_buckets)
hash_x = (hash_x + 1) * mask  # with mask

max(hash_x) = (feat.dimension - 1) + 1 = feat.dimension
embedd.input_dim requires value ∈ [0, feat.dimension)

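∴ The hash result, when used as the embedding input, does not satisfy the embedding's value requirement. Which step of this reasoning is wrong? Or is this a bug in the code, so that it should be changed to embedd.input_dim = feat.dimension + 1 if mask?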
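The range argument above can be checked with a small sketch. It is pure Python and does not call TensorFlow: it enumerates every bucket id a hash into `num_buckets` buckets could return and applies the same `(hash_x + 1) * mask` shift. The helper name `shifted_ids` and the value `dimension = 8` are assumptions chosen for illustration, not part of DeepCTR.

```python
from typing import List

def shifted_ids(num_buckets: int) -> List[int]:
    """All ids that `(hash_x + 1) * mask` can yield for hash_x in [0, num_buckets)."""
    ids = {0}  # mask == 0 (padding) always maps to id 0
    for hash_x in range(num_buckets):  # every possible bucket id
        ids.add((hash_x + 1) * 1)      # mask == 1 for non-padding values
    return sorted(ids)

dimension = 8  # plays the role of feat.dimension (illustrative value)

ids = shifted_ids(dimension)

# Largest id after the shift; an embedding with input_dim = dimension
# only accepts ids in [0, dimension).
largest_id = max(ids)
out_of_range = [i for i in ids if i >= dimension]

# Smallest input_dim that covers every id the masked hash can produce.
required_input_dim = largest_id + 1

print(largest_id, out_of_range, required_input_dim)
```

Under these assumptions the largest id equals `dimension`, so the table would need `dimension + 1` rows, which matches the fix proposed in the question. Hashing into `dimension - 1` buckets instead would also keep the shifted ids inside `[0, dimension)`.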

dulm, Aug 08 '19 13:08

Hmm, I just came across this issue, so this is probably that bug: https://github.com/shenweichen/DeepCTR/issues/116

dulm, Aug 08 '19 13:08

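Also, in class Hash: when mask_zero=False, the final hash values fall in [0, num_buckets - 1). But the hash values are always used for look_up in emb, and emb is fixed at [0, num_buckets). This effectively wastes one slot: emb[num_buckets - 1] is left empty.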
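A minimal sketch of the wasted-slot observation, again in pure Python rather than TensorFlow. It takes the comment's premise as given (with mask_zero=False the hash lands in `[0, num_buckets - 1)`) and compares the reachable ids with the rows of an embedding table of size `num_buckets`. The helper `reachable_rows` and the value `num_buckets = 8` are hypothetical.

```python
def reachable_rows(num_hash_buckets: int, mask_zero: bool) -> set:
    """Embedding rows that can be looked up, given the number of hash buckets."""
    ids = set(range(num_hash_buckets))      # raw hash output range
    if mask_zero:
        ids = {0} | {h + 1 for h in ids}    # shift by 1, reserve 0 for padding
    return ids

num_buckets = 8                        # illustrative embedding table size
emb_rows = set(range(num_buckets))     # valid rows: [0, num_buckets)

# Premise from the comment: hashing into num_buckets - 1 buckets without masking.
unused_no_mask = emb_rows - reachable_rows(num_buckets - 1, mask_zero=False)

# Same bucket count with masking: the shift makes every row reachable.
unused_with_mask = emb_rows - reachable_rows(num_buckets - 1, mask_zero=True)

print(sorted(unused_no_mask), sorted(unused_with_mask))
```

If the premise holds, the only row that is never looked up without masking is `num_buckets - 1`, while the masked case uses the whole table.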

dulm, Aug 20 '19 07:08