Some questions about "self.offsets"

Open CallmeChenChen opened this issue 5 years ago • 3 comments

Dear DaLao: what is the purpose of "self.offsets"?

    self.offsets = np.array((0, *np.cumsum(field_dims)[:-1]), dtype=np.long)

    def forward(self, x):
        x = x + x.new_tensor(self.offsets).unsqueeze(0)

CallmeChenChen avatar Feb 03 '20 03:02 CallmeChenChen

It is the feature index offset of each field, i.e. the row at which that field's features start in the shared embedding table.

KwangKa avatar Apr 22 '21 09:04 KwangKa

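    # e.g. field_dims = [2, 3, 4, 5], offsets = [0, 2, 5, 9]
    # offsets holds the index offset of each field.
    # All features share a single Embedding table, so in that table:
    #   rows 0-1  belong to feature X0 (field_dims[0] = 2 values)
    #   rows 2-4  belong to feature X1 (field_dims[1] = 3 values)
    #   rows 5-8  belong to feature X2 (field_dims[2] = 4 values)
    #   rows 9-13 belong to feature X3 (field_dims[3] = 5 values)
    # The x passed to forward(self, x), however, only takes values within each field's own vocabulary.
    # For example, when X1 takes the value 0, its row in the Embedding table is offsets[1] + 0 = 2 + 0 = 2.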
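The mapping above can be checked with a small sketch. It uses plain NumPy rather than the library's own layer: the array `table` is only a stand-in for the shared embedding table, and `embed_dim`, `x` and `global_rows` are names made up for illustration. `np.int64` is used in place of `np.long`, which is not available in every NumPy version.

```python
import numpy as np

field_dims = [2, 3, 4, 5]  # vocabulary size of each field
offsets = np.array((0, *np.cumsum(field_dims)[:-1]), dtype=np.int64)

# A batch of two samples; each column holds an index local to its own field.
x = np.array([[1, 0, 3, 4],
              [0, 2, 0, 0]], dtype=np.int64)

# Broadcasting adds the matching field offset to every column.
global_rows = x + offsets[np.newaxis, :]

# Hypothetical stand-in for the shared embedding table:
# one row per feature value across all fields.
embed_dim = 3
table = np.arange(sum(field_dims) * embed_dim, dtype=np.float32).reshape(-1, embed_dim)

# Fancy indexing plays the role of the embedding lookup.
embedded = table[global_rows]  # shape: (batch, num_fields, embed_dim)

print(offsets.tolist())
print(global_rows.tolist())
print(embedded.shape)
```

Without the offsets, local index 0 of every field would hit row 0 of the table, so different fields would end up sharing the same embedding rows.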

cpy18727 avatar Nov 14 '21 09:11 cpy18727

There is a small error in the details.

  • If field_dims = [5, 10, 5], then np.cumsum(field_dims)[:-1] = [5, 15], so the source code prepends a zero: np.array((0, *np.cumsum(field_dims)[:-1]))
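A short sketch of why the last cumulative sum is dropped and a zero is prepended, assuming the same field_dims as in the bullet above. The names `cumulative` and `starts` are illustrative only.

```python
import numpy as np

field_dims = [5, 10, 5]

# Running totals: each entry is the exclusive end row of a field.
cumulative = np.cumsum(field_dims)

# Drop the grand total and prepend 0 to get the first row of each field.
starts = np.array((0, *cumulative[:-1]))

for i, (start, size) in enumerate(zip(starts, field_dims)):
    print(f"field {i}: rows {start}..{start + size - 1}")
```

The grand total (20 here) is the size of the whole table, not the start of any field, which is why only the first two running totals are kept.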

tsWen0309 avatar Feb 07 '22 14:02 tsWen0309