
[ASK] the encode of the feature in field

ucasiggcas opened this issue 4 years ago • 2 comments

Hi dear,

Description

I'm a little confused about the class used to encode the features within each field. I saw this example:

import pandas as pd
# LibffmConverter lives in the recommenders utilities (reco_utils.dataset.pandas_df_utils
# at the time of this issue); the exact import path may differ by version.
from reco_utils.dataset.pandas_df_utils import LibffmConverter

df_feature_original = pd.DataFrame({
    'rating': [1, 0, 0, 1, 1],
    'field1': ['xxx1', 'xxx2', 'xxx4', 'xxx4', 'xxx4'],
    'field2': [3, 4, 5, 6, 7],
    'field3': [1.0, 2.0, 3.0, 4.0, 5.0],
    'field4': ['1', '2', '3', '4', '5']
})
converter = LibffmConverter().fit(df_feature_original, col_rating='rating')
df_out = converter.transform(df_feature_original)
df_out

|   | rating | field1 | field2 | field3  | field4 |
|---|--------|--------|--------|---------|--------|
| 0 | 1      | 1:1:1  | 2:4:3  | 3:5:1.0 | 4:6:1  |
| 1 | 0      | 1:2:1  | 2:4:4  | 3:5:2.0 | 4:7:1  |
| 2 | 0      | 1:3:1  | 2:4:5  | 3:5:3.0 | 4:8:1  |
| 3 | 1      | 1:3:1  | 2:4:6  | 3:5:4.0 | 4:9:1  |
| 4 | 1      | 1:3:1  | 2:4:7  | 3:5:5.0 | 4:10:1 |

I found that the feature index keeps increasing within one field, and only after that field is finished are the features of the next field encoded. But that is not the same as what the FFM author shows below:

Click  Advertiser  Publisher
=====  ==========  =========
    0        Nike        CNN
    1        ESPN        BBC
Then, you can generate FFM format data:
    0 0:0:1 1:1:1
    1 0:2:1 1:3:1

He encodes the features of one example and then moves on to the next example, so the method in this repo is different. Does this difference affect the results?
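
To make the difference concrete, here is a minimal sketch (toy code, not the actual converter or libffm implementation) that assigns indices to the FFM author's two-row example in both orders:

rows = [
    {"Advertiser": "Nike", "Publisher": "CNN"},
    {"Advertiser": "ESPN", "Publisher": "BBC"},
]
fields = ["Advertiser", "Publisher"]

# libffm-style (row-major): walk each row and give every unseen feature the next index.
row_major, next_idx = {}, 0
for row in rows:
    for field in fields:
        key = (field, row[field])
        if key not in row_major:
            row_major[key] = next_idx
            next_idx += 1

# repo-style (field-major): exhaust one field over all rows before starting the next field;
# the converter output above also starts counting at 1 instead of 0.
field_major, next_idx = {}, 1
for field in fields:
    for row in rows:
        key = (field, row[field])
        if key not in field_major:
            field_major[key] = next_idx
            next_idx += 1

print(row_major)    # {('Advertiser', 'Nike'): 0, ('Publisher', 'CNN'): 1, ('Advertiser', 'ESPN'): 2, ('Publisher', 'BBC'): 3}
print(field_major)  # {('Advertiser', 'Nike'): 1, ('Advertiser', 'ESPN'): 2, ('Publisher', 'CNN'): 3, ('Publisher', 'BBC'): 4}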

Other Comments

Maybe my poor English is hard to understand, so to restate the question: the encoding rule in this repo is to first encode the features within one field and then encode the features within the next field, while libffm encodes all the features of one data row across its fields and then moves on to the next row. Will these two feature-encoding rules affect the final result?

Many thanks!

Waiting for your kind reply! Thanks.

ucasiggcas • Jun 01 '20 09:06

@ucasiggcas, I believe there are two minor differences:

  1. We start the encoded feature index at 1 instead of 0.
  2. We increment the feature index by looping over all examples within each field before moving to the next field, rather than looping over all features within each example before moving to the next example.

While the encoded indices may differ, I would not expect that to have any bearing on training with the FFM algorithm. @yueguoguo, do you see any concern here?
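
To spell out why the numbering should not matter: both schemes map the same set of (field, feature value) pairs to indices one-to-one, so one encoding is just a relabeling of the other, and FFM learns its per-index latent vectors under a different labeling rather than learning a different model. A small self-contained sketch (illustration only; the repo-style index values are assumed from the field-major rule described above, not taken from the library):

# Index maps for the two-row Nike/ESPN example; libffm_style comes from the README
# output quoted above, repo_style is derived from the field-major, start-at-1 rule.
libffm_style = {("Advertiser", "Nike"): 0, ("Publisher", "CNN"): 1,
                ("Advertiser", "ESPN"): 2, ("Publisher", "BBC"): 3}
repo_style = {("Advertiser", "Nike"): 1, ("Advertiser", "ESPN"): 2,
              ("Publisher", "CNN"): 3, ("Publisher", "BBC"): 4}

# Same keys, and each map is one-to-one, so repo_style is a pure relabeling of libffm_style.
assert set(libffm_style) == set(repo_style)
assert len(set(libffm_style.values())) == len(libffm_style)
assert len(set(repo_style.values())) == len(repo_style)

relabel = {libffm_style[k]: repo_style[k] for k in libffm_style}
print(relabel)  # {0: 1, 1: 3, 2: 2, 3: 4} -- the indices are permuted/shifted, nothing more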

gramhagen • Jun 01 '20 21:06

Hi dear, have you tried the MovieLens-1M data with the FFM method? If you can read Chinese, you can see my test here.

Thanks.

ucasiggcas • Jun 02 '20 02:06