Weighted data support

Open timofeev1995 opened this issue 4 years ago • 2 comments

Hello. Thanks for such a great library! Are you going to add support for weighted input in training?

timofeev1995, Jul 16 '19 07:07

@timofeev1995 Hi, I think weighted input would not be very hard to add to xLearn. Could you please give me a more detailed description of this feature?

aksnzhy, Jul 17 '19 06:07

Thanks for your response! Imagine that you have data for a CTR prediction problem like:

feature1 feature2 ... clicks impressions

You may have hundreds or thousands (or even more) of impressions for a given feature combination (for example, feature1 = 0, feature2 = 1, etc.). So if you want to use FFM with the libffm data format, you have to expand the single row described above into thousands of identical lines like

feature1 feature2 ... clicks impressions
0 1 ... 1 1
... ... ... ... ...
0 1 ... 1 1

where clicks = 1 is your target

and thousands of identical lines like

feature1 feature2 ... clicks impressions
0 1 ... 0 1
... ... ... ... ...
0 1 ... 0 1

for the impressions where no click happened.
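The expansion described above can be sketched roughly as follows. This is only an illustration: `expand_row` is a hypothetical helper, not part of xLearn or libffm, and it assumes that `impressions` counts all views (clicked or not) and that features are already encoded as libffm `field:index:value` triples.

```python
def expand_row(features, clicks, impressions):
    """Turn one aggregated row into one libffm-style line per impression.

    features: list of (field, index, value) triples (assumed pre-encoded).
    Assumes `impressions` includes the clicked impressions.
    """
    encoded = " ".join(f"{field}:{index}:{value}" for field, index, value in features)
    positives = [f"1 {encoded}"] * clicks                   # label 1: click happened
    negatives = [f"0 {encoded}"] * (impressions - clicks)   # label 0: no click
    return positives + negatives


# One aggregated row with 2 clicks out of 5 impressions becomes 5 lines.
lines = expand_row([(0, 0, 1), (1, 3, 1)], clicks=2, impressions=5)
```

With real traffic the counts are in the thousands, so the training file grows by the same factor even though it carries no extra information.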

Instead of this, you can apply weights in the loss function and in the parameter updates (liblinear, for example, has this option), and use a dataset with only 2 lines:

feature1 feature2 ... clicks impressions
0 1 ... 0 x
0 1 ... 1 y

where x and y are exactly the weights I'm talking about.
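To make the request concrete, here is a minimal sketch of why the two-line weighted dataset carries the same information as the duplicated one. It assumes a logistic loss and uses a hypothetical stand-in `score_fn` in place of a real FFM model; none of these names come from xLearn.

```python
import math


def log_loss(label, score):
    """Logistic loss for a label in {0, 1} and a raw model score."""
    p = 1.0 / (1.0 + math.exp(-score))
    return -math.log(p) if label == 1 else -math.log(1.0 - p)


def weighted_loss(rows, score_fn):
    """Sum of per-row losses, each multiplied by its weight.

    rows: iterable of (features, label, weight) tuples.
    """
    return sum(w * log_loss(y, score_fn(f)) for f, y, w in rows)


def score_fn(features):
    # Hypothetical stand-in for the model's raw output on a feature vector.
    return 0.3 * sum(features)


features = (0, 1)
x, y = 3, 2  # weights: x impressions without a click, y with a click

weighted = [(features, 0, x), (features, 1, y)]                 # 2 lines
expanded = [(features, 0, 1)] * x + [(features, 1, 1)] * y      # x + y lines

loss_weighted = weighted_loss(weighted, score_fn)
loss_expanded = weighted_loss(expanded, score_fn)
```

Both losses agree up to floating-point error, and the same holds for their gradients, because multiplying a row's loss by its weight is the same as summing that loss over the row's duplicates.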

timofeev1995, Jul 17 '19 08:07