
Possibility to assign weights to unbalanced datasets.

Open thestick613 opened this issue 4 years ago • 1 comments

(Y) Click   (X) City
1           London
0           Paris
0           Paris
0           Paris
0           Berlin
0           Berlin
0           Rome
0           Rome

Any evaluation metric function would favour a model which generates only 0's, because the single click (London) is heavily outnumbered by non-clicks. An elegant solution would be to provide a weights array, the same size as Y, filled with 1.0 by default. The user can then raise the weight of the under-represented items in the dataset, so that matching London weighs more than matching a 0 for Paris (we are interested in clicks, and we can afford to match some non-clicks as clicks, as long as we find all the clickers). This is generally solved by oversampling clicks or undersampling non-clicks. I found this approach in tffm / Weighted Loss Function, where it works quite well. This would solve some issues, such as #280, #281 and #105, and what looks like this WIP PR.
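To make the request concrete, here is a minimal sketch of a per-sample weighted log loss in plain Python, using the eight rows from the table above. The function name `weighted_log_loss`, the `weights` parameter, the weight of 7.0 for the click and the two sets of predicted probabilities are all made up for illustration; none of this is an existing library API.

```python
import math

def weighted_log_loss(y_true, y_prob, weights=None, eps=1e-15):
    """Binary cross-entropy where each sample carries its own weight.

    weights defaults to 1.0 per sample, which gives the ordinary log loss.
    (Hypothetical helper, not part of any library.)
    """
    if weights is None:
        weights = [1.0] * len(y_true)
    total = 0.0
    for y, p, w in zip(y_true, y_prob, weights):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / sum(weights)  # normalise by the total weight

# Labels from the table: one click (London), seven non-clicks.
y = [1, 0, 0, 0, 0, 0, 0, 0]

# Assumed weighting: the click counts 7x, so both classes sum to weight 7.
w = [7.0 if label == 1 else 1.0 for label in y]

# Two made-up models: one that almost always says "no click",
# and one that ranks London above the other cities.
always_zero = [0.05] * 8
finds_click = [0.6] + [0.4] * 7

unweighted_zero = weighted_log_loss(y, always_zero)
unweighted_click = weighted_log_loss(y, finds_click)
weighted_zero = weighted_log_loss(y, always_zero, w)
weighted_click = weighted_log_loss(y, finds_click, w)

print(unweighted_zero < unweighted_click)  # unweighted metric prefers always_zero
print(weighted_click < weighted_zero)      # weighted metric prefers finds_click
```

With default weights the "always 0" model scores better, which is the problem described above; with the click up-weighted, the model that actually picks out London scores better. Normalising by the sum of the weights keeps the loss on a comparable scale whatever weights are chosen.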

thestick613 avatar Jan 21 '20 21:01 thestick613

good question

Sandy4321 avatar Apr 24 '20 16:04 Sandy4321