
Support sample weight in data?

Open 903124 opened this issue 2 years ago • 2 comments

Problem Description

Currently, every model treats all rows equally. Would it be possible (at least for some models) to support sample weights, so that some rows are weighted more heavily than others? For instance, this could be helpful for data whose distribution shifts over time.

Expected behavior

Add the option to pass sample weights to the model's fit method, with one weight per row of the data (so the number of weights equals the number of rows). This is similar to how sample weights work in the sklearn API.
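For reference, here is a minimal sketch of the sklearn convention referred to here, where many estimators accept a `sample_weight` argument in `fit`. The toy data and the choice of `LinearRegression` are illustrative assumptions only; nothing below is an SDV API.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data (made up for illustration): the third row is an outlier.
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 10.0])

# Without weights, every row contributes equally to the fit.
unweighted = LinearRegression().fit(X, y)

# One weight per row; a weight of 0 effectively ignores that row.
weights = np.array([1.0, 1.0, 0.0])
weighted = LinearRegression().fit(X, y, sample_weight=weights)

print("unweighted slope:", unweighted.coef_[0])
print("weighted slope:", weighted.coef_[0])
```

The request is for an analogous per-row argument when fitting a synthetic data model.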

903124 · Feb 12 '23 08:02

Hi @903124, thanks for filing the feature request. We can keep this open as we think about it more and use it to track any updates.

I'm curious what kind of data you are working with? Have you thought about creating multiple models for old vs. new data?

Workaround

One manual workaround may be to modify your input data before fitting. You can duplicate the rows that seem most important, essentially encoding a "weight" as the number of times a row appears.
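A minimal sketch of this duplication workaround with pandas, assuming non-negative integer weights. The helper `expand_by_weight` and the `games` table are hypothetical names used for illustration, not part of any library.

```python
import numpy as np
import pandas as pd


def expand_by_weight(data: pd.DataFrame, weights) -> pd.DataFrame:
    """Repeat row i of `data` weights[i] times (hypothetical helper)."""
    weights = list(weights)
    if len(weights) != len(data):
        raise ValueError("need exactly one weight per row")
    if any(w < 0 or int(w) != w for w in weights):
        raise ValueError("weights must be non-negative integers")
    # Row positions, each repeated according to its weight.
    positions = np.repeat(np.arange(len(data)), [int(w) for w in weights])
    return data.iloc[positions].reset_index(drop=True)


# Made-up example table: later seasons get larger weights.
games = pd.DataFrame({"season": [2021, 2022, 2023], "points": [10, 14, 21]})
weighted_games = expand_by_weight(games, [1, 2, 3])
print(weighted_games)
```

The expanded table can then be used for fitting in place of the original one. Note that duplicating rows also duplicates any column that is supposed to be unique (such as an ID or primary key), so such columns would likely need separate handling.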

npatki · Feb 17 '23 15:02

Hi, I'm working on sports data where, for example, when weighting player performance, more recent data would be more valuable. In general, I think it would be a useful feature to be able to increase or decrease the occurrence of rows with certain features without using an external sampler, as mentioned.

903124 · Feb 17 '23 15:02