pycorels

What can be done for unbalanced data? Oversampling behaves strangely

Open Sandy4321 opened this issue 5 years ago • 0 comments

What can be done for unbalanced data? For example, the number of rows with target yes is 200, but the number of rows with target no is 500000.

Oversampling, meaning replicating the records with target yes, helps a little when it is applied once, so that the number of target yes is 400 and the number of target no is 500000.

But a second replication, surprisingly, does not help any further relative to the first one.

So when the number of target yes is 600 and the number of target no is 500000, performance is the same as when the number of target yes is 400 and the number of target no is 500000.
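To make the setup concrete, here is a minimal sketch of the replication described above. It uses only NumPy; the helper `oversample_positives`, the arrays `X` and `y`, and the scaled-down counts (20 vs. 5000 instead of 200 vs. 500000) are illustrative assumptions, not part of pycorels.

```python
import numpy as np

def oversample_positives(X, y, times):
    """Return X, y where every positive (y == 1) row appears `times` times in total."""
    pos = np.flatnonzero(y == 1)
    # Each positive row is already present once, so add (times - 1) extra copies.
    extra = np.repeat(pos, times - 1)
    idx = np.concatenate([np.arange(len(y)), extra])
    return X[idx], y[idx]

# Hypothetical binary feature matrix with a heavily unbalanced target.
rng = np.random.default_rng(0)
n_pos, n_neg = 20, 5000
X = rng.integers(0, 2, size=(n_pos + n_neg, 4))
y = np.concatenate([np.ones(n_pos, dtype=int), np.zeros(n_neg, dtype=int)])

X2, y2 = oversample_positives(X, y, 2)  # first replication
X3, y3 = oversample_positives(X, y, 3)  # second replication

# (number of yes, number of no) for the original, 2x and 3x data sets
counts = [(int(a.sum()), int((a == 0).sum())) for a in (y, y2, y3)]
print(counts)  # [(20, 5000), (40, 5000), (60, 5000)]
```

The 2x and 3x data sets correspond to the 400 and 600 cases above, only at a smaller scale.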

The questions are:
1. Do you remove identical rows?
2. Do you have weighting for particular rows? For example, rows with target yes may have more influence than rows with target "no". Could row weighting then be used for unbalanced data?
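The reason for asking about weights: for a plain error-rate objective, replicating a row k times should be the same as giving it an integer weight k. A small sketch of that equivalence follows, assuming NumPy only; `weighted_error`, the toy labels, and `k` are hypothetical, and this says nothing about how pycorels itself treats duplicates or weights.

```python
import numpy as np

def weighted_error(y_true, y_pred, weights):
    """Misclassification rate where each row counts `weights[i]` times."""
    wrong = (y_true != y_pred)
    return float(np.sum(weights * wrong) / np.sum(weights))

# Toy labels and predictions (illustrative values only).
y_true = np.array([1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([0, 1, 0, 0, 0, 1, 0, 0])

k = 3  # hypothetical weight for rows with target yes
weights = np.where(y_true == 1, k, 1)

# Option A: weighted error on the original rows.
err_weighted = weighted_error(y_true, y_pred, weights)

# Option B: plain error after replicating each row `weights[i]` times.
idx = np.repeat(np.arange(len(y_true)), weights)
err_replicated = float(np.mean(y_true[idx] != y_pred[idx]))

print(err_weighted, err_replicated)  # both equal 4/12
```

If identical rows were removed before fitting, option B would collapse back to the unweighted data, which is why the two questions are related.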

Thanks

Sandy4321 · Jan 15 '20 21:01