pycorels
What can be done for unbalanced data? Oversampling behaves strangely
What can be done for unbalanced data? For example, the number of rows with target "yes" is 200, but the number of rows with target "no" is 500000.
Oversampling, meaning replicating the records with target "yes", helps a little when it is applied once, giving 400 "yes" rows and 500000 "no" rows.
Surprisingly, a second replication does not help compared with the first. With 600 "yes" rows and 500000 "no" rows, performance is the same as with 400 "yes" rows and 500000 "no" rows.
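For reference, here is a minimal sketch of oversampling by replication as described above, plus the class fractions it produces. It assumes a NumPy feature matrix `X` and label vector `y` with "yes" encoded as `1`; `replicate_minority` and `minority_fraction` are hypothetical helpers, not part of pycorels. Even after two replications the "yes" class is still only about 0.12% of the rows, so a learner scored on plain misclassification error can still do well by predicting "no" almost everywhere.

```python
import numpy as np

def replicate_minority(X, y, positive_label=1, copies=1):
    """Random oversampling by replication (hypothetical helper).

    Appends `copies` extra copies of every positive row:
    copies=1 doubles the positives (200 -> 400),
    copies=2 triples them (200 -> 600).
    """
    pos = y == positive_label
    X_extra = np.tile(X[pos], (copies, 1))  # stack the positive rows `copies` times
    y_extra = np.tile(y[pos], copies)
    return np.vstack([X, X_extra]), np.concatenate([y, y_extra])

def minority_fraction(n_yes, n_no):
    """Share of "yes" rows in the whole data set."""
    return n_yes / (n_yes + n_no)

# Class balance for the three cases in the question.
for n_yes in (200, 400, 600):
    print(n_yes, f"{minority_fraction(n_yes, 500000):.4%}")
```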
My questions are:
1. Do you remove identical rows?
2. Do you support weighting particular rows? For example, rows with target "yes" could be given more influence than rows with target "no". Could such row weighting then be used for unbalanced data?
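The two questions are linked. The sketch below uses hypothetical toy data and a hypothetical `weighted_error` helper (it says nothing about what pycorels does internally). It shows two things. First, if identical rows were dropped before training, replication would be undone, because the replicated data has the same set of distinct rows. Second, giving each "yes" row an integer weight k scores a classifier exactly like replicating each "yes" row k times, so row weights would be a more direct way to express the same oversampling.

```python
import numpy as np

def weighted_error(y_true, y_pred, weights):
    """Weighted misclassification rate: total weight of wrong rows / total weight."""
    wrong = y_true != y_pred
    return weights[wrong].sum() / weights.sum()

# Hypothetical toy data: binary features, 2 "yes" rows (label 1), 8 "no" rows (label 0).
X = np.array([[1, 0], [1, 1], [0, 0], [0, 1], [0, 0],
              [0, 1], [0, 0], [1, 0], [0, 1], [0, 0]])
y = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
pos = y == 1

# Oversample: add 2 extra copies of each "yes" row (3 copies in total).
X_rep = np.vstack([X, np.tile(X[pos], (2, 1))])
y_rep = np.concatenate([y, np.tile(y[pos], 2)])

# Question 1: count distinct (features, label) rows before and after replication.
n_unique_before = np.unique(np.column_stack([X, y]), axis=0).shape[0]
n_unique_after = np.unique(np.column_stack([X_rep, y_rep]), axis=0).shape[0]

# Question 2: weight 3 on each "yes" row vs. 3 copies of each "yes" row,
# scored for a classifier that always predicts "no".
w = np.where(pos, 3.0, 1.0)
err_weighted = weighted_error(y, np.zeros_like(y), w)
err_replicated = np.mean(y_rep != np.zeros_like(y_rep))

print(n_unique_before, n_unique_after)  # identical: dedup would undo replication
print(err_weighted, err_replicated)     # identical: weights match replication
```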
Thanks