
SMOTE creating floats from integers

Open DandiestSquare1 opened this issue 9 years ago • 1 comments

Thank you for the time and effort you put into DMwR; it is incredibly useful, and SMOTE has been especially so.

Question: Do you have a suggestion for how to stop SMOTE from creating floats in columns whose distinct values are all integers? That is, I pass integers into SMOTE and it returns a mix of integers and floats.

Problem: For example, running the following produces floats in columns that contain only integers: dataset.bal <- SMOTE(target ~ ., dataset, perc.over = 675, perc.under = 100)

Input

# distinct values in column 124 of dataset are all integers
levels(as.factor(dataset[,124]));

[1] "0" "4" "8" "12" "16" "20" "24" "28" "32" "36" "40" "44" "48" "52" "56" "60" "64" "68" "72" "76" "80" "84"

Output

# float values have been added to column 124 of dataset.bal
levels(as.factor(dataset.bal[,124]));

 [1] "0" "4" "8" "12" "16" "17.4798878207803"
 [7] "17.8020473793149" "20" "20.5099524259567" "20.6726439939812" "20.7490051034838" "21.7515262812376"
[13] "23.4109582398087" "23.5898735374212" "24" "24.0396314188838" "24.0953625235707" "24.148680685088"
[19] "24.2740816418082" "24.3701336197555" "24.5072071170434" "24.5087074693292" "24.5124940760434" "24.7718890225515" etc..
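
For context, SMOTE builds each synthetic case by interpolating between a minority-class case and one of its nearest neighbours, so values that fall between the original integers are expected. The base-R sketch below shows only that interpolation idea. It is not the package's code, and the values 16 and 20 are illustrative picks from the levels listed above.

```r
# Illustrative only: one SMOTE-style interpolation step in base R.
set.seed(1)
x         <- 16L                        # value taken from a minority-class case
neighbour <- 20L                        # same column, from one of its nearest neighbours
gap       <- runif(1)                   # random factor in [0, 1]
synthetic <- x + gap * (neighbour - x)  # point on the segment between the two cases

print(synthetic)                        # a value between 16 and 20
print(is.integer(synthetic))            # FALSE: the result is stored as a double
```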

DandiestSquare1 avatar Jun 15 '16 23:06 DandiestSquare1

If you explicitly declare the column's data type in the data frame, SMOTE generates the new values as integers.

It works for me: df['Column_name'] = df['Column_name'].astype(int)
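
Note that the line above is pandas (Python) syntax, while the question uses R. One possible R-side workaround, offered as a sketch and not as something DMwR provides, is to post-process the SMOTE output: round every column that held only whole numbers in the input. The helper name `round_integer_columns` and the toy data frames are assumptions made for illustration. The toy frames stand in for `dataset` and `dataset.bal`.

```r
# Hypothetical helper (not part of DMwR): round the columns of the balanced
# data that contained only whole numbers in the original data.
round_integer_columns <- function(original, balanced) {
  for (col in intersect(names(original), names(balanced))) {
    x <- original[[col]]
    # A column qualifies if it is numeric and every non-missing value is whole.
    whole <- is.numeric(x) && all(x == round(x), na.rm = TRUE)
    if (whole && is.numeric(balanced[[col]])) {
      balanced[[col]] <- as.integer(round(balanced[[col]]))
    }
  }
  balanced
}

# Toy data standing in for `dataset` and the SMOTE output `dataset.bal`.
original <- data.frame(a = c(0, 4, 8), b = c(0.5, 1.5, 2.5))
balanced <- data.frame(a = c(0, 4, 8, 5.7), b = c(0.5, 1.5, 2.5, 1.1))

fixed <- round_integer_columns(original, balanced)
str(fixed)
```

Rounding only guarantees whole numbers. It can still produce values that never occurred in the input (for example 17 in a column that originally held only multiples of 4), so snapping to the nearest observed value may be preferable when the set of valid values matters.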

P-Rainbow avatar Dec 28 '20 04:12 P-Rainbow