skutil
Output dataframe with SafeLabelEncoder?
Hey guys, any tips on how to get a DataFrame instead of an array back from SafeLabelEncoder()?
The snippet below works for me, but I was really hoping for an argument along the lines of as_df=True so I can stay in pandas-land.
import numpy as np
import pandas as pd
# import path assumed; adjust if SafeLabelEncoder lives elsewhere in your skutil version
from skutil.preprocessing import SafeLabelEncoder

train = pd.DataFrame.from_records(data=np.array([
    ['USA', 'RED', 'a'],
    ['MEX', 'GRN', 'b'],
    ['FRA', 'RED', 'b']]),
    columns=['Country', 'Color', 'Category'])
test = pd.DataFrame.from_records(data=np.array([
    ['BBR', 'RED', 'a'],
    ['CAN', 'BLK', 'b'],
    ['FRA', 'RED', 'b']]),
    columns=['Country', 'Color', 'Category'])
COLS = ['Country']
# learn the levels on 'Country'
SLC = SafeLabelEncoder().fit(train[COLS].values.ravel())
# encode 'Country' as integer labels in both datasets (both calls return plain arrays)
train_labels = SLC.transform(train[COLS].values.ravel())
test_labels = SLC.transform(test[COLS].values.ravel())
print(train_labels)
print(test_labels)
[2 1 0]
[99999 99999 0]
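For what it's worth, here is a minimal sketch of the kind of wrapper I have in mind for getting a DataFrame back. It does not rely on any as_df option. `encode_column` is a hypothetical helper, and `StandInEncoder` is a hypothetical stand-in that only mimics the output shown above (unseen levels become 99999) so the sketch runs without skutil. Since the helper only calls `transform`, I would expect a fitted SafeLabelEncoder to drop in, but that is an assumption.

```python
import numpy as np
import pandas as pd


class StandInEncoder:
    """Hypothetical stand-in, NOT skutil's class: unseen levels map to 99999."""

    def fit(self, values):
        # Sorted unique levels, numbered 0..n-1 in sorted order
        self.classes_ = sorted(set(values))
        self.mapping_ = {level: i for i, level in enumerate(self.classes_)}
        return self

    def transform(self, values):
        # Look up each value; anything not seen during fit gets 99999
        return np.array([self.mapping_.get(v, 99999) for v in values])


def encode_column(encoder, df, col):
    """Hypothetical helper: return a copy of df with `col` replaced by its labels."""
    out = df.copy()
    out[col] = encoder.transform(df[col].tolist())
    return out


train = pd.DataFrame({'Country': ['USA', 'MEX', 'FRA'],
                      'Color': ['RED', 'GRN', 'RED'],
                      'Category': ['a', 'b', 'b']})
test = pd.DataFrame({'Country': ['BBR', 'CAN', 'FRA'],
                     'Color': ['RED', 'BLK', 'RED'],
                     'Category': ['a', 'b', 'b']})

encoder = StandInEncoder().fit(train['Country'].tolist())
train_encoded = encode_column(encoder, train, 'Country')
test_encoded = encode_column(encoder, test, 'Country')

print(train_encoded['Country'].tolist())  # [2, 1, 0]
print(test_encoded['Country'].tolist())   # [99999, 99999, 0]
```

The other columns and the index come through untouched, because the helper copies the frame and overwrites only the one column.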