NumCpp
One Hot Encoding
I am trying to do one-hot encoding: if a value is 4, only column 4 of that row should be set to 1, and all other columns remain 0.
In Python I can do this with NumPy. Each value in y is applied only to its own row.
import numpy as np

y = np.array([5, 4, 3, 0, 7, 6, 5, 1, 3, 5])
one_hot = np.zeros((10,10))
# Pair row i with column y[i]: one cell per row is set.
one_hot[np.arange(y.size), y] = 1
print(one_hot)
Prints
[[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.] # 5
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.] # 4
[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.] # 3
[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.] # 0
[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.] # 7
[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.] # 6
[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.] # 5
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.] # 1
[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.] # 3
[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]] # 5
But in NumCpp, every value in y is applied to every row.
nc::NdArray<int> y = {5, 4, 3, 0, 7, 6, 5, 1, 3, 5};
auto one_hot = nc::zeros<int>(10,10);
one_hot.put(nc::arange(y.size()), y, 1);
one_hot.print();
Prints
[[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]]
Hmm, yeah, those are different behaviors. I'll have to add an additional put overload to support this.
Added to Version 2.13 release.
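For anyone comparing the two behaviors discussed in this thread, here is a minimal sketch in standard C++ that reproduces both results with plain loops. It deliberately uses std::vector rather than NumCpp, so it says nothing about how the library implements put; the names Matrix, one_hot_paired, and one_hot_cross are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical alias for a dense row-major matrix of ints.
using Matrix = std::vector<std::vector<int>>;

// Paired indexing, as in NumPy's one_hot[np.arange(n), y] = 1:
// row i is matched with column y[i], so exactly one cell per row is set.
Matrix one_hot_paired(const std::vector<std::size_t>& y, std::size_t num_classes)
{
    Matrix out(y.size(), std::vector<int>(num_classes, 0));
    for (std::size_t i = 0; i < y.size(); ++i)
    {
        // .at() throws std::out_of_range if y[i] >= num_classes.
        out.at(i).at(y[i]) = 1;
    }
    return out;
}

// Cross-product indexing, matching the output reported in this thread:
// every row is combined with every column listed in y.
Matrix one_hot_cross(const std::vector<std::size_t>& y, std::size_t num_classes)
{
    Matrix out(y.size(), std::vector<int>(num_classes, 0));
    for (std::size_t row = 0; row < y.size(); ++row)
    {
        for (std::size_t col : y)
        {
            out.at(row).at(col) = 1;
        }
    }
    return out;
}
```

Calling one_hot_paired with y = {5, 4, 3, 0, 7, 6, 5, 1, 3, 5} and 10 classes sets a single 1 per row, as in the NumPy output. Calling one_hot_cross with the same arguments fills columns 0, 1, 3, 4, 5, 6, and 7 in every row, as in the NumCpp output.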