One Hot Encoding

Open brccabral opened this issue 1 year ago • 1 comments

I am trying to do one-hot encoding: if a value is 4, only column 4 of that row should be set to 1, and the other columns remain 0.

In Python I can do this. Each value in y selects the column to set in its own row.

import numpy as np

y = np.array([5, 4, 3, 0, 7, 6, 5, 1, 3, 5])
one_hot = np.zeros((10,10))
one_hot[np.arange(y.size), y] = 1
print(one_hot)

Prints

[[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]  # 5
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]  # 4
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]  # 3
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]  # 0
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]  # 7
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]  # 6
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]  # 5
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]  # 1
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]  # 3
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]] # 5

But in NumCpp every column index in y is applied to every row, instead of the i-th value being paired with the i-th row.

#include "NumCpp.hpp"
int main()
{
    nc::NdArray<int> y = {5, 4, 3, 0, 7, 6, 5, 1, 3, 5};
    auto one_hot = nc::zeros<int>(10, 10);
    one_hot.put(nc::arange(y.size()), y, 1); // sets every column listed in y, in every row
    one_hot.print();
}

Prints

[[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]]

brccabral, Sep 19 '24 03:09

Hmm, yeah, those are different behaviors. I'll have to add an additional put overload to support this.

dpilger26, Sep 19 '24 16:09

Added to Version 2.13 release.

dpilger26, Jan 03 '25 04:01