daru
Implement a pandas.get_dummies equivalent for daru
Please implement a method like pandas.get_dummies for daru.
Considering the following DataFrame:
color,dog
brown,1
black and white,0
brown,1
...
Our get_dummies implementation should output something like:
color_brown,color_black_and_white,dog
1,0,1
0,1,0
1,0,1
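To make the requested behaviour concrete, here is a minimal plain-Ruby sketch. It is not daru's API: the name `get_dummies_sketch` is hypothetical, and a Hash of column name to Array stands in for a Daru::DataFrame. It assumes that numeric columns pass through unchanged and that spaces in category names become underscores, as in the expected output above.

```ruby
# Hypothetical helper, not part of daru. `columns` is a Hash of
# column name => Array of values, standing in for a Daru::DataFrame.
def get_dummies_sketch(columns)
  columns.each_with_object({}) do |(name, values), out|
    if values.all? { |v| v.is_a?(Numeric) }
      # Assumption: numeric columns are copied through unchanged.
      out[name] = values
    else
      # One indicator column per distinct value, in order of first appearance.
      values.uniq.each do |category|
        key = "#{name}_#{category.to_s.gsub(/\s+/, '_')}"
        out[key] = values.map { |v| v == category ? 1 : 0 }
      end
    end
  end
end

frame = { 'color' => ['brown', 'black and white', 'brown'], 'dog' => [1, 0, 1] }
dummies = get_dummies_sketch(frame)
```

A real implementation would build new vectors on the DataFrame instead of returning a Hash, but the per-category comparison would be the same.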
I am new here. Can I give this a try?
Sure. Let us know if you run into difficulties.
Hey @PetalsOnWind, are you still working on this?
I used the rumale gem to do something like this; here is my code, in case it helps.
It expects the input vector to contain only integer values, so to use it on non-numeric data you first need to add a converter that maps each unique non-numeric value to an integer.
require 'daru'
require 'rumale'

# One-hot encodes an integer-valued vector of a Daru::DataFrame in place.
def one_hot_encode_vector(data_frame, vector_name:, delete: false, name: nil)
  vector_name = vector_name.to_sym
  # Plain-Ruby check instead of ActiveSupport's `present?`.
  name = vector_name.to_s if name.nil? || name.to_s.empty?
  encoder = Rumale::Preprocessing::OneHotEncoder.new
  labels = Numo::Int32[data_frame[vector_name].to_a].flatten
  one_hot_vectors = encoder.fit_transform(labels)
  # Transpose so that each row holds one encoded column.
  transposed_one_hot_vectors = one_hot_vectors.to_a.transpose
  data_frame[vector_name].sort.uniq.to_a.each_with_index do |value, i|
    # Use a separate variable so vector_name still refers to the source column.
    encoded_name = "#{name}_encoded_#{value}".to_sym
    data_frame[encoded_name] = transposed_one_hot_vectors[i]
  end
  data_frame.delete_vector(vector_name) if delete
  data_frame
end
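The converter mentioned above (unique non-numeric values to integers) could look roughly like the following. This is only a sketch in plain Ruby: `encode_labels` is a hypothetical name, and the choice to number categories in sorted order is an assumption, made so that the integer order matches the alphabetical order of the categories.

```ruby
# Hypothetical helper: maps each distinct value to an integer label 0..k-1.
# Returns the encoded values together with the mapping that was used.
def encode_labels(values)
  # Assumption: labels are assigned in sorted order of the string form.
  mapping = values.uniq.sort_by(&:to_s).each_with_index.to_h
  [values.map { |v| mapping[v] }, mapping]
end

labels, mapping = encode_labels(['brown', 'black and white', 'brown'])
# labels  => [1, 0, 1]
# mapping => {"black and white"=>0, "brown"=>1}
```

The integer labels could then replace the original column before calling one_hot_encode_vector, and `mapping.invert` gives the category name for each integer if the encoded columns should be renamed afterwards.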