Implement a pandas.get_dummies equivalent for daru

Open willianveiga opened this issue 7 years ago • 3 comments

Please implement a method like pandas.get_dummies for daru.

Consider the following DataFrame:

color,dog
brown,1
black and white,0
brown,1
...

Our get_dummies implementation should output something like:

color_brown,color_black_and_white,dog
1,0,1
0,1,0
1,0,1
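
The transformation described above can be sketched in plain Ruby. This is only an illustration, not daru's API: `get_dummies` here is a hypothetical helper, the table is assumed to be a Hash of column name to Array of values, and spaces in values are replaced with underscores to match the column names shown above.

```ruby
# Hypothetical helper (not part of daru): one-hot encode the given columns
# of a table stored as a Hash of column name => Array of values.
def get_dummies(table, columns:)
  table.each_with_object({}) do |(name, values), out|
    # Columns that were not selected are copied through unchanged.
    unless columns.include?(name)
      out[name] = values
      next
    end

    # One indicator column per distinct value, in order of first appearance.
    values.uniq.each do |value|
      dummy_name = "#{name}_#{value}".tr(' ', '_')
      out[dummy_name] = values.map { |v| v == value ? 1 : 0 }
    end
  end
end

table = {
  'color' => ['brown', 'black and white', 'brown'],
  'dog'   => [1, 0, 1]
}

dummies = get_dummies(table, columns: ['color'])
# dummies =>
# { "color_brown" => [1, 0, 1],
#   "color_black_and_white" => [0, 1, 0],
#   "dog" => [1, 0, 1] }
```

A daru version would presumably do the same per-vector work and build a new DataFrame from the resulting columns.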

willianveiga avatar Dec 04 '18 14:12 willianveiga

I am new here. Can I give this a try?

PetalsOnWind avatar Dec 13 '18 17:12 PetalsOnWind

Sure. Let us know if you run into difficulties.

v0dro avatar Dec 14 '18 14:12 v0dro

Hey @PetalsOnWind, are you still working on this?

I used the rumale gem to do something like this; here is my code, in case it helps. It expects the input vector to contain only integer values, so non-numeric values need to be converted to integer codes before using it.

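    require 'daru'
    require 'rumale'

    # One-hot encodes an integer-valued vector and appends the encoded columns.
    def one_hot_encode_vector(data_frame, vector_name:, delete: false, name: nil)
      vector_name = vector_name.to_sym
      encoder = Rumale::Preprocessing::OneHotEncoder.new
      labels = Numo::Int32[data_frame[vector_name].to_a].flatten
      one_hot_vectors = encoder.fit_transform(labels)

      # Plain-Ruby check, so ActiveSupport's `present?` is not required.
      name = vector_name.to_s if name.nil? || name.to_s.empty?

      transposed_one_hot_vectors = one_hot_vectors.to_a.transpose

      # Use a separate variable for the new column names so that `vector_name`
      # still refers to the source vector when it is deleted below.
      data_frame[vector_name].sort.uniq.to_a.each_with_index do |value, i|
        encoded_name = "#{name}_encoded_#{value}".to_sym
        data_frame[encoded_name] = transposed_one_hot_vectors[i]
      end

      # Optionally drop the original (unencoded) vector.
      data_frame.delete_vector(vector_name) if delete

      data_frame
    end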
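
The conversion of non-numeric values to integers mentioned above could look roughly like the following. This is a minimal plain-Ruby sketch: `label_encode` is a hypothetical helper, not part of daru or rumale, and it assumes the values are mutually comparable so they can be sorted.

```ruby
# Hypothetical helper: map each distinct value to an integer code.
# Codes are assigned in sorted order of the distinct values.
def label_encode(values)
  codes = values.uniq.sort.each_with_index.to_h
  [values.map { |v| codes[v] }, codes]
end

encoded, codes = label_encode(['brown', 'black and white', 'brown'])
# encoded => [1, 0, 1]
# codes   => { "black and white" => 0, "brown" => 1 }
```

The encoded integers could then be stored as a vector and passed to `one_hot_encode_vector`. The returned `codes` mapping would be needed to translate the numeric suffixes of the generated column names back to the original values.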

janpeterka avatar Mar 02 '20 08:03 janpeterka