add group_by_with_index and split_array

Open orangeSi opened this issue 3 years ago • 1 comment

If I add this to src/enumerable.cr:

  # Splits the array into *n* parts.
  #
  # ```
  # a = (0..10).to_a
  # a.split_array(100) # => [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]
  # a.split_array(3) # => [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9, 10]]
  # a.split_array(5) # => [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9, 10]]
  # ```
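  def split_array1(n : Int) : Array(Array(T))
    raise ArgumentError.new("Can't split array into a negative number of parts") if n < 0
    # Return early so that `size // n` below never divides by zero.
    return Array(Array(T)).new if n == 0

    block_size = size // n
    block_size = 1 if block_size == 0
    a = group_by_with_index { |e, i| i // block_size }.values # e.g. [[1, 2], [3, 4], [5]]
    # If there is exactly one extra (remainder) group, merge it into the last part.
    if a.size == n + 1
      a[..-3] << (a[-2] + a[-1])
    else
      a
    end
  end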

  def split_array2(n : Int) : Array(Array(T))
    raise ArgumentError.new("Can't split array into a negative number of parts") if n < 0

    ary = Array(Array(T)).new(n)
    # Return early so that `size // n` below never divides by zero.
    return ary if n == 0

    block_size = size // n
    block_size = 1 if block_size == 0

    each_slice(block_size) do |slice|
      ary << slice
    end
    # If there is exactly one extra (remainder) slice, merge it into the last part.
    if ary.size == n + 1
      ary[..-3] << (ary[-2] + ary[-1])
    else
      ary
    end
  end

then I run this benchmark for split_array1 (which uses group_by_with_index) and split_array2 (which uses each_slice):

require "benchmark"

a = (0..100000).to_a

Benchmark.ips do |x|

  x.report("split_array1:") do
    a.split_array1(3)
    a.split_array1(5)
  end

  x.report("split_array2:") do
    a.split_array2(3)
    a.split_array2(5)
  end

end

After building with crystal build --release, I got this:

split_array1: 184.69  (  5.41ms) (± 3.13%)  3.95MB/op   5.23× slower
split_array2: 965.49  (  1.04ms) (± 4.01%)  1.17MB/op        fastest

So I use each_slice instead of group_by_with_index for split_array!
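
One edge case worth checking in both versions above: the merge step only handles a single extra slice. With 11 elements and n = 4, block_size is 2, so six slices are produced and all six are returned rather than four. The following is a minimal sketch of one way to always get exactly min(n, size) parts, assuming a standalone helper; `balanced_split` is a hypothetical name, not part of the standard library or of this proposal, and it spreads the remainder over the leading parts instead of appending it to the last one.

```crystal
# Hypothetical helper (illustration only): splits *items* into exactly
# min(n, items.size) consecutive parts whose sizes differ by at most one.
def balanced_split(items : Array(T), n : Int32) : Array(Array(T)) forall T
  raise ArgumentError.new("n must not be negative") if n < 0
  parts = [] of Array(T)
  return parts if n == 0 || items.empty?

  count = Math.min(n, items.size)
  base = items.size // count  # minimum size of every part
  extra = items.size % count  # number of parts that get one more element
  start = 0
  count.times do |i|
    # The first `extra` parts each take one additional element.
    len = i < extra ? base + 1 : base
    parts << items[start, len]
    start += len
  end
  parts
end

a = (0..10).to_a
p balanced_split(a, 3)              # => [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10]]
p balanced_split(a, 4).map(&.size)  # => [3, 3, 3, 2]
```

Note that this distributes elements differently from the examples in the doc comment above, so it is an alternative design rather than a drop-in replacement.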

orangeSi, Sep 16 '22 09:09

Hi @orangeSi! I would suggest opening an issue first to discuss this addition.

My thoughts:

  • I believe split_array(n) is the same as in_groups_of(size // n) (more or less)
  • I'm not sure adding more _with_index methods is a good idea. The need for them is rare and, when it does come up, you can keep track of an index with a local variable.

asterite, Sep 16 '22 12:09
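
To make the two points in the reply above concrete, here is a small sketch using only existing Enumerable methods. The names `n`, `block_size` and `groups` are illustrative assumptions. It shows why the equivalence is only "more or less": `in_groups_of` and `each_slice` yield a fourth, shorter group for 11 elements rather than folding the remainder into the third.

```crystal
a = (0..10).to_a
n = 3
block_size = a.size // n # integer division: 11 // 3 == 3

# in_groups_of pads the final group with nil so that all groups have equal length.
p a.in_groups_of(block_size)
# => [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, nil]]

# each_slice leaves the final slice short instead of padding it.
p a.each_slice(block_size).to_a
# => [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

# Grouping by index with plain group_by and a local counter,
# without a dedicated group_by_with_index method.
i = 0
groups = a.group_by do |_e|
  key = i // block_size
  i += 1
  key
end
p groups.values
# => [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
```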