facets icon indicating copy to clipboard operation
facets copied to clipboard

Problems with Enumerable#map_by

Open tokland opened this issue 12 years ago • 2 comments

I use Enumerable#map_by a lot, it's a great abstraction because, more often than not, Enumerable#group_by falls too short. However, there are a couple of closely related problems with its implementation and specifications:

  1. Implementation: insufficient checks of the returned value of the block:
[[:a, true], [:b, false], [:c, nil]].map_by { |x, y| [x, y] }
#=> {:a=>[true], :b=>[[:b, false]], :c=>[[:c, nil]]}

Would'n you expect {:a=>[true], :b=>[false], :c=>[nil]}?

  1. Specifications: "if a second value is not returned, #map_by acts like #group_by.". The problem is that you cannot tell an array (which you may intent to use as a key) from a pair [key, value]. You may for example write:
[1, 2, 3].map_by { |x| [x, 2*x] }
#=> {1=>[2], 2=>[4], 3=>[6]}

But what you expected was {[1, 2]=>[1], [2, 4]=>[2], [3, 6=>[3]}. The old Unix adagio probably applies: "do one thing and do it well".


It looks to me that while (1) is easily solvable, (2) suggests that trying to act like group_by was a bad idea after all (if you wanted a group_by shouldn't you be using group_by anyway?). Granted, that change would break some old code (though I am not sure many people were actually using map_by that way...).

I can prepare a pull request with the conclusion of the conversation, I wanted to gather some opinions first.

tokland avatar May 31 '13 15:05 tokland

I agree that it does not need to act like group_by in any case.

ghost avatar May 31 '13 16:05 ghost

First, sorry about the delay on this. I've had to prioritize other projects this last year. But this year I plan to get a out a new release soon and give some extra time to the project.

You are spot on with the first issue. If shouldn't be if v. That is indeed insufficient. I would imagine it needs to be changed such that the result of the block is checked via if Array === r, or something like that.

On the second I am not sure b/c what do you do if a two-element array is not returned from the block? Maybe it should just raise an error? I try to avoid errors and make all inputs produce something useful whenever I can, but certainly sometimes that's not reasonable.

How would you re-implement the method?

trans avatar Jan 31 '14 18:01 trans

On the second I am not sure b/c what do you do if a two-element array is not returned from the block?

There are not many core methods that can be "wrongly" used. One that I know of is Hash[...]. You have to return pairs from the block, but if you don't, it will construct a hash anyway. I think that's ok with the Ruby philosophy, it's the caller's problem to do it right. This also happens with the closely related method Enumerable#mash in this library. So I would do no checks and simply write:

  def map_by
    res = {}
    each do |a|
      k, v = yield(*a)
      (res[k] ||= []) << v
    end
    res
  end

Well, I'd use each_with_object but I guess it's slower.

tokland avatar Jan 31 '14 18:01 tokland