elixir icon indicating copy to clipboard operation
elixir copied to clipboard

Add Enum.sum/2

Open sabiwara opened this issue 2 years ago • 10 comments

Proposal draft to include Enum.sum/2.

If we agree on the signature, will send a similar PR for Enum.product/2.

Side note: ~might also bench specific implementations for lists to see if it's worth it.~ Edit: done, shows a 40% increase. Not bad, will update accordingly.

sabiwara avatar Aug 22 '23 14:08 sabiwara

My biggest concern with sum/2 and product/2 is figuring out the API. We have count/2, but it works as a filter (understandably). min and max are also aggregations but the function configures the empty value. sum_by is not a good fit either, as it should behave closer to frequencies_by (which by the way it is not called count_by). So if we add this, each aggregation/2 function behaves slightly different and I am not sure if it will be confusing.

josevalim avatar Aug 23 '23 08:08 josevalim

I understand this concern, my thinking was that filtering is still possible by returning the noop value 0 or 1. Maybe we can explicitly provide an example doing so in the docs ?

An explicit alternative could be map_sum/2 (cf map_join/2), which somehow feels less elegant though.

sabiwara avatar Aug 23 '23 09:08 sabiwara

sum_by is not a good fit either, as it should behave closer to frequencies_by

Assuming that frequencies_by would have value_fun just like group_by sum_by wouldn't be so out of place.

image

kaaboaye avatar Aug 28 '23 17:08 kaaboaye

3rd time the charm? #11381 and #11455

hauleth avatar Aug 31 '23 20:08 hauleth

I've been thinking about this one, and map_sum/2 would actually make a lot of sense if we think in terms of types.

It would be consistent with map_join/2, which has a almost the same signature and is the equivalent aggregation function for strings.

map_join([t], (t -> String.t())) :: String.t()
map_sum([t], (t -> number())) :: number()

Type-wise, _by would be inconsistent (as you mentioned @josevalim) because _by functions usually return the original type, not the "mapped" type (or as keys in a map):

max_by([t], (t -> u)) :: t
uniq_by([t], (t -> u)) :: [t]

Another benefit of map_sum/2 is that the name is pretty clear and leaves no doubt about what it's doing.

sabiwara avatar Sep 18 '23 08:09 sabiwara

I'm personally ok with map_sum/2 as the name, what you just wrote makes sense @sabiwara. I’m not convinced we need this in the first place, though. Also, we'd need map_product/2, just to point that out.

whatyouhide avatar Sep 18 '23 09:09 whatyouhide

I’m not convinced we need this in the first place, though.

I think it would be good from the perspective of encouraging users to pick efficiency without sacrificing readability. Most of the cases I've found myself needing sum/1 are like the one in the cheatsheet and require mapping. Right now one can be tempted to write map(f) |> sum() instead of a single reduce to make the intent clearer, yet this is not slightly but highly more wasteful.

sabiwara avatar Sep 18 '23 09:09 sabiwara

Closing for now since the appropriate solution still needs discussion.

sabiwara avatar Sep 18 '23 10:09 sabiwara

@sabiwara you said closing as it still needs discussion but immediately reopened - accidental double click? :sweat_smile:

PragTob avatar Dec 13 '23 13:12 PragTob

@sabiwara you said closing as it still needs discussion but immediately reopened - accidental double click? šŸ˜…

Nope, Jose asked me to keep it open šŸ˜‰

sabiwara avatar Dec 13 '23 13:12 sabiwara

Closing this, as we were unable to reach a conclusion.

josevalim avatar Mar 22 '24 18:03 josevalim

Learning Elixir right now. The absence of sum/2 is quite sad.

We are doing:

      item[:quantity_by_size]
      |> Enum.map(fn {_k, v} -> v end)
      |> Enum.sum

and the exercise says

Use the Enum.reduce function in the total_quantity function to practice. It's the best fitting Enum function for this task.

Here is what I did:

  defp sum_by(enumerable, fun) do
    enumerable |> Enum.reduce(0, fn element, acc -> fun.(element) + acc end)
  end


item[:quantity_by_size]
 |> sum_by(fn {_k, v} -> v end)

So are we doomed to rewrite a sum/2 with a low level tool like reduce each time? Cant we have something nicer and really fitting?

Please let us write

      item[:quantity_by_size]
      |> Enum.sum(fn {_k, v} -> v end)

FYI: Both C# has sum/2 and F# has sum/2 as sumBy

aloisdg avatar Jun 21 '24 11:06 aloisdg

I have revisited my previous argument. And I found a flaw in it:

sum_by is not a good fit either, as it should behave closer to frequencies_by (which by the way it is not called count_by).

We already have max_by, dedup_by, chunk_by and many others. And they all behave differently - and they should - because they are customizing different operations.

So if everyone agrees, I am fine with adding this as sum_by and product_by.

EDIT: I believe @sabiwara mentioned this earlier as well.

josevalim avatar Jun 21 '24 11:06 josevalim

@josevalim after thinking about this for a while, I was more leaning towards map_sum/2 due to the very close situation with map_join. It could be caught by a linter rule which would be symmetrical with this one, and type signatures are very similar too:

@spec map_join(Enumerable.t(x), x -> String.Chars.t()) :: String.t()
@spec map_sum(Enumerable.t(x), x -> number()) :: number()

OTOH, for a callback of type x -> y, max_by or dedup_by still return a x / list(x), y is used for comparisons.

I am fine either way, sum_by/2 works as well, just wanted to share the rationale above.

sabiwara avatar Jun 21 '24 14:06 sabiwara

  • Rebased
  • Renamed to sum_by for now
  • Added an entry to the cheatsheet and reworked the docs a bit
Screenshot 2024-06-21 at 23 52 03

Once this is merged, will send an equivalent for product/2

sabiwara avatar Jun 21 '24 14:06 sabiwara