elixir
elixir copied to clipboard
Add Enum.sum/2
Proposal draft to include Enum.sum/2.
If we agree on the signature, will send a similar PR for Enum.product/2.
Side note: ~might also bench specific implementations for lists to see if it's worth it.~ Edit: done, shows a 40% increase. Not bad, will update accordingly.
My biggest concern with sum/2 and product/2 is figuring out the API. We have count/2, but it works as a filter (understandably). min and max are also aggregations but the function configures the empty value. sum_by is not a good fit either, as it should behave closer to frequencies_by (which by the way it is not called count_by). So if we add this, each aggregation/2 function behaves slightly different and I am not sure if it will be confusing.
I understand this concern, my thinking was that filtering is still possible by returning the noop value 0 or 1. Maybe we can explicitly provide an example doing so in the docs ?
An explicit alternative could be map_sum/2 (cf map_join/2), which somehow feels less elegant though.
sum_by is not a good fit either, as it should behave closer to frequencies_by
Assuming that frequencies_by would have value_fun just like group_by sum_by wouldn't be so out of place.
3rd time the charm? #11381 and #11455
I've been thinking about this one, and map_sum/2 would actually make a lot of sense if we think in terms of types.
It would be consistent with map_join/2, which has a almost the same signature and is the equivalent aggregation function for strings.
map_join([t], (t -> String.t())) :: String.t()
map_sum([t], (t -> number())) :: number()
Type-wise, _by would be inconsistent (as you mentioned @josevalim) because _by functions usually return the original type, not the "mapped" type (or as keys in a map):
max_by([t], (t -> u)) :: t
uniq_by([t], (t -> u)) :: [t]
Another benefit of map_sum/2 is that the name is pretty clear and leaves no doubt about what it's doing.
I'm personally ok with map_sum/2 as the name, what you just wrote makes sense @sabiwara. Iām not convinced we need this in the first place, though. Also, we'd need map_product/2, just to point that out.
Iām not convinced we need this in the first place, though.
I think it would be good from the perspective of encouraging users to pick efficiency without sacrificing readability.
Most of the cases I've found myself needing sum/1 are like the one in the cheatsheet and require mapping.
Right now one can be tempted to write map(f) |> sum() instead of a single reduce to make the intent clearer, yet this is not slightly but highly more wasteful.
Closing for now since the appropriate solution still needs discussion.
@sabiwara you said closing as it still needs discussion but immediately reopened - accidental double click? :sweat_smile:
@sabiwara you said closing as it still needs discussion but immediately reopened - accidental double click? š
Nope, Jose asked me to keep it open š
Closing this, as we were unable to reach a conclusion.
Learning Elixir right now. The absence of sum/2 is quite sad.
We are doing:
item[:quantity_by_size]
|> Enum.map(fn {_k, v} -> v end)
|> Enum.sum
and the exercise says
Use the Enum.reduce function in the total_quantity function to practice. It's the best fitting Enum function for this task.
Here is what I did:
defp sum_by(enumerable, fun) do
enumerable |> Enum.reduce(0, fn element, acc -> fun.(element) + acc end)
end
item[:quantity_by_size]
|> sum_by(fn {_k, v} -> v end)
So are we doomed to rewrite a sum/2 with a low level tool like reduce each time? Cant we have something nicer and really fitting?
Please let us write
item[:quantity_by_size]
|> Enum.sum(fn {_k, v} -> v end)
I have revisited my previous argument. And I found a flaw in it:
sum_by is not a good fit either, as it should behave closer to frequencies_by (which by the way it is not called count_by).
We already have max_by, dedup_by, chunk_by and many others. And they all behave differently - and they should - because they are customizing different operations.
So if everyone agrees, I am fine with adding this as sum_by and product_by.
EDIT: I believe @sabiwara mentioned this earlier as well.
@josevalim after thinking about this for a while, I was more leaning towards map_sum/2 due to the very close situation with map_join. It could be caught by a linter rule which would be symmetrical with this one, and type signatures are very similar too:
@spec map_join(Enumerable.t(x), x -> String.Chars.t()) :: String.t()
@spec map_sum(Enumerable.t(x), x -> number()) :: number()
OTOH, for a callback of type x -> y, max_by or dedup_by still return a x / list(x), y is used for comparisons.
I am fine either way, sum_by/2 works as well, just wanted to share the rationale above.
- Rebased
- Renamed to
sum_byfor now - Added an entry to the cheatsheet and reworked the docs a bit
Once this is merged, will send an equivalent for product/2