itertools
Kotlin-like lazy Grouping API
I'm opening this to request/discuss a possible Grouping API like Kotlin's `Grouping`/`groupingBy` API. See Kotlin's Grouping reference and its source code.
It would provide an easy way to perform efficient group-and-fold operations without allocating additional `Vec`s (unlike `into_group_map`), while also maintaining unique keys (unlike `group_by`). The disadvantage is that it can only support `fold`-like operations.
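For context, this is roughly what a single-pass group-and-fold looks like when written by hand with only the standard library. The function name `sum_by_parity` and the choice of key are made up for illustration; they are not part of itertools or of the proposal.

```rust
use std::collections::HashMap;

/// Groups numbers by parity and sums each group in a single pass.
/// Illustrative only: the function name and the grouping key are made up.
fn sum_by_parity(numbers: &[u32]) -> HashMap<bool, u32> {
    let mut sums: HashMap<bool, u32> = HashMap::new();
    for &n in numbers {
        // One accumulator per key; no intermediate Vec is allocated.
        *sums.entry(n % 2 == 0).or_insert(0) += n;
    }
    sums
}
```

Each key appears exactly once in the result, and the only allocation is the map itself.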
The implementation should be pretty simple. I've already written a minimal working sample that includes:
- a `fold` function, like a normal fold except that it associates each key with the result of folding the elements that map to that key;
- a `fold_first` function, like `fold` but without an initial value; instead the first element acts as the initial value;
- a `count` function that associates each key with the number of elements that map to that key;
- an `aggregate` function that provides the basic functionality we can use to write all the other functions.
https://gist.github.com/SkiFire13/a0010ff658d905dcbc1e1f5f6ae910e0
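The four functions listed above could fit together roughly as follows. This is a sketch written for illustration, not the code from the gist: the `Grouping` struct layout, the free function `grouping_by`, and the exact closure signatures are all assumptions.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical lazy grouping: holds the source iterator and the key function;
/// nothing is consumed until one of the fold-like methods is called.
struct Grouping<I, F> {
    iter: I,
    key_of: F,
}

/// Hypothetical entry point, named after Kotlin's `groupingBy`.
fn grouping_by<I, K, F>(iter: I, key_of: F) -> Grouping<I, F>
where
    I: Iterator,
    K: Hash + Eq,
    F: FnMut(&I::Item) -> K,
{
    Grouping { iter, key_of }
}

impl<I, K, F> Grouping<I, F>
where
    I: Iterator,
    K: Hash + Eq,
    F: FnMut(&I::Item) -> K,
{
    /// Building block: `op` receives the current accumulator for the key (if any)
    /// and returns the new one; returning `None` leaves the key out of the map.
    fn aggregate<R, Op>(self, mut op: Op) -> HashMap<K, R>
    where
        Op: FnMut(Option<R>, &K, I::Item) -> Option<R>,
    {
        let Grouping { iter, mut key_of } = self;
        let mut map: HashMap<K, R> = HashMap::new();
        for item in iter {
            let key = key_of(&item);
            // Take the accumulator out so `op` can own it, then put the result back.
            let acc = map.remove(&key);
            if let Some(next) = op(acc, &key, item) {
                map.insert(key, next);
            }
        }
        map
    }

    /// Folds each group starting from a clone of `init`.
    fn fold<R, Op>(self, init: R, mut op: Op) -> HashMap<K, R>
    where
        R: Clone,
        Op: FnMut(R, &K, I::Item) -> R,
    {
        self.aggregate(|acc, key, item| {
            let acc = acc.unwrap_or_else(|| init.clone());
            Some(op(acc, key, item))
        })
    }

    /// Like `fold`, but the first element of each group is the initial value.
    fn fold_first<Op>(self, mut op: Op) -> HashMap<K, I::Item>
    where
        Op: FnMut(I::Item, &K, I::Item) -> I::Item,
    {
        self.aggregate(|acc, key, item| match acc {
            Some(acc) => Some(op(acc, key, item)),
            None => Some(item),
        })
    }

    /// Counts the elements that map to each key.
    fn count(self) -> HashMap<K, usize> {
        self.fold(0usize, |n, _key, _item| n + 1)
    }
}
```

With this sketch, `grouping_by(words.into_iter(), |w| w.len()).count()` would map each word length to the number of words of that length, and `fold`, `fold_first` and `count` are all thin wrappers over `aggregate`.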
The downsides of my implementation are:
- It can only produce a `HashMap`. This could be solved with a `Map` trait, but we're still waiting for GATs for that. The trait is required because I don't only `extend` the map, but also `remove` elements from it;
- It doesn't provide a way to specify the hasher (it could, but if we later switch to a `Map` trait it would break more things);
- `fold`'s initial value must be `Clone`, but I can't see any way around that;
- It could be confused with `group_by`, although Kotlin has both `groupBy` and `groupingBy` and I haven't seen anyone complaining;
- It pretty much reuses Kotlin's docs and tests, but I haven't checked which license the Kotlin stdlib uses;
- The docs are missing code examples;
- `fold_first` is consistent with its stdlib counterpart but inconsistent with Itertools' `fold1`.
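On the hasher point, a minimal sketch of what threading a hasher parameter through could look like, using only the standard library. The function `count_by_with_hasher` is hypothetical and not part of the sample implementation; it only shows that the extra type parameter `S` ends up in the public signature, which is the part that a later switch to a `Map` trait could break.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash};

/// Hypothetical helper: counts items per key into a map with a caller-chosen hasher `S`.
fn count_by_with_hasher<I, K, F, S>(iter: I, mut key_of: F) -> HashMap<K, usize, S>
where
    I: Iterator,
    K: Hash + Eq,
    F: FnMut(&I::Item) -> K,
    S: BuildHasher + Default,
{
    let mut counts: HashMap<K, usize, S> = HashMap::with_hasher(S::default());
    for item in iter {
        *counts.entry(key_of(&item)).or_insert(0) += 1;
    }
    counts
}
```

The caller picks the hasher through the return type, for example `HashMap<bool, usize, RandomState>`.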
I'm also thinking of other possible extensions with functions like:
- other flavors of `fold`;
- `max`/`min`/`maxmin` and friends;
- `sum`, but I found it would be pretty hard to use the `Sum` trait since it requires a whole iterator and `Grouping` can only fold.
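To illustrate the `sum` difficulty: `Sum::sum` consumes an entire iterator, whereas a grouping only ever sees one element at a time. One possible way around it, shown here purely as an assumption, is to bound on `Add` instead and seed each group with its first value. The helper `sum_by_key` is hypothetical and works on `(key, value)` pairs to keep the example short.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::ops::Add;

/// Hypothetical helper: sums values per key using `Add`, seeding each group with
/// its first value so that neither a `Sum` impl nor a zero value is needed.
fn sum_by_key<I, K, V>(pairs: I) -> HashMap<K, V>
where
    I: Iterator<Item = (K, V)>,
    K: Hash + Eq,
    V: Add<Output = V>,
{
    let mut sums: HashMap<K, V> = HashMap::new();
    for (key, value) in pairs {
        // Fold one element at a time: take the running total out and add to it.
        let next = match sums.remove(&key) {
            Some(acc) => acc + value,
            None => value,
        };
        sums.insert(key, next);
    }
    sums
}
```

This is effectively `fold_first` specialised to addition, which is why it fits the fold-only model.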
Hi there! How easy would it be to express these operations in terms of simpler iterator operations?
From what I've tried, there may be some value in these functions, but wouldn't it be easier (and more in line with existing methods) to generalize `into_group_map` to `into_group_map_by` (or similar) so that the caller is not forced to go with the `Vec`s?
I would imagine something where the caller could customize `key`, `Vec::new`, `push` in the following: https://github.com/rust-itertools/itertools/blob/710d9f248b50a70adbfbff5824b5710d6d315d7a/src/group_map.rs#L18
Of course, the new method would not require `Item=(K, V)` since it could compute the key (and, thus, the key type) itself.
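A rough sketch of what such a generalization might look like, written as a free function with only the standard library. The name `into_group_map_by` comes from the suggestion above; the parameter names and the exact signature are assumptions, not an existing itertools API.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical generalization of `into_group_map`: the caller supplies the key
/// function, the empty-group constructor (in place of `Vec::new`) and the step
/// that adds an item to a group (in place of `push`).
fn into_group_map_by<I, K, G, KeyFn, NewFn, AddFn>(
    iter: I,
    mut key_of: KeyFn,
    mut new_group: NewFn,
    mut add: AddFn,
) -> HashMap<K, G>
where
    I: Iterator,
    K: Hash + Eq,
    KeyFn: FnMut(&I::Item) -> K,
    NewFn: FnMut() -> G,
    AddFn: FnMut(&mut G, I::Item),
{
    let mut groups: HashMap<K, G> = HashMap::new();
    for item in iter {
        // Create the group on first sight of the key, then add the item to it.
        let group = groups.entry(key_of(&item)).or_insert_with(&mut new_group);
        add(group, item);
    }
    groups
}
```

Passing `Vec::new` and `|g: &mut Vec<i32>, n| g.push(n)` gives the `Vec`-collecting behaviour, while passing `|| 0i32` and `|acc: &mut i32, n| *acc += n` sums each group without allocating any `Vec`.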
Yes, it would be possible. What you're proposing is essentially merging the `GroupingByExt::grouping_by` and `Grouping::aggregate`/`fold`/`fold_first` methods of my example, which is 100% possible from a technical point of view. This would also save a method call, but the total number of arguments would stay the same, which could become confusing with 3 arguments on a single call, 2 of which are closures.
The biggest problem of your approach IMO is that we would have to choose between implementing it like `fold` or `fold_first`/`fold1`. And what about other useful methods like `count`, `max`, etc.? I don't think we can find suitable names while keeping them in line with the existing methods. And even if we found them, we would pollute the `Itertools` trait with too many methods. We could require end users to write them themselves when needed, which I find suboptimal and also against the crate's idea of providing convenience methods.
However, I agree we could find a name more in line with the existing ones. `into_group_map_by` could be an option, but if we go for the separate-methods way then it would conflict with the expectation that it directly produces a `HashMap`. I like the `grouping_by` name because it conveys that the second operation is done while grouping, and that it doesn't group right away as one would expect from a function that starts with `into_group_`.
Edit: formatting
@SkiFire13 I see this issue and I'm wondering if there is anything left to do after #465 or if I can close this.
I think the APIs introduced in #465 should be enough, if someone needs more they can always open a new issue. Sorry for forgetting to close the issue with that PR.
By the way, I wonder if #309 (the other issue I linked in that PR) is also solved.