itertools
itertools copied to clipboard
More generic maps (in `GroupingMap`)
At the moment, this is only a proof of concept, but it would close #588 (and eventually solve some issues, see https://github.com/rust-itertools/itertools/labels/generic-container).
- I wrote a straightforward private trait generalizing both
BTreeMapandHashMap(with any hasher). Maybe too much methods, or not enough. I will adjust it. I don't think we should generalize to more than those two maps but I guess it's possible. - Changed
use_stdtouse_allocwhere possible (becauseBTreeMaponly needsuse_allocwhileHashMapneedsuse_std) and relaxed someEq + Hashbounds. - I added a alternative
aggregate_intoaggregatewhich takes one more argument: any empty map (or one we would.clear()).
The last point is where I struggled the most but that's pretty much the only real possibility because there is so many possibilities of initializations: BTreeMap::new(), HashMap::new(), HashMap::default() for some hashers, and with_capacity shenanigans. And I think it's the most general solution (I can even imagine one single HashMap being used multiple times to avoid reallocations).
This could extend to other areas than just GroupingMap, and a Set trait could be wrote to generalize BTreeSet and HashSet but it's for another PR.
For now, I would like some feedback.
CI currently fails because of temporary unused imports.
Currently, we can do it.into_grouping_map().product(); but we can't have a generic map.
let my_map = HashMap::new();
let my_map = HashMap::with_capacity(100);
let my_map = FxHashMap::default(); // rustc_hash
let mut my_map = FxHashMap::default(); my_map.reserve(100);
let my_map = BTreeMap::new();
But where to give the map? (without breaking change)
1. Initial idea: suffix all GroupingMap methods with _in. But it results in many weird names like minmax_by_key_in.
it.into_grouping_map().product_in(my_map);
// --- ------
For each GroupingMap method like product, old it.into_grouping_map().product(); would delegate to new it.into_grouping_map().product_in(HashMap::new());.
2. Suffix Itertools methods. The trait would be a bit more crowded. But only two redirections.
it.into_grouping_map_in(my_map).product();
// --- ------
Old it.into_grouping_map(); would delegate to new it.into_grouping_map_in(HashMap::new());.
Similar for into_grouping_map_by -> into_grouping_map_by_in but that's it.
That would be similar to other possible extensions like Itertools::counts_in and Itertools::all_unique_in.
3. A bit nested. But no _in suffix.
it.into_grouping_map().with_map(my_map).product();
// -----------------
For each GroupingMap method like product, old it.into_grouping_map().product(); would delegate to new it.into_grouping_map().with_map(HashMap::new()).product();.
Hi @Philippe-Cholet, two thoughts:
- Does
into_grouping_mapplay nicely with type inference? (Currently,GroupingMapreturns maps with different value types, e.g.minmaxreturnsMinMaxResult<V>s, butmaxreturnsVs.) If so, I would go with it (offers a central place to supply a map and generalizes well tocountset al). - I know that for many day-to-day cases, a plain old (possibly sorted)
Vec<(K, V)>can be more efficient than{BTree|Hash}Map. Thus, I think we should ensure thattrait Mapcould support this type.
@phimuemue
I see your performance point about Vec<(K, V)>. It can always be added later (in the next PR, or any release) as we would not break anything by implementing Map for a new type.
Well, silly me...!! I don't know how it could work in a central place (cases 2 and 3 in my last comment).
We would create a new struct GroupingMapIn that GroupingMap would alias, right?
pub type GroupingMap<I> = GroupingMapIn<I, HashMap<_, _>>;... Simply blocked! The first one would be <I as Iterator>::Item.0 (which I can't write) and the second is still unknown since it's the method using it that will determine it...
Then only remain my initial idea (case 1) method_name_in<..., M: Map<Key = ..., Value = ...>>(..., map: M) -> M.
Codecov Report
Attention: Patch coverage is 95.34368% with 21 lines in your changes are missing coverage. Please review.
Project coverage is 94.43%. Comparing base (
6814180) to head (5a71e63). Report is 40 commits behind head on master.
| Files | Patch % | Lines |
|---|---|---|
| src/generic_containers.rs | 54.54% | 20 Missing :warning: |
| src/grouping_map.rs | 99.75% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #901 +/- ##
==========================================
+ Coverage 94.38% 94.43% +0.04%
==========================================
Files 48 49 +1
Lines 6665 7187 +522
==========================================
+ Hits 6291 6787 +496
- Misses 374 400 +26
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@SkiFire13 I see you are the author of GroupingMap. What do you think of such extension?
@SkiFire13
The duplication of the methods is a bit unfortunate though.
I totally agree, I'm gonna investigate your "add two generics" idea.
Could this be extended to
into_group_map[_by]too?
Yes, in the next PR. I intend to extend all HashMap usage in the crate (and HashSet later), most https://github.com/rust-itertools/itertools/labels/generic-container issues should be solved with this Map.
Because I want to make all this work first, I'm gonna delay two of your ideas:
- Optimize
remove+insertinGroupingMap::aggregate_in. impl<M: Map> Map for &mut M.
Because I want to make all this work first, I'm gonna delay two of your ideas:
* Optimize `remove+insert` in `GroupingMap::aggregate_in`. * `impl<M: Map> Map for &mut M`.
Understandable. As long as Map is private this can be changed later, so I'm fine with it.