itertools icon indicating copy to clipboard operation
itertools copied to clipboard

question: how it will look to use iter tools to merge sum 2 HashMap<K: Hash + Eq, V : Add + Copy>?

Open dzmitry-lahoda opened this issue 9 months ago • 4 comments

What would be equivalent of this?

    fn merge_sum<K: Hash + Eq, V : Add + Copy>(
        map1: HashMap<K, V>,
        map2: HashMap<K, V>,
    ) -> HashMap<K, V> {
        let mut merged = map1;
        for (key, value) in map2 {
            *merged.entry(key).or_insert(0) += value;
        }
        merged
    }

Like sum for arrays, but with map specifics.

dzmitry-lahoda avatar Feb 28 '25 20:02 dzmitry-lahoda

I'm not sure there's an iterator based solution that'll be better than what you've written for merging two maps (some reasons below) but here's a functional equivalent:

fn merge_sum<K: Hash + Eq, V: Add<Output = V>>(
    map1: HashMap<K, V>,
    map2: HashMap<K, V>,
) -> HashMap<K, V> {
    [map1, map2]
        .into_iter()
        .flatten()
        .into_group_map()
        .into_iter()
        .map(|(k, vs)| {
            let mut vs = vs.into_iter();
            let mut v = vs.next().expect("non-empty since yielded by into_group_map");
            if let Some(v2) = vs.next() {
                v = v + v2;
            }
            // vs now empty
            (k, v)
        })
        .collect()
}

Even ignoring that HashMaps aren't the best collections to iterate over, once you use map.into_iter(), you lose the invariant that each k can only occur once in the tuples (k, v). Further, the same keys may not be iterated over in the same order by the two maps, so you have to use something like Itertools::into_group_map, which creates a bunch of vectors. We know these will be non-empty because of how into_grouping_map works (and length at most 2 because we're chaining together two maps) but you can't statically guarantee this, hence the need for something like expect.

On the other hand this would generalize reasonably well if you start with an iterator of HashMaps:

fn merge_sum_many<K: Hash + Eq, V: Sum>(
    iter: impl IntoIterator<Item = HashMap<K, V>>
) -> HashMap<K, V> {
    iter
        .into_iter()
        .flatten()
        .into_group_map()
        .into_iter()
        .map(|(k, vs)| {
            let v = vs.into_iter().sum();
            (k, v)
        })
        .collect()
}

ronnodas avatar Mar 09 '25 13:03 ronnodas

For iterators whose elements are sorted (such as the ones from a BTreeMap, merge_join_by may be handy.

phimuemue avatar Mar 09 '25 13:03 phimuemue

Here is what @phimuemue suggests:

fn merge_sum_ordered<K: Ord, V: Add<Output = V>>(
    map1: BTreeMap<K, V>,
    map2: BTreeMap<K, V>,
) -> BTreeMap<K, V> {
    use itertools::EitherOrBoth::{Both, Left, Right};

    map1.into_iter()
        .merge_join_by(map2, |(k1, _), (k2, _)| k1.cmp(k2))
        .map(|ts| match ts {
            Left((k, v)) | Right((k, v)) => (k, v),
            Both((k, v1), (_, v2)) => (k, v1 + v2),
        })
        .collect()
}

ronnodas avatar Mar 09 '25 13:03 ronnodas

I think the core answer here is that there's nothing hashmap-like as part of the iterator interface.

So it's entirely normal that there's no iterator helpers that really matter for that code pattern.

(Compare BTreeMap, where the sorted iterator you get from it is what makes all the difference in an itertools helper being applicable.)

scottmcm avatar Mar 09 '25 23:03 scottmcm