rfcs icon indicating copy to clipboard operation
rfcs copied to clipboard

Proposal: Add FromIterator impl for HashMap<K, Vec<V>>

Open vultix opened this issue 3 years ago • 7 comments

Summary

I'm proposing a small quality of life improvement, that I was surprised to find wasn't already a part of the language. I want to introduce a new FromIterator<(K, V)> implementation for HashMap<K, Vec<V>>.

I'm happy to create a PR for this if we want to move forward.

Note: I tried asking this on internals, without any responses, so am hoping to get more traction here.

Motivation

It's relatively common to want to group a list of values over some specific key. The current method of doing this either requires a third party crate or manually constructing the HashMap in a for loop, making use of the Entry API.

This is common enough that I was surprised not to see an ergonomic approach using collect. This RFC proposes adding an impl to make this possible.

Example

As an example, imagine you had a list of books, and wanted to group the books by the author that wrote them.

Current Implementation

let mut books_by_author: HashMap<&str, Vec<&Book>> = HashMap::new();  
  
for book in &books {  
    books_by_author.entry(book.author).or_default().push(book);  
}

With This Proposal

let books_by_author: HashMap<&str, Vec<&Book>> =
    books.iter().map(|book| (book.author, book)).collect();

Future Considerations

  • Instead of adding this implementation for just Vec<V>, we could consider adding it for all collections C: Default + Extend<V>, extending this functionality to all collections, such as a HashSet.

Prior Art

vultix avatar Dec 02 '22 19:12 vultix

imho adding this for HashMap and BTreeMap with the generic value type implementing Default and Extend is what we should have...

programmerjake avatar Dec 02 '22 22:12 programmerjake

It would be important to consider any degredation in type inference as a result of this implementation.

Previously, let x: HashMap<_, _> = it.collect(); would be unambiguous, but with this implementation it would result in a compiler error.

This kind of breakage is allowed by the backwards compatibility rules in principle, but that is only if there's minimal real-world impact.

Diggsey avatar Dec 03 '22 02:12 Diggsey

For BTreeMap, I am doubtful that BTreeMap<K, Vec<V>> is even the correct solution at all. Duplicate entries in a HashMap would suffer severely from hash collision, but duplicate entries in a BTreeMap are totally fine.

SOF3 avatar Dec 04 '22 12:12 SOF3

@SOF3 that doesn't make sense, both BTreeMap and HashMap require unique keys, it has nothing to do with hash collisions,

Diggsey avatar Dec 04 '22 16:12 Diggsey

@Diggsey The ambiguity of HashMap<_, _> feels like a non-starter for this issue, as I'm guessing that will lead to significant breakage. Are there any workarounds?

vultix avatar Dec 04 '22 17:12 vultix

You could create a new function instead of using collect. Or you could create a new MultiMap<K, V> type which is distinct from HashMap<_, Vec<_>>.

Diggsey avatar Dec 04 '22 17:12 Diggsey

@Diggsey I'm referring to the data structure itself. BTreeMap<K, Vec<K>> is an inefficient representation because you could just place multiple values in separate nodes directly without additional allocation.

SOF3 avatar Dec 05 '22 04:12 SOF3