rfcs
rfcs copied to clipboard
Proposal: Add FromIterator impl for HashMap<K, Vec<V>>
Summary
I'm proposing a small quality of life improvement, that I was surprised to find wasn't already a part of the language.
I want to introduce a new FromIterator<(K, V)> implementation for HashMap<K, Vec<V>>.
I'm happy to create a PR for this if we want to move forward.
Note: I tried asking this on internals, without any responses, so am hoping to get more traction here.
Motivation
It's relatively common to want to group a list of values over some specific key. The current method of doing this either requires a third party crate or manually constructing the HashMap in a for loop, making use of the Entry API.
This is common enough that I was surprised not to see an ergonomic approach using collect. This RFC proposes adding an impl to make this possible.
Example
As an example, imagine you had a list of books, and wanted to group the books by the author that wrote them.
Current Implementation
let mut books_by_author: HashMap<&str, Vec<&Book>> = HashMap::new();
for book in &books {
books_by_author.entry(book.author).or_default().push(book);
}
With This Proposal
let books_by_author: HashMap<&str, Vec<&Book>> =
books.iter().map(|book| (book.author, book)).collect();
Future Considerations
- Instead of adding this implementation for just
Vec<V>, we could consider adding it for all collectionsC: Default + Extend<V>, extending this functionality to all collections, such as aHashSet.
Prior Art
imho adding this for HashMap and BTreeMap with the generic value type implementing Default and Extend is what we should have...
It would be important to consider any degredation in type inference as a result of this implementation.
Previously, let x: HashMap<_, _> = it.collect(); would be unambiguous, but with this implementation it would result in a compiler error.
This kind of breakage is allowed by the backwards compatibility rules in principle, but that is only if there's minimal real-world impact.
For BTreeMap, I am doubtful that BTreeMap<K, Vec<V>> is even the correct solution at all. Duplicate entries in a HashMap would suffer severely from hash collision, but duplicate entries in a BTreeMap are totally fine.
@SOF3 that doesn't make sense, both BTreeMap and HashMap require unique keys, it has nothing to do with hash collisions,
@Diggsey The ambiguity of HashMap<_, _> feels like a non-starter for this issue, as I'm guessing that will lead to significant breakage. Are there any workarounds?
You could create a new function instead of using collect. Or you could create a new MultiMap<K, V> type which is distinct from HashMap<_, Vec<_>>.
@Diggsey I'm referring to the data structure itself. BTreeMap<K, Vec<K>> is an inefficient representation because you could just place multiple values in separate nodes directly without additional allocation.