multimap
multimap copied to clipboard
Use `SmallVec` instead of `Vec`
If we use SmallVec::<[T;1]>
instead of Vec
for each value list, then it will not allocate for case of single value. Thus it may optimize for most common case
I like it!
There are a couple of things which makes this not that straight forward:
- it adds a dependency (and that's sort of a slippery slope)
- it will be sort of a breaking change:
assert_eq!(map.get_vec("key1"), Some(&vec![42, 1337]));
Would have to be implemented something like:
pub fn get_vec<Q: ?Sized>(&self, k: &Q) -> Option<&Vec<V>>
where
K: Borrow<Q>,
Q: Eq + Hash,
{
// this will potentially allocate
self.inner.get(k).map(to_vec)
}
- For dependency,
smallvec
is maintained by servo team, and is reliable - That's an issue. If breakable, can be returned as
&[V]
though.
This actually does sound really cool. I'm just evaluating multimap
for the first time. For my initial use case, most keys will have a single value, and I worry about the overhead and performance of using a Vec with dynamic allocations for every item. This sounds like a great solution. I wasn't even aware of SmallVec
before now.
I also think that returning a slice instead of a Vec is a better API as it abstracts away the underlying implementation detalis.
I definitely think this is worth a breaking change.
I just experimented with this and it looks pretty good, but there are at least one thing I don't like about returning a slice and not a "real" Vec<_>
.
It's not possible to do something like:
let mut m: MultiMap<usize, usize> = MultiMap::new();
m.insert(1, 42);
m.insert(1, 43);
if let Some(v) = m.get_vec_mut() {
v.push(44);
}
https://github.com/havarnov/multimap/tree/experimental/smallvec
That's a little unfortunate, but I do think it's the result of leaking the internal implementation details in the public API.
I wonder if the existing API can be mostly retained by abstracting the vector type with a type
definition in the multimap crate, and returning that instead of a slice - at least for the _mut
variations.
Or, maybe wrap the implementation vector type in something that acts like a common subset of vectors (push. pop, len, etc)?
I tried playing with it here: https://github.com/fpagliughi/multimap/tree/experimental/smallvec-typed
It creates the trivial abstractions:
/// The internal vector type for each item.
///
/// This can be any type that resembles a standard Vec.
pub type MapVec<V> = SmallVec<[V; 1]>;
/// A macro that can create and populate a new MapVec collection.
/// It works like the vec![] macro.
pub use smallvec::smallvec as mapvec;
#[derive(Clone)]
pub struct MultiMap<K, V, S = RandomState> {
inner: HashMap<K, MapVec<V>, S>,
}
Then I returned the abstracted type, like:
pub fn get_vec_mut<Q: ?Sized>(&mut self, k: &Q) -> Option<&mut MapVec<V>>
where
K: Borrow<Q>,
Q: Eq + Hash,
{
self.inner.get_mut(k)
}
I'm not super happy with this implementation, though. It still exposes the internal type through a thin renaming, and I'm not sure how to make an easy comparison to a slice this way:
assert_eq!(Some(&[2, 3, 4][..]), m.get_vec(&1).as_deref()); // Doesn't work
I suppose this could all be fixed with a bunch of work to fully wrap the internal vector type. But I still think that exposing the internal implementation of how multiple values are held for each key is not necessarily appropriate for the public API. I don't think it's necessary to require being able to add and remove items from a key through a reference to an internal type in the collection.
But it does mean that the API would need to be expanded a little. For example, I don't see a way to remove a single item from a key:
let mut map = MultiMap::new();
map.insert(1, 42);
map.insert(1, 1337);
map.remove_item(1, 1337);
assert_eq!(&[42], map.get_vec(&1));
how about just exposing a trait, something like:
trait MultiMapValue {
type Item;
fn as_slice(&mut self) -> &mut [Self::Item];
fn insert(&mut self, value: Self::Item);
}
impl<V> MultiMapValue for &mut smallvec::SmallVec<[V; 8]> {
type Item = V;
fn as_slice(&mut self) -> &mut [Self::Item] {
self.as_mut_slice()
}
fn insert(&mut self, value: Self::Item) {
self.push(value)
}
}
and then have methods like this:
pub fn get_all_mut<Q: ?Sized>(&mut self, k: &Q) -> Option<impl MultiMapValue<Item = V> + '_>
where
K: Borrow<Q>,
Q: Eq + Hash,
{
self.inner.get_mut(k)
}
which in turn can be used like this:
#[test]
fn get_all_mut() {
let mut m: MultiMap<usize, usize> = MultiMap::new();
m.insert(1, 42);
m.insert(1, 1337);
if let Some(mut v) = m.get_all_mut(&1) {
v.insert(555);
(*v.as_slice())[0] = 5;
(*v.as_slice())[1] = 10;
(*v.as_slice())[2] = 55;
}
assert_eq!(
Some(&vec![5, 10, 55][..]),
m.get_vec(&1));
}
@havarnov , That seems great idea.
Or we can better implement a new_type that wraps a smallvec/vec/..., and exposes standard std::iter::Extend
trait for allowing mutation, and AsRef<[T]>
for accessing as slice, etc. This will give us freedom to implement what ever trait we want on new type.
I needed this functionality for a small project I'm working on. I took the experimental/smallvec branch and updated it to make all tests pass. PR available here: #41 Note that this does break the existing API.
Instead of SmellVec<[T; 1]>
, an enum with variants One(T)
and Many(Vec<T>)
can be used:
That avoids the dependency, allows keeping get_vec_mut()
(but not get_vec()
or iter_all()
), and is smaller for some types:
T |
size of SmallVec<[T; 1]> |
size of OneOrMany<T> |
---|---|---|
usize |
32 | 24 |
&[u8] |
32 | 32 |
Vec<u8> |
40 | 32 |
VecDeque<u8> |
48 | 40 |
[&u8; 4] |
48 | 32 |
(I had hoped it would be smaller for slice and VecDeque
)
The downside is that it cannot be used for N > 1 or parameterized like #41 does.
I've implemented this in #43.