multimap Use `SmallVec` instead of `Vec`

If we use SmallVec::<[T;1]> instead of Vec for each value list, then it will not allocate for case of single value. Thus it may optimize for most common case

Apr 10 '22 01:04 damooo

I like it!

There are a couple of things which makes this not that straight forward:

it adds a dependency (and that's sort of a slippery slope)
it will be sort of a breaking change:

    assert_eq!(map.get_vec("key1"), Some(&vec![42, 1337]));

Would have to be implemented something like:

    pub fn get_vec<Q: ?Sized>(&self, k: &Q) -> Option<&Vec<V>>
    where
        K: Borrow<Q>,
        Q: Eq + Hash,
    {
        // this will potentially allocate
        self.inner.get(k).map(to_vec)
    }

Apr 11 '22 06:04 havarnov

For dependency, smallvec is maintained by servo team, and is reliable
That's an issue. If breakable, can be returned as &[V] though.

Apr 11 '22 17:04 damooo

This actually does sound really cool. I'm just evaluating multimap for the first time. For my initial use case, most keys will have a single value, and I worry about the overhead and performance of using a Vec with dynamic allocations for every item. This sounds like a great solution. I wasn't even aware of SmallVec before now.

I also think that returning a slice instead of a Vec is a better API as it abstracts away the underlying implementation detalis.

I definitely think this is worth a breaking change.

Apr 17 '22 23:04 fpagliughi

I just experimented with this and it looks pretty good, but there are at least one thing I don't like about returning a slice and not a "real" Vec<_>.

It's not possible to do something like:

let mut m: MultiMap<usize, usize> = MultiMap::new();
m.insert(1, 42);
m.insert(1, 43);
if let Some(v) = m.get_vec_mut() {
    v.push(44);
}

https://github.com/havarnov/multimap/tree/experimental/smallvec

Apr 19 '22 21:04 havarnov

That's a little unfortunate, but I do think it's the result of leaking the internal implementation details in the public API.

I wonder if the existing API can be mostly retained by abstracting the vector type with a type definition in the multimap crate, and returning that instead of a slice - at least for the _mut variations.

Or, maybe wrap the implementation vector type in something that acts like a common subset of vectors (push. pop, len, etc)?

Apr 19 '22 23:04 fpagliughi

I tried playing with it here: https://github.com/fpagliughi/multimap/tree/experimental/smallvec-typed

It creates the trivial abstractions:

/// The internal vector type for each item.
///
/// This can be any type that resembles a standard Vec.
pub type MapVec<V> = SmallVec<[V; 1]>;

/// A macro that can create and populate a new MapVec collection.
/// It works like the vec![] macro.
pub use smallvec::smallvec as mapvec;

#[derive(Clone)]
pub struct MultiMap<K, V, S = RandomState> {
    inner: HashMap<K, MapVec<V>, S>,
}

Then I returned the abstracted type, like:

    pub fn get_vec_mut<Q: ?Sized>(&mut self, k: &Q) -> Option<&mut MapVec<V>>
    where
        K: Borrow<Q>,
        Q: Eq + Hash,
    {
        self.inner.get_mut(k)
    }

I'm not super happy with this implementation, though. It still exposes the internal type through a thin renaming, and I'm not sure how to make an easy comparison to a slice this way:

assert_eq!(Some(&[2, 3, 4][..]), m.get_vec(&1).as_deref());   // Doesn't work

I suppose this could all be fixed with a bunch of work to fully wrap the internal vector type. But I still think that exposing the internal implementation of how multiple values are held for each key is not necessarily appropriate for the public API. I don't think it's necessary to require being able to add and remove items from a key through a reference to an internal type in the collection.

But it does mean that the API would need to be expanded a little. For example, I don't see a way to remove a single item from a key:

let mut map = MultiMap::new();
map.insert(1, 42);
map.insert(1, 1337);
map.remove_item(1, 1337);
assert_eq!(&[42], map.get_vec(&1));

Apr 20 '22 14:04 fpagliughi

how about just exposing a trait, something like:

trait MultiMapValue {
    type Item;
    fn as_slice(&mut self) -> &mut [Self::Item];
    fn insert(&mut self, value: Self::Item);
}

impl<V> MultiMapValue for &mut smallvec::SmallVec<[V; 8]> {

    type Item = V;

    fn as_slice(&mut self) -> &mut [Self::Item] {
        self.as_mut_slice()
    }

    fn insert(&mut self, value: Self::Item) {
        self.push(value)
    }
}

and then have methods like this:

    pub fn get_all_mut<Q: ?Sized>(&mut self, k: &Q) -> Option<impl MultiMapValue<Item = V> + '_>
        where
            K: Borrow<Q>,
            Q: Eq + Hash,
    {
        self.inner.get_mut(k)
    }

which in turn can be used like this:

    #[test]
    fn get_all_mut() {
        let mut m: MultiMap<usize, usize> = MultiMap::new();
        m.insert(1, 42);
        m.insert(1, 1337);
        if let Some(mut v) = m.get_all_mut(&1) {
            v.insert(555);
            (*v.as_slice())[0] = 5;
            (*v.as_slice())[1] = 10;
            (*v.as_slice())[2] = 55;
        }
        assert_eq!(
            Some(&vec![5, 10, 55][..]),
            m.get_vec(&1));
    }

Apr 20 '22 19:04 havarnov

@havarnov , That seems great idea.

Or we can better implement a new_type that wraps a smallvec/vec/..., and exposes standard std::iter::Extend trait for allowing mutation, and AsRef<[T]> for accessing as slice, etc. This will give us freedom to implement what ever trait we want on new type.

Apr 22 '22 14:04 damooo

I needed this functionality for a small project I'm working on. I took the experimental/smallvec branch and updated it to make all tests pass. PR available here: #41 Note that this does break the existing API.

Dec 02 '22 14:12 abspoel

Instead of SmellVec<[T; 1]>, an enum with variants One(T) and Many(Vec<T>) can be used:

That avoids the dependency, allows keeping get_vec_mut() (but not get_vec() or iter_all()), and is smaller for some types:

`T`	size of `SmallVec<[T; 1]>`	size of `OneOrMany<T>`
`usize`	32	24
`&[u8]`	32	32
`Vec<u8>`	40	32
`VecDeque<u8>`	48	40
`[&u8; 4]`	48	32

(I had hoped it would be smaller for slice and VecDeque)

The downside is that it cannot be used for N > 1 or parameterized like #41 does.

I've implemented this in #43.

Feb 07 '23 22:02 tormol

multimap multimap copied to clipboard

Use `SmallVec` instead of `Vec`

multimap
multimap copied to clipboard