multimap icon indicating copy to clipboard operation
multimap copied to clipboard

Use `SmallVec` instead of `Vec`

Open damooo opened this issue 2 years ago • 8 comments

If we use SmallVec::<[T;1]> instead of Vec for each value list, then it will not allocate for case of single value. Thus it may optimize for most common case

damooo avatar Apr 10 '22 01:04 damooo

I like it!

There are a couple of things which makes this not that straight forward:

  • it adds a dependency (and that's sort of a slippery slope)
  • it will be sort of a breaking change:
    assert_eq!(map.get_vec("key1"), Some(&vec![42, 1337]));

Would have to be implemented something like:

    pub fn get_vec<Q: ?Sized>(&self, k: &Q) -> Option<&Vec<V>>
    where
        K: Borrow<Q>,
        Q: Eq + Hash,
    {
        // this will potentially allocate
        self.inner.get(k).map(to_vec)
    }

havarnov avatar Apr 11 '22 06:04 havarnov

  • For dependency, smallvec is maintained by servo team, and is reliable
  • That's an issue. If breakable, can be returned as &[V] though.

damooo avatar Apr 11 '22 17:04 damooo

This actually does sound really cool. I'm just evaluating multimap for the first time. For my initial use case, most keys will have a single value, and I worry about the overhead and performance of using a Vec with dynamic allocations for every item. This sounds like a great solution. I wasn't even aware of SmallVec before now.

I also think that returning a slice instead of a Vec is a better API as it abstracts away the underlying implementation detalis.

I definitely think this is worth a breaking change.

fpagliughi avatar Apr 17 '22 23:04 fpagliughi

I just experimented with this and it looks pretty good, but there are at least one thing I don't like about returning a slice and not a "real" Vec<_>.

It's not possible to do something like:

let mut m: MultiMap<usize, usize> = MultiMap::new();
m.insert(1, 42);
m.insert(1, 43);
if let Some(v) = m.get_vec_mut() {
    v.push(44);
}

https://github.com/havarnov/multimap/tree/experimental/smallvec

havarnov avatar Apr 19 '22 21:04 havarnov

That's a little unfortunate, but I do think it's the result of leaking the internal implementation details in the public API.

I wonder if the existing API can be mostly retained by abstracting the vector type with a type definition in the multimap crate, and returning that instead of a slice - at least for the _mut variations.

Or, maybe wrap the implementation vector type in something that acts like a common subset of vectors (push. pop, len, etc)?

fpagliughi avatar Apr 19 '22 23:04 fpagliughi

I tried playing with it here: https://github.com/fpagliughi/multimap/tree/experimental/smallvec-typed

It creates the trivial abstractions:

/// The internal vector type for each item.
///
/// This can be any type that resembles a standard Vec.
pub type MapVec<V> = SmallVec<[V; 1]>;

/// A macro that can create and populate a new MapVec collection.
/// It works like the vec![] macro.
pub use smallvec::smallvec as mapvec;

#[derive(Clone)]
pub struct MultiMap<K, V, S = RandomState> {
    inner: HashMap<K, MapVec<V>, S>,
}

Then I returned the abstracted type, like:

    pub fn get_vec_mut<Q: ?Sized>(&mut self, k: &Q) -> Option<&mut MapVec<V>>
    where
        K: Borrow<Q>,
        Q: Eq + Hash,
    {
        self.inner.get_mut(k)
    }

I'm not super happy with this implementation, though. It still exposes the internal type through a thin renaming, and I'm not sure how to make an easy comparison to a slice this way:

assert_eq!(Some(&[2, 3, 4][..]), m.get_vec(&1).as_deref());   // Doesn't work

I suppose this could all be fixed with a bunch of work to fully wrap the internal vector type. But I still think that exposing the internal implementation of how multiple values are held for each key is not necessarily appropriate for the public API. I don't think it's necessary to require being able to add and remove items from a key through a reference to an internal type in the collection.

But it does mean that the API would need to be expanded a little. For example, I don't see a way to remove a single item from a key:

let mut map = MultiMap::new();
map.insert(1, 42);
map.insert(1, 1337);
map.remove_item(1, 1337);
assert_eq!(&[42], map.get_vec(&1));

fpagliughi avatar Apr 20 '22 14:04 fpagliughi

how about just exposing a trait, something like:

trait MultiMapValue {
    type Item;
    fn as_slice(&mut self) -> &mut [Self::Item];
    fn insert(&mut self, value: Self::Item);
}

impl<V> MultiMapValue for &mut smallvec::SmallVec<[V; 8]> {

    type Item = V;

    fn as_slice(&mut self) -> &mut [Self::Item] {
        self.as_mut_slice()
    }

    fn insert(&mut self, value: Self::Item) {
        self.push(value)
    }
}

and then have methods like this:

    pub fn get_all_mut<Q: ?Sized>(&mut self, k: &Q) -> Option<impl MultiMapValue<Item = V> + '_>
        where
            K: Borrow<Q>,
            Q: Eq + Hash,
    {
        self.inner.get_mut(k)
    }

which in turn can be used like this:

    #[test]
    fn get_all_mut() {
        let mut m: MultiMap<usize, usize> = MultiMap::new();
        m.insert(1, 42);
        m.insert(1, 1337);
        if let Some(mut v) = m.get_all_mut(&1) {
            v.insert(555);
            (*v.as_slice())[0] = 5;
            (*v.as_slice())[1] = 10;
            (*v.as_slice())[2] = 55;
        }
        assert_eq!(
            Some(&vec![5, 10, 55][..]),
            m.get_vec(&1));
    }

havarnov avatar Apr 20 '22 19:04 havarnov

@havarnov , That seems great idea.

Or we can better implement a new_type that wraps a smallvec/vec/..., and exposes standard std::iter::Extend trait for allowing mutation, and AsRef<[T]> for accessing as slice, etc. This will give us freedom to implement what ever trait we want on new type.

damooo avatar Apr 22 '22 14:04 damooo

I needed this functionality for a small project I'm working on. I took the experimental/smallvec branch and updated it to make all tests pass. PR available here: #41 Note that this does break the existing API.

abspoel avatar Dec 02 '22 14:12 abspoel

Instead of SmellVec<[T; 1]>, an enum with variants One(T) and Many(Vec<T>) can be used:

That avoids the dependency, allows keeping get_vec_mut() (but not get_vec() or iter_all()), and is smaller for some types:

T size of SmallVec<[T; 1]> size of OneOrMany<T>
usize 32 24
&[u8] 32 32
Vec<u8> 40 32
VecDeque<u8> 48 40
[&u8; 4] 48 32

(I had hoped it would be smaller for slice and VecDeque)

The downside is that it cannot be used for N > 1 or parameterized like #41 does.

I've implemented this in #43.

tormol avatar Feb 07 '23 22:02 tormol