
Performance Improvements

Open mccolljr opened this issue 4 years ago • 9 comments

Right now, the implementation is significantly slower than Vec for most things. This can be fixed.

TODO:

  • [x] insert, remove, and drain can use bulk memory copies to avoid some iterative element shifting.
  • [ ] the sort implementation can be optimized, and should probably be more than a naive quicksort
  • [x] Use SegVec::with_capacity in the FromIterator implementation (sketched below)
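
A minimal sketch of what the with_capacity change in FromIterator amounts to (the helper name collect_segvec is made up; SegVec::with_capacity and push are the crate's existing API, but the body here is only illustrative): pre-size from the iterator's size_hint so segments are allocated up front instead of grown one push at a time.

fn collect_segvec<T, I: IntoIterator<Item = T>>(iter: I) -> segvec::SegVec<T> {
    let iter = iter.into_iter();
    // Reserve at least the iterator's lower bound before pushing.
    let (lower, _) = iter.size_hint();
    let mut out = segvec::SegVec::with_capacity(lower);
    for item in iter {
        out.push(item);
    }
    out
}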

mccolljr avatar Jun 30 '21 04:06 mccolljr

@mccolljr I wonder if it's possible to store Vec<[Option<T>; 4]> rather than Vec<Vec<T>>. This should avoid the double indirection at the potential expense of memory, though I think it might not be that much worse in terms of memory usage than storing vectors, since small vectors go from a capacity of 0 straight to 4.

It also might make sense to write a wrapper struct that enforces the invariant of contiguous elements (i.e. never having an array like [Some(..), None, Some(..)]):

struct Segment<T>([Option<T>; 4]);

struct SegVec<T> {
    cap: usize,
    size: usize,
    segments: Vec<Segment<T>>,
}
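
Just to make the invariant concrete, here's a rough sketch of how the wrapper could enforce it (purely illustrative; the len/push API and the Result-based "segment full" signal are made up): elements only ever go into the first empty slot, so a Some can never follow a None.

struct Segment<T>([Option<T>; 4]);

impl<T> Segment<T> {
    fn len(&self) -> usize {
        // Under the invariant, the first None marks the end of the elements.
        self.0.iter().position(Option::is_none).unwrap_or(4)
    }

    fn push(&mut self, value: T) -> Result<(), T> {
        match self.0.iter_mut().find(|slot| slot.is_none()) {
            Some(slot) => {
                *slot = Some(value);
                Ok(())
            }
            // Segment is full; hand the value back so the caller can start a
            // new segment.
            None => Err(value),
        }
    }
}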

connorskees avatar Jun 30 '21 17:06 connorskees

@connorskees That's a good idea to reduce the overhead of each segment, but I don't think it would work here, since one of the properties I want to maintain is that each doubling of the vector's size requires only a single allocation, which means each new segment must be larger than the last. However, I've been playing with this over the last day, and I've realized that:

  1. it is possible to let the user specify the base segment size (which can reduce storage overhead by requiring fewer segments),
  2. the len and capacity of each segment are deterministic for any fixed number of elements and a fixed base segment size, so I can get away with storing just the head pointer for each segment, which shrinks each segment reference from 3 * size_of::<usize>() to just size_of::<usize>(), and
  3. I can use the smallvec optimization instead of a plain Vec<Segment<T>> to shrink the representation of the segment list.

Doing this would allow an arbitrarily large number of items to be stored in the SegVec with only one level of indirection for data access, provided a sufficient N (the number of segment headers kept on the stack) and base segment size are chosen.
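
To make point 2 concrete, a sketch of the arithmetic (function names and the exact doubling formula are illustrative, not the crate's code): with a base segment size and capacity that doubles with each new segment, a segment's capacity, start offset, and occupied length all follow from the total length, so nothing beyond a head pointer needs to be stored per segment.

fn segment_capacity(base: usize, i: usize) -> usize {
    // Segment 0 holds `base` elements; each later segment doubles the total.
    if i == 0 { base } else { base << (i - 1) }
}

fn segment_start(base: usize, i: usize) -> usize {
    // Sum of the capacities of all earlier segments.
    if i == 0 { 0 } else { base << (i - 1) }
}

fn segment_len(base: usize, i: usize, total_len: usize) -> usize {
    // How many of the first `total_len` elements land in segment `i`.
    total_len
        .saturating_sub(segment_start(base, i))
        .min(segment_capacity(base, i))
}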

mccolljr avatar Jul 03 '21 00:07 mccolljr

Status on this:

I modified the remove and insert implementations, and parts of the drain implementation, to use std::ptr::copy where applicable, which should offer some improvement to the timings there. I need to revisit that code and do a cleanup pass.
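
For reference, the general shape of that technique on a single Vec-backed segment (a generic sketch of the approach, not the actual segvec code): a removal becomes one ptr::read plus one bulk ptr::copy instead of a loop of per-element moves.

fn remove_at<T>(v: &mut Vec<T>, index: usize) -> T {
    assert!(index < v.len());
    unsafe {
        let ptr = v.as_mut_ptr().add(index);
        // Take the element out before its slot is overwritten.
        let removed = std::ptr::read(ptr);
        // Shift everything after `index` down by one slot in a single memmove.
        std::ptr::copy(ptr.add(1), ptr, v.len() - index - 1);
        v.set_len(v.len() - 1);
        removed
    }
}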

mccolljr avatar Jul 11 '21 16:07 mccolljr

Notice from the performance department:

I said before that I am not very fond of using thin-vec. Now, while playing around with the benchmark and different features (trying to see whether cacheline alignment would speed things up), I noticed that thin-vec makes things much slower (a 30-60% decrease in performance). I'd expected it to be at least as fast as plain Vec. :/

cehteh avatar Jun 12 '23 19:06 cehteh

This is good to know. Thin-vec was an early attempt to keep the segment header as small as possible, but the extra indirection needed to access segment length and capacity probably causes cache misses, for a very marginal saving in segment header size.

When I have some time this week I intend to abstract the storage into a trait so that consumers of the data structure can choose their own storage implementation. I will provide an implementation for Vec, which will be the default. This should let me eliminate both cargo features; anyone who still wants ThinVec support can then implement the trait for a wrapper type.
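
Something along these lines, perhaps (the trait name and method set here are guesses, not a committed design):

pub trait SegmentStorage<T> {
    fn with_capacity(cap: usize) -> Self;
    fn len(&self) -> usize;
    fn push(&mut self, value: T);
    fn as_slice(&self) -> &[T];
}

// Default backend: a plain Vec per segment.
impl<T> SegmentStorage<T> for Vec<T> {
    fn with_capacity(cap: usize) -> Self {
        Vec::with_capacity(cap)
    }
    fn len(&self) -> usize {
        Vec::len(self)
    }
    fn push(&mut self, value: T) {
        Vec::push(self, value)
    }
    fn as_slice(&self) -> &[T] {
        Vec::as_slice(self)
    }
}

A ThinVec-backed (or raw-pointer) segment would then just be another implementor, and could live outside the crate entirely.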

mccolljr avatar Jun 12 '23 19:06 mccolljr

I am playing with what I noted before:

pub mod detail {
    use super::MemConfig;
    use std::marker::PhantomData;

    // Exactly one Segment definition is compiled in, depending on the chosen
    // features; PhantomData carries the otherwise-unused MemConfig parameter.
    #[cfg(feature = "raw-alloc")]
    pub struct Segment<T, C: MemConfig>(*mut T, PhantomData<C>);
    #[cfg(all(feature = "thin-segments", not(feature = "raw-alloc")))]
    pub struct Segment<T, C: MemConfig>(thin_vec::ThinVec<T>, PhantomData<C>);
    #[cfg(not(any(feature = "thin-segments", feature = "raw-alloc")))]
    pub struct Segment<T, C: MemConfig>(Vec<T>, PhantomData<C>);

    // ....
}

This way the segment list becomes a list of thin pointers (later ideally kept in a SmallVec), and each Segment is a raw, unsized array with no header/metadata of its own; its size comes from MemConfig.
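
For context, roughly the shape a MemConfig would have to take for this to work (the method names and the Linear example are assumptions for illustration; the crate's actual trait may differ):

pub trait MemConfig {
    /// Capacity of the segment at `index`.
    fn segment_size(index: usize) -> usize;
    /// Map a flat element index to (segment index, offset within that segment).
    fn locate(index: usize) -> (usize, usize);
}

/// Simplest possible configuration: every segment holds exactly N elements.
pub struct Linear<const N: usize>;

impl<const N: usize> MemConfig for Linear<N> {
    fn segment_size(_index: usize) -> usize {
        N
    }
    fn locate(index: usize) -> (usize, usize) {
        (index / N, index % N)
    }
}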

cehteh avatar Jun 12 '23 19:06 cehteh

When you make this a trait, please pass C: MemConfig as a parameter to the trait! I think I'll wait for you to do that and then continue on the raw-alloc side.

cehteh avatar Jun 12 '23 19:06 cehteh

'Segments<T, C: MemConfig>' should become a newtype with a trait too.

Rationale: MemConfig determines segment_size() arithmetically, but Vec/slice-based storage backends can query that without any calculation; only my planned thin '*mut T' backend would need the arithmetic here.
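
A tiny illustration of that distinction (all names invented): a Vec-backed segment answers from its own header, while the thin raw-pointer backend would have to use whatever length the MemConfig arithmetic produced.

trait SegmentMeta<T> {
    /// `config_len` is what the MemConfig arithmetic computed for this segment.
    fn segment_len(&self, config_len: usize) -> usize;
}

impl<T> SegmentMeta<T> for Vec<T> {
    fn segment_len(&self, _config_len: usize) -> usize {
        self.len() // queried directly, no calculation
    }
}

struct RawSegment<T>(*mut T);

impl<T> SegmentMeta<T> for RawSegment<T> {
    fn segment_len(&self, config_len: usize) -> usize {
        config_len // only the thin `*mut T` backend needs the arithmetic
    }
}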

cehteh avatar Jun 13 '23 11:06 cehteh

Another note, mostly for myself, but opinions are welcome:

While the 'SegmentCache' attempt was working, it turned out to be way too intrusive to be useful, and 'Linear' outperformed it in many cases.

I am thinking about reviving the caching idea only for Slices:

trait SegmentCache {
   //...
}

struct NoCache; // ZST, the default does nothing

struct Cache {
    segment_start: usize,
    segment_end: usize,
    segment: usize,
}

impl SegmentCache for NoCache { /*NOP proxies*/}
impl SegmentCache for Cache { /*.. do the caching here..*/}

pub struct Slice<'a, T: 'a, C: SegmentCache = NoCache> {
    inner: &'a (dyn SegmentIndex<T>),
    start: usize,
    len: usize,
    cache: C,
}

So far that's only at the brainfart stage.

So caching will be opt-in, with some overhead (3 usizes). The default will stay as it is.

It would be implemented only for Slice, not SliceMut, since caching only works well for accesses within the same segment. One should clone Slices when working on different parts of a Slice; SliceMut can't be cloned (eventually some SliceMut::split_at_mut() may become handy, and then caching may be OK there as well).

I have no immediate need for this, I'm just writing it down here; it would be low-hanging fruit to implement later, and it would improve indexing operations for Slices into Proportional- and Exponential-configured SegVecs.
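
For the record, the hit/miss path would be roughly this (a sketch only; the fields mirror the Cache struct above, and slow_path stands in for the normal segment lookup):

struct Cache {
    // Same fields as sketched above.
    segment_start: usize,
    segment_end: usize,
    segment: usize,
}

impl Cache {
    fn locate(
        &mut self,
        index: usize,
        // Returns (segment, segment_start, segment_end) via the uncached lookup.
        slow_path: impl Fn(usize) -> (usize, usize, usize),
    ) -> (usize, usize) {
        if (self.segment_start..self.segment_end).contains(&index) {
            // Cache hit: same segment as the previous access, no arithmetic.
            (self.segment, index - self.segment_start)
        } else {
            // Cache miss: do the full lookup, then remember the segment bounds.
            let (segment, start, end) = slow_path(index);
            self.segment = segment;
            self.segment_start = start;
            self.segment_end = end;
            (segment, index - start)
        }
    }
}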

cehteh avatar Jun 20 '23 10:06 cehteh