
Add a "shrink-if-excessive" API to `Vec`

Open scottmcm opened this issue 9 months ago • 5 comments

Proposal

Problem statement

Stealing from steffahn in https://internals.rust-lang.org/t/vec-is-asymmetric-with-memory-handling/22449/31?u=scottmcm:

I think Vec is at least missing some sort of generic shrink()/garbage_collect()/... function with good asymptotics, so that if you called it after every potentially-len-shrinking operation it wouldn’t make your program accidentally-quadratic the way that overuse of shrink_to_fit can.

If it isn't the default, it should at least be easy to manually use Vec as a dynamic array with (only) a constant factor memory overhead, if you want to.

Motivating examples or use cases

After a .collect() or .retain(…), where you know you're not going to be pushing for a while so you'd like to not be wasting too much capacity, but aren't worried about it being exactly anything (since reallocating to save one or two items is probably a waste of time, and doesn't actually reduce memory usage).

Having something to write after the warning in https://doc.rust-lang.org/std/vec/struct.Vec.html#impl-FromIterator%3CT%3E-for-Vec%3CT%3E, as something like "You can call the ______ method after doing the operation if you want to avoid excessive remaining capacity".

Solution sketch

// in alloc::vec

impl<T, A: Allocator> Vec<T, A> {
    /// If this vector's `.capacity()` is much larger than its `.len()`,
    /// then `.shrink_to_fit()` it, otherwise do nothing.
    /// 
    /// For example, you might want to call this after doing a `retain`
    /// on a particularly-large vector that has a chance of removing
    /// most of the elements, but where you wouldn't want to 
    /// `.shrink_to_fit()` unconditionally since that's a waste of
    /// CPU if only a few items were removed.
    /// 
    /// Note that "much larger" is intentionally unspecified here,
    /// just like how `reserve` does not specify its growth strategy.
    /// **Do not** use it in any situation where you need a particular
    /// capacity exactly.
    /// 
    /// Beware calling this in a loop, as if that loop sometimes adds
    /// things to the container, this can cause quadratic complexity
    /// for things that would otherwise have been linear.
    /// 
    /// For now, only a few basic things are guaranteed:
    /// - This is *idempotent*: calling it multiple times (without otherwise
    ///   changing the vector between the calls) does the same as
    ///   calling it just once would do.
    /// - If *all* you do on an empty, zero-capacity vector is a sequence
    ///   of `push`es, then there's no need to call this as it'd do nothing.
    /// - In a loop that *monotonically decreases* the length of a vector,
    ///   calling this every iteration will not itself cause quadratic behaviour.
    pub fn shrink(&mut self) {
        // `⋙` is a placeholder for the intentionally-unspecified
        // "much larger than" comparison described above.
        if self.capacity() ⋙ self.len() {
            self.shrink_to_fit();
        }
    }
}

Alternatives

  • shrink might be too short a name for something that can still be very costly if misused, so shrink_to_reasonable or shrink_if_profitable or something might be better. But at the same time since it's less risky than a full shrink_to_fit, giving it a more convenient name might be better.
  • We could give more or fewer guarantees.

Links and related work

https://github.com/rust-lang/rust/issues/120091

https://internals.rust-lang.org/t/vec-is-asymmetric-with-memory-handling/22449/31?u=scottmcm

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capacity becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.

scottmcm avatar Mar 03 '25 00:03 scottmcm

Just concerning implementation: Should it not instead shrink to the smallest possible power of two capacity? I think "shrink to len.next_power_of_two()" is a much easier and less arbitrary algorithm. It also makes sense as an inverse of the growth factor.

pitaj avatar Mar 03 '25 02:03 pitaj

Should it not instead shrink to the smallest possible power of two capacity?

I don't think so. If I happen to have a capacity-257 vector, I don't want it to shrink to 256. That's a waste of cycles.

That said, I'd certainly like to -- like in reserve -- leave things open enough that it can be delegated to the allocator to make smarter decisions about whether shrinking is worth it.

It also makes sense as an inverse of the growth factor.

The initial implementation I'd propose here is something along the lines of

if (v.cap() - v.len()) > cmp::max(v.len(), RawVec::<T>::MIN_NON_ZERO_CAP)

which, yes, is an inverse of the growth factor.

(The growth factor is doubling, not rounding up to a power of two.)

scottmcm avatar Mar 03 '25 02:03 scottmcm

I think users interested in using this function would want to control the strategy or load factor rather than trust the implementation choice.

Something like this with a float, integer ratio, or strategy enum argument type.

fn shrink_with_factor(&mut self, load_factor: f32) -> bool;

Users may want to know whether the resize operation was executed, so the function could return bool, or an Option<NonZeroUsize> with the number of elements (v_old.cap() - v_new.cap()) the vec was shrunk by.

okaneco avatar Mar 03 '25 03:03 okaneco

I agree with the problem statement and motivation. But it would be unfortunate to introduce a new easy way to be accidentally quadratic. The “beware of calling this in a loop that adds items” advice rules out some valid use cases, and is not essential for bounding excess capacity.

I think it’s desirable and easily possible to keep the same amortized time complexity for basically any usage pattern — even if you call shrink() after every other Vec operation it may be significantly slower but “only” by a reasonable constant factor. Even v.reserve_exact(2*v.capacity()); v.shrink(); can be “merely” a waste of Θ(cap) time, asymptotically the same as the cost of reserving the excess capacity in the first place.

Basically, just choose the threshold (function) for when to shrink the capacity so that there’s always Θ(cap) distance between the length right after “normal” (not requested via reserve) growth and the length where it’ll shrink again. For a fixed growth factor X, that means don’t shrink until length < epsilon * (1/X) * capacity for some positive constant epsilon < 1. (The initial implementation suggested by @scottmcm would be epsilon = 1, which has worse asymptotics.) This is the textbook solution, usually analyzed for push/pop operations only, but I think it works more generally.

hanna-kruppe avatar Mar 03 '25 12:03 hanna-kruppe

I think users interested in using this function would want to control the strategy or load factor rather than trust the implementation choice.

I don't think so, because if you want to control it exactly then you already have all the functionality you need, between reserve_exact and shrink_to.

To me, the point of this is that it's the obvious "just do this after a collect or retain that you're not going to push onto" thing, without needing to make choices for things like "well should I pass 1½ or φ or 2 or ...?"

scottmcm avatar Mar 04 '25 07:03 scottmcm

I think we have to be careful here and implement some hysteresis - making the shrinking a bit less aggressive than the growth logic - so that the capacity doesn't oscillate between two states around each break point. That kind of subtlety would also be a reason to put this in std so people don't roll their own.

TC noted in today's libs meeting that he'd call this after each work item to make the memory footprint more tightly follow what's needed, to avoid lingering bloat. So that would essentially be calling it in a loop, albeit an outer loop.

/// Beware calling this in a loop, as if that loop sometimes adds
/// things to the container, this can cause quadratic complexity
/// for things that would otherwise have been linear.

The wording doesn't seem quite right. If you only add items to a vec and don't remove anything then shrinking should be a noop and provide the same amortized O(1) behavior. So the effective behavior depends on the insert/remove/shrink patterns and I don't think any of those would be O(n²), just decaying from O(1) to O(n)?

the8472 avatar Apr 01 '25 18:04 the8472

Agreed, can someone produce an example case where this would become accidentally quadratic?

pitaj avatar Apr 01 '25 18:04 pitaj

By "quadratic" in the description I meant that the overall algorithm would be quadratic, not shrink itself, like how using reserve_exact(1) makes a push loop become quadratic even though each reserve_exact is still only linear.

So something like a loop that does push + pop + pop + shrink every iteration, maybe? Where that push forces a reallocation after the shrink just left no capacity -- which I think is the right thing for shrink to do, since in the "it's after a collect that might have reused an allocation" case I wouldn't expect it to leave extra capacity if it did choose to reallocate.

scottmcm avatar Apr 02 '25 02:04 scottmcm

But in that case, it's not quadratic overall - the shrinking would happen at most once; the rest of the time the "capacity is double the length" condition is not met, so no actual shrink occurs.

You would have to be pretty creative, maybe something like "pop length/2+1 times, shrink, push" repeatedly.

If we want to defend against something like that, we'd want to leave some extra space behind sometimes. One option is to shrink to the smallest possible power of two. In many cases the growth of capacity follows powers of two thanks to the 2x growth factor, so the shrinking would reflect the growth.

pitaj avatar Apr 02 '25 02:04 pitaj

what happens when you have v.len() == 16, v.capacity() == 32 and you do:

for _ in 0..1000 {
    v.shrink(); // now capacity == 16
    v.push(1); // has to realloc so capacity == 32
    v.pop(); // back where we started
}

I think it might be better to shrink only if the capacity is more than twice the length, and not shrink if it's less than or equal to twice the length; that way you avoid bad cases like the above.

programmerjake avatar Apr 02 '25 03:04 programmerjake

for comparison: num_bigint::BigUint shrinks when len() < capacity() / 4: https://docs.rs/num-bigint/0.4.6/src/num_bigint/biguint.rs.html#866-868

programmerjake avatar Apr 02 '25 03:04 programmerjake

I don't disagree. To be clear, I would propose something like this

if (cap - len) > cmp::max(len, MIN_NON_ZERO_CAP) {
    self.shrink_to(len.next_power_of_two());
}

pitaj avatar Apr 02 '25 03:04 pitaj

One option is to shrink to the smallest possible power of two.

I really don't see why a power of two is special here. And that proposed implementation has the problem that len=129, cap=259 shrinks to 256, which is a waste -- it's silly to reallocate just to drop the capacity by about 1%, especially in an API that exists to avoid such unhelpful reallocations.

We can add slack (to use the word finger trees use for why they have 1..=4 element digits instead of just 2..=3) without needing to privilege any particular thresholds. (Like the BigUint example that was given.)


I think it might be better to shrink only if the capacity is more than twice the length

Yes, that's why I used > and not >= in https://github.com/rust-lang/libs-team/issues/553#issuecomment-2693106592

scottmcm avatar Apr 06 '25 23:04 scottmcm

@scottmcm

So something like a loop that does push + pop + pop + shrink every iteration, maybe? Where that push forces a reallocation after the shink just left no capacity -- which I think is the right thing for shrink to do, since in the "it's after a collect that might have reused an allocation" case I wouldn't expect it to leave extra capacity if it did choose to reallocate.

In the “it’s a collect that might have reused an allocation” case, why not just use shrink_to_fit or shrink_to? Rust already has the tools to shrink capacity aggressively. What’s missing is a way that can’t be accidentally quadratic even in the face of growing and shrinking length. This requires coordinating with the (unspecified) default growth policy. If you only care about absolute or relative excess capacity in a collection you’re not going to grow any more, you can already implement whatever exact policy you want outside the standard library out of len(), capacity(), and shrink_to().

Edit: to be clear, the approach I’m emphasizing will also result in shrinking to 0 excess capacity whenever it does shrink. It just schedules the shrinks to avoid quadratic complexity in cases like the push; pop; pop; shrink loop.

hanna-kruppe avatar Apr 07 '25 06:04 hanna-kruppe

We discussed this during today's libs-API meeting. There was some skepticism about whether we can provide a method with the right heuristics to satisfy everyone's needs, since those needs may be workload-dependent. There was also the question of how common the problem is.

So we'd like to know how often this comes up, what it is used for and which heuristics were used when this was implemented manually.

the8472 avatar Apr 08 '25 19:04 the8472

So we'd like to know how often this comes up, what it is used for and which heuristics were used when this was implemented manually.

one use case that may eventually switch to using this method:

for comparison: num_bigint::BigUint shrinks when len() < capacity() / 4: https://docs.rs/num-bigint/0.4.6/src/num_bigint/biguint.rs.html#866-868

programmerjake avatar Apr 08 '25 20:04 programmerjake

After some more discussion, we'd like to know whether the goal of adding the shrink method is a) to have some "just do the right thing" sort of convenience, or b) to address the issue that user code shouldn't rely on (or try to match) an implementation detail - the growth strategy. For the latter, we could also expose something like Vec::calc_capacity_growth(&self, reserve: usize) -> usize, which user code could then use to calculate its own shrink_to goal.

the8472 avatar May 06 '25 14:05 the8472

It’s not clear to me that “limit excess capacity without becoming accidentally quadratic” can be implemented with just calc_capacity_growth and no further assumptions about the growth strategy. The relevant question for this kind of shrinking isn’t “how much further would the capacity grow if I reserved even more now” but rather how the vector might have grown to its current capacity and how it would grow again if it was shrunk now. If the growth strategy is more complex than “always multiply capacity by a constant factor” then you can’t conclude one from the other.

Also, keep in mind that some use cases want to do the “shrink if excessive” check very frequently, e.g., after every operation that may remove elements from the vector. A built-in implementation can make this quite efficient because it can combine knowledge of the growth strategy with the leeway it has in picking the exact threshold for shrinking. For example, if the growth strategy is “double capacity but grow by at least 4 elements” the shrink check could be something like (cap + 12) / 3 < len instead of a more complicated expression that mirrors the growth function exactly on the whole domain.

hanna-kruppe avatar May 06 '25 15:05 hanna-kruppe

We discussed this again in today's API meeting but only made a little progress. TC brought up adding additional (as in reserve(additional)) to shrink to take expected future elements into account and not shrink too tightly.

The relevant question for this kind of shrinking isn’t “how much further would the capacity grow if I reserved even more now” but rather how the vector might have grown to its current capacity and how it would grow again if it was shrunk now.

We discussed calc_growth and yeah, this was underspecified. I think the method would have to calculate the upper bound that the vec might have grown to if it arrived at the current length by repeated push or reserve(1) calls from any arbitrary prior capacity <= len.

The user then could add arbitrary margin or other heuristics on top of that upper bound to determine whether the current capacity is in excess.

Would that help?

the8472 avatar Aug 19 '25 19:08 the8472

Now on to my personal view. IMO, calling any kind of shrink in a loop is fundamentally different from the current growth strategy. With amortized growth it doesn't matter too much if a decision is suboptimal, because amortization means reallocation only happens log(n) times.

With growth + shrink in a loop there's an opportunity to get it wrong in every iteration, by large amounts of memory.

Then there's the issue that realloc might end up doing any of these things:

  • update allocator metadata because it's still within the current bucket's slack
  • memcpy the whole vec including the remaining spare capacity to a different bucket
  • mremap

So the costs of shrinking and growing will also fluctuate a lot.

Another concern is that making one-shot decisions might not be optimal. With variable load one might want to hold off a few ticks before releasing memory, or take the historical average into account, i.e. look at a longer time horizon. This isn't possible with a shrink() API, but could be done in user code if we provide the information about what the capacity ought to be; users can then check whether it has been in excess for several iterations.

Vaguely related example: our default implementation of Read::read_to_end has grown more complicated over time. In that case it's only partially about Vec growth; for the buffers we pass to the Readers, we keep track of stats over multiple iterations to make sizing decisions.

If we look at JVM heap shrink strategies: All the knobs one can think of exist.

So this looks like a wildly more complicated topic than just growing. And I'm skeptical that there's a one-size-fits-all solution.

the8472 avatar Aug 19 '25 19:08 the8472

Would that help?

I don't want to keep guessing what people in the meeting might have had in mind about how these kinds of apparently-general building blocks could be used to achieve the goal. I don't even know if we all have the same idea of what the goal is supposed to be. It would be much easier to say something meaningful about these counter-proposals if they came with a concrete usage example.

hanna-kruppe avatar Aug 19 '25 20:08 hanna-kruppe

We don't know people's individual workloads, so of course usage is going to look different; that's the point of having a flexible API.

Well, how about this: it's supposed to dynamically size a receive buffer for a stream of variable-length messages, not too aggressively in either direction.

const MIN_BUF: usize = 16*1024;
const MIN_MSG_SIZE: usize = 512;
let mut v: Vec<u8> = Vec::with_capacity(MIN_BUF);
let mut avg = MIN_MSG_SIZE;
loop {
    read_at_least_into_buffer(&sock, &mut v, MIN_MSG_SIZE);
    // pops off some messages from a vec, maybe not all, updates average message size
    process_messages(&mut v, &mut avg);


    let msg_sz = MIN_MSG_SIZE.max(avg);
    // calculates the upper bound of the capacity that the vec might naturally have grown to given current length + additional
    let cap_bound = v.growth_bound(msg_sz);
    // add custom factor to relax excess further, 1 should still be stable-ish because it includes reservation + upper bound
    // but why dance close to the edge of stability?
    if v.capacity() > cap_bound * 2 {
         // shrink, but not beyond our minimum size
         v.shrink_to(MIN_BUF.max(cap_bound));
    } else {
         v.reserve(msg_sz);
    }
}

Not based on any real project, just stitching together stuff I've done with various network protocols, file copiers etc.

the8472 avatar Aug 19 '25 21:08 the8472

I realized I never explicitly answered this question from before:

After some more discussion we'd like to know whether the goal of adding the shrink method is a) to have some "just do the right thing" sort of convenience or whether it's about b) the issue that user code shouldn't rely on/try to match implementation details - the growth strategy

I don't think option b) is one that needs addressing. The strategies for growing and shrinking need to be calibrated together to achieve good results for the expected usage pattern. Trying to do one without knowing anything about the other won't lead anywhere good (if you do get good results, it'll be because you've accidentally tailored the shrinking policy to the specifics of the current growth policy). Instead, everyone who has a specific strategy in mind can achieve it by wrapping Vec such that the default growth strategy never kicks in, and instead all capacity changes are done through reserve_exact / shrink_to. Or rather, there should be a shrink_to_exact method that gives as much control over the new capacity as reserve_exact does.

My motivation is a) but I wouldn't call it only a convenience. When I have a vector that only grows until it's dropped, std gives me amortized constant time (per element added) for push, extend, etc. as well as an asymptotic bound on space overhead vs. reserving exactly the right capacity. But if I have a long-lived vector that grows and shrinks drastically over time, I'd prefer to bound excess capacity at any point in time, without sacrificing time complexity of push and pop and other such methods. Currently, std only gives me three options:

  • Don't do anything and accept that capacity will be based on the maximum length the vector had at any point in time. This is generally what I do, but it's just refusing to solve the problem because there are no better options.
  • Give up on amortized constant time by shrinking the vector whenever I see fit, (necessarily) without regard to matching the growth policy. I almost never do this because the prospect of my program becoming an accidentally quadratic tumblr post after std tweaks its growth policy worries me even more than the previous point.
  • Completely eschew the default growth policy and re-implement both growth and shrinking myself (see first paragraph). This is a ton of work and pretty all-or-nothing: if I miss a single place in my code where the default growth policy can kick in, I'm not actually achieving what I wanted to.

All of these options suck. You're right that there are also myriad combinations of specific growth + shrink policies that someone could want. Vec already gives the low-level tools for carefully implementing those policies, except for the (easily rectified) lack of shrink_to_exact. So the goal here isn't a one-size-fits-all solution. It's to provide one solution that has the right asymptotic complexity (time & space), in the same way most people just use reserve and with_capacity, not reserve_exact, despite the possibility of exact control.

hanna-kruppe avatar Aug 19 '25 21:08 hanna-kruppe

Or rather, there should be a shrink_to_exact

shrink_to does not apply additional heuristics; it passes the request to the allocator. So it's more like with_capacity() than reserve(). I guess that's not explicitly guaranteed, though. There have been some piecemeal PRs to add some guarantees here and there.

the8472 avatar Aug 19 '25 21:08 the8472

[...] without sacrificing time complexity of push and pop and other such methods

It's to provide one solution that has the right asymptotic complexity (time & space)

Hrrm, would doing the obvious thing given the current implementation - grow at 2x, shrink if above 4x - preserve amortized O(1) without any pathological sequence of push, pop and shrink being possible? I.e. O(n²) would only occur if one added truncate or reserve(n > 1) to the mix?

the8472 avatar Aug 19 '25 21:08 the8472

Yes, it would (as already discussed previously). I don't think truncate is really problematic, either. It's too late here for me to attempt a proof, but intuitively the problematic cases are when you shrink capacity then quickly grow it again to ca. the original capacity without enough intervening work to amortize the grow/shrink overhead. But growing this much again requires O(cap before shrink) work pushing new elements after the truncate, so the O(cap after shrink) work of copying elements during truncate can be amortized over those pushes.

hanna-kruppe avatar Aug 19 '25 22:08 hanna-kruppe

Ok, I can see how that would be useful on its own.

Then I'm still wondering how we can do better. Changing it to shrink(additional) that TC brought up seems like a general improvement.

And giving the user the number and letting them decide whether to apply it would also provide more flexibility, e.g. to keep a minimum buffer size... 🤔 well additional would also fulfill that role

the8472 avatar Aug 19 '25 22:08 the8472

I don’t know of any use case for such generalizations. I suspect that if you’re in a situation where you’d pass something other than 1 as additional, you’re probably better off using shrink_to or something more bespoke. Accepting an arbitrary usize also means it’ll have to be more cautious around overflow than shrink_if_excessive, which may make it less appealing to call in a tight loop where it’ll often be a no-op.

hanna-kruppe avatar Aug 20 '25 08:08 hanna-kruppe