rust Update VecDeque implementation to use head+len instead of head+tail

(See #99805)

This changes alloc::collections::VecDeque's internal representation from using head and tail indices to using a head index and a length field. It has a few advantages over the current design:

It allows the buffer to be of length 0, which means the VecDeque::new new longer has to allocate and could be changed to a const fn
It allows the VecDeque to fill the buffer completely, unlike the old implementation, which always had to leave a free space
It removes the restriction for the size to be a power of two, allowing it to properly shrink_to_fit, unlike the old VecDeque
The above points also combine to allow the Vec<T> -> VecDeque<T> conversion to be very cheap and guaranteed O(1). I mention this in the From<Vec<T>> impl, but it's not a strong guarantee just yet, as that would likely need some form of API change proposal.

All the tests seem to pass for the new VecDeque, with some slight adjustments.

r? @scottmcm

Oct 12 '22 23:10 Sp00ph

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @scottmcm (or someone else) soon.

Please see the contribution instructions for more information.

Oct 12 '22 23:10 rust-highfive

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

Stabilizing library features
Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
Changing public documentation in ways that create new stability guarantees
Changing observable runtime behavior of library APIs

Oct 12 '22 23:10 rustbot

I'm trying to run the x.py test tidy --bless command but it's taking forever

Oct 12 '22 23:10 Sp00ph

Is there any better way to run the tidy tool just for alloc, since I didn't change any other code?

Oct 12 '22 23:10 Sp00ph

Ok this error is really weird, I got it locally too but didn't really think too much of it as I also got some weird errors due to winapi functions failing. I don't really know what this error is about and it doesn't give any stacktrace to help identify the issue unfortunately.

Oct 13 '22 00:10 Sp00ph

Also, for some reason it's telling me that collectiontests is somehow causing a stack buffer overrun, and I'd be quite surprised if that came from my VecDeque implementation.

Oct 13 '22 00:10 Sp00ph

The job mingw-check failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

configure: rust.debug-assertions := True
configure: rust.overflow-checks := True
configure: llvm.assertions      := True
configure: dist.missing-tools   := True
configure: build.configure-args := ['--enable-sccache', '--disable-manage-submodu ...
configure: writing `config.toml` in current directory
configure: 
configure: run `python /checkout/x.py --help`
Attempting with retry: make prepare
---
........................................................................................ 264/651
........................................................................................ 352/651
........................................................................................ 440/651
........................................................................................ 528/651
..................................................F...................................F. 616/651
thread panicked while panicking. aborting.
error: test failed, to rerun pass `-p alloc --test collectionstests`
Caused by:
Caused by:
  process didn't exit successfully: `/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/collectionstests-77025343f1efe33a --quiet` (signal: 6, SIGABRT: process abort signal)
....FF.............F.....FBuild completed unsuccessfully in 0:01:09

Oct 13 '22 01:10 rust-log-analyzer

When I was trying to rewrite VecDeque recently, the tests were segfaulting all the time :smile: The problem is that the VecDeque itself is used for the test infrastructure. It would be nice if we could use stage0 stdlib to run the tests, but test stage1 code.

Oct 13 '22 10:10 Kobzol

Yea I read that somewhere that VecDeque is used in the testing framework, but the weird thing is that all the tests in the first test suite pass just fine, and also i don't see how my code could cause a stack buffer overrun. The only part i can think of where it reads memory from the stack directly would be the From<[T; N]> impl, and from what I can tell that shouldn't be able to overrun the buffer.

Oct 13 '22 10:10 Sp00ph

You can enable debug asserts for the standard library in config.toml and add them to your implementation, that might help uncovering bugs.

but the weird thing is that all the tests in the first test suite pass just fine

We don't have 100% test coverage, so it might be some edge-case

Oct 13 '22 10:10 the8472

I think I finally got it to work, sorry for all that.

Oct 13 '22 11:10 Sp00ph

I'm not familiar with the format of these .stderr files, what do I need to do to make it not fail there?

Oct 13 '22 12:10 Sp00ph

See https://rustc-dev-guide.rust-lang.org/tests/running.html#editing-and-updating-the-reference-files

...you can pass --bless to the test subcommand. E.g. if some tests in src/test/ui are failing, you can run
./x.py test src/test/ui --bless

Oct 13 '22 12:10 ChrisDenton

Oh boy, doing that does an LLVM build for me, which my laptop doesn't have the resources for.

Edit: nevermind, figured out you can just set download-ci-llvm=true

Oct 13 '22 12:10 Sp00ph

set download-ci-llvm = "if-available" in config.toml

Oct 13 '22 12:10 the8472

Ugh, this is kinda frustrating. Problem is, I only have access to a pretty slow laptop, and so everything takes forever. Running just the ui tests took like 30 minutes, not including the compile time.

Oct 13 '22 14:10 Sp00ph

Ugh, this is kinda frustrating. Problem is, I only have access to a pretty slow laptop, and so everything takes forever. Running just the ui tests took like 30 minutes, not including the compile time.

Not sure about the current circumstance of this PR, but you can try to only run previously failing test locally or try to get the CI to help you run the test (since its much faster, but still takes 10+ mins to finishing the first run of test and 40+ mins to complete the CI).

Oct 13 '22 14:10 Rageking8

Phew, finally got the checks passing. I'm very sorry for all the hassle, this is my first time trying to contribute to the stdlib and it probably shows.

Oct 13 '22 16:10 Sp00ph

That's fine, there will be many more failures (those are not even all the tests that have to pass :grin: ). Could you share the results of stdlib (or any other) microbenchmarks of the old and the new VecDeque on your machine?

Oct 13 '22 16:10 Kobzol

Running x.py bench --stage 0 library/alloc --test-args vec_deque gives the following results for the old implementation:

test collections::vec_deque::tests::bench_pop_back_100       ... bench:         259 ns/iter (+/- 4)
test collections::vec_deque::tests::bench_pop_front_100      ... bench:         252 ns/iter (+/- 67)
test collections::vec_deque::tests::bench_push_back_100      ... bench:         288 ns/iter (+/- 5)
test collections::vec_deque::tests::bench_push_front_100     ... bench:         291 ns/iter (+/- 388)
test collections::vec_deque::tests::bench_retain_half_10000  ... bench:     574,685 ns/iter (+/- 38,841)
test collections::vec_deque::tests::bench_retain_odd_10000   ... bench:     570,027 ns/iter (+/- 799,114)
test collections::vec_deque::tests::bench_retain_whole_10000 ... bench:     997,678 ns/iter (+/- 508,766)

test vec_deque::bench_extend_bytes                       ... bench:          20 ns/iter (+/- 35)
test vec_deque::bench_extend_chained_bytes               ... bench:       1,619 ns/iter (+/- 45)
test vec_deque::bench_extend_chained_trustedlen          ... bench:       1,676 ns/iter (+/- 58)
test vec_deque::bench_extend_trustedlen                  ... bench:       1,064 ns/iter (+/- 9)
test vec_deque::bench_extend_vec                         ... bench:          99 ns/iter (+/- 155)
test vec_deque::bench_from_array_1000                    ... bench:         489 ns/iter (+/- 246)
test vec_deque::bench_grow_1025                          ... bench:       4,082 ns/iter (+/- 48)
test vec_deque::bench_iter_1000                          ... bench:         638 ns/iter (+/- 1,029)
test vec_deque::bench_mut_iter_1000                      ... bench:         637 ns/iter (+/- 6)
test vec_deque::bench_new                                ... bench:          61 ns/iter (+/- 102)
test vec_deque::bench_try_fold                           ... bench:         375 ns/iter (+/- 651)

custom-bench vec_deque_append 249800 ns/iter

And these are the results for the new implementation:

test collections::vec_deque::tests::bench_pop_back_100       ... bench:          89 ns/iter (+/- 28)
test collections::vec_deque::tests::bench_pop_front_100      ... bench:         169 ns/iter (+/- 0)
test collections::vec_deque::tests::bench_push_back_100      ... bench:         208 ns/iter (+/- 2)
test collections::vec_deque::tests::bench_push_front_100     ... bench:         258 ns/iter (+/- 313)
test collections::vec_deque::tests::bench_retain_half_10000  ... bench:     491,300 ns/iter (+/- 201,210)
test collections::vec_deque::tests::bench_retain_odd_10000   ... bench:     569,320 ns/iter (+/- 170,957)
test collections::vec_deque::tests::bench_retain_whole_10000 ... bench:     596,310 ns/iter (+/- 98,938)

test vec_deque::bench_extend_bytes                       ... bench:       1,055 ns/iter (+/- 1,368)
test vec_deque::bench_extend_chained_bytes               ... bench:       1,481 ns/iter (+/- 1,934)
test vec_deque::bench_extend_chained_trustedlen          ... bench:       1,069 ns/iter (+/- 16)
test vec_deque::bench_extend_trustedlen                  ... bench:       1,053 ns/iter (+/- 15)
test vec_deque::bench_extend_vec                         ... bench:          91 ns/iter (+/- 176)
test vec_deque::bench_from_array_1000                    ... bench:         489 ns/iter (+/- 17)
test vec_deque::bench_grow_1025                          ... bench:       4,024 ns/iter (+/- 5,947)
test vec_deque::bench_iter_1000                          ... bench:         370 ns/iter (+/- 4)
test vec_deque::bench_mut_iter_1000                      ... bench:         828 ns/iter (+/- 1,236)
test vec_deque::bench_new                                ... bench:           2 ns/iter (+/- 0)
test vec_deque::bench_try_fold                           ... bench:         531 ns/iter (+/- 4)

custom-bench vec_deque_append 253700 ns/iter

Some of these seem a bit weird to me, like extend_bytes only taking 20ns in the old implementation and iter_mut being almost 3x slower than iter in the new implementation, despite their code being largely identical. I guess that's just due to background noise on my laptop though, as I haven't done any benchmarking specific setup other than closing unnecessary applications. Over all, I don't think there's any unacceptable performance regressions, and new being practically instant is a nice bonus.

Oct 13 '22 17:10 Sp00ph

Some of these seem a bit weird to me

Std benches are finnicky because they measure wall-time. Especially if you're on a laptop. This pending update to the std dev guide might help

like extend_bytes only taking 20ns in the old implementation

It's copying 512 bytes into a pre-allocated vecdeque, that's 8 cache-lines worth of data and fits into the L1 cache. 20ns seems like the right order of magnitude, especially considering that a processor can execute a handful of instruction every nanosecond. 1000ns does not.

Oct 13 '22 17:10 the8472

Results from my laptop (Zen 3):

master
collections::vec_deque::tests::bench_pop_back_100       ... bench:         187 ns/iter (+/- 5)
test collections::vec_deque::tests::bench_pop_front_100      ... bench:         165 ns/iter (+/- 2)
test collections::vec_deque::tests::bench_push_back_100      ... bench:         165 ns/iter (+/- 6)
test collections::vec_deque::tests::bench_push_front_100     ... bench:         165 ns/iter (+/- 2)
test collections::vec_deque::tests::bench_retain_half_10000  ... bench:     381,092 ns/iter (+/- 5,772)
test collections::vec_deque::tests::bench_retain_odd_10000   ... bench:     383,285 ns/iter (+/- 10,536)
test collections::vec_deque::tests::bench_retain_whole_10000 ... bench:     355,320 ns/iter (+/- 7,691)

test vec_deque::bench_extend_bytes                       ... bench:          10 ns/iter (+/- 0)
test vec_deque::bench_extend_chained_bytes               ... bench:       1,661 ns/iter (+/- 59)
test vec_deque::bench_extend_chained_trustedlen          ... bench:       1,617 ns/iter (+/- 58)
test vec_deque::bench_extend_trustedlen                  ... bench:       1,238 ns/iter (+/- 51)
test vec_deque::bench_extend_vec                         ... bench:          26 ns/iter (+/- 1)
test vec_deque::bench_from_array_1000                    ... bench:         192 ns/iter (+/- 7)
test vec_deque::bench_grow_1025                          ... bench:       1,986 ns/iter (+/- 90)
test vec_deque::bench_iter_1000                          ... bench:         465 ns/iter (+/- 17)
test vec_deque::bench_mut_iter_1000                      ... bench:         467 ns/iter (+/- 10)
test vec_deque::bench_new                                ... bench:          15 ns/iter (+/- 0)
test vec_deque::bench_try_fold                           ... bench:          34 ns/iter (+/- 0)

custom-bench vec_deque_append 311562 ns/iter

This PR
collections::vec_deque::tests::bench_pop_back_100       ... bench:         148 ns/iter (+/- 1)
test collections::vec_deque::tests::bench_pop_front_100      ... bench:         146 ns/iter (+/- 1)
test collections::vec_deque::tests::bench_push_back_100      ... bench:         169 ns/iter (+/- 4)
test collections::vec_deque::tests::bench_push_front_100     ... bench:         194 ns/iter (+/- 4)
test collections::vec_deque::tests::bench_retain_half_10000  ... bench:     255,548 ns/iter (+/- 5,597)
test collections::vec_deque::tests::bench_retain_odd_10000   ... bench:     257,342 ns/iter (+/- 8,091)
test collections::vec_deque::tests::bench_retain_whole_10000 ... bench:     231,784 ns/iter (+/- 6,616)

test vec_deque::bench_extend_bytes                       ... bench:         695 ns/iter (+/- 16)
test vec_deque::bench_extend_chained_bytes               ... bench:         813 ns/iter (+/- 32)
test vec_deque::bench_extend_chained_trustedlen          ... bench:         735 ns/iter (+/- 30)
test vec_deque::bench_extend_trustedlen                  ... bench:          15 ns/iter (+/- 0)
test vec_deque::bench_extend_vec                         ... bench:          27 ns/iter (+/- 0)
test vec_deque::bench_from_array_1000                    ... bench:         195 ns/iter (+/- 5)
test vec_deque::bench_grow_1025                          ... bench:       2,074 ns/iter (+/- 418)
test vec_deque::bench_iter_1000                          ... bench:         328 ns/iter (+/- 18)
test vec_deque::bench_mut_iter_1000                      ... bench:         327 ns/iter (+/- 5)
test vec_deque::bench_new                                ... bench:           2 ns/iter (+/- 0)
test vec_deque::bench_try_fold                           ... bench:          34 ns/iter (+/- 2)

custom-bench vec_deque_append 144851 ns/iter

(I try to reduce variance by turning off hyper-threading, turbo-boost etc., but it's still quite noisy, as usually). It looks really good! :rocket:

I should have tried this simple representation when I was trying to do what you have done, I tried the "unmasked indices" approach from https://www.snellman.net/blog/archive/2016-12-13-ring-buffers/, but it was super complicated.

Oct 13 '22 17:10 Kobzol

like extend_bytes only taking 20ns in the old implementation

It's copying 512 bytes into a pre-allocated vecdeque, that's 8 cache-lines worth of data and fits into the L1 cache. 20ns seems like the right order of magnitude, especially considering that a processor can execute a handful of instruction every nanosecond. 1000ns does not.

Seems like I messed up something then, if I had to guess it's a worse SpecExtend impl, I can look into it later.

Oct 13 '22 17:10 Sp00ph

Alrighty, the new numbers on my laptop are looking like this:

test collections::vec_deque::tests::bench_pop_back_100       ... bench:          89 ns/iter (+/- 108)
test collections::vec_deque::tests::bench_pop_front_100      ... bench:         169 ns/iter (+/- 136)
test collections::vec_deque::tests::bench_push_back_100      ... bench:         207 ns/iter (+/- 3)
test collections::vec_deque::tests::bench_push_front_100     ... bench:         253 ns/iter (+/- 278)
test collections::vec_deque::tests::bench_retain_half_10000  ... bench:     433,255 ns/iter (+/- 4,701)
test collections::vec_deque::tests::bench_retain_odd_10000   ... bench:     436,163 ns/iter (+/- 398,544)
test collections::vec_deque::tests::bench_retain_whole_10000 ... bench:     276,428 ns/iter (+/- 321,348)

test vec_deque::bench_extend_bytes                       ... bench:          17 ns/iter (+/- 23)
test vec_deque::bench_extend_chained_bytes               ... bench:       1,068 ns/iter (+/- 14)
test vec_deque::bench_extend_chained_trustedlen          ... bench:       1,069 ns/iter (+/- 16)
test vec_deque::bench_extend_trustedlen                  ... bench:       1,292 ns/iter (+/- 1,971)
test vec_deque::bench_extend_vec                         ... bench:         128 ns/iter (+/- 169)
test vec_deque::bench_from_array_1000                    ... bench:         539 ns/iter (+/- 846)
test vec_deque::bench_grow_1025                          ... bench:       3,947 ns/iter (+/- 89)
test vec_deque::bench_iter_1000                          ... bench:         426 ns/iter (+/- 626)
test vec_deque::bench_mut_iter_1000                      ... bench:         788 ns/iter (+/- 993)
test vec_deque::bench_new                                ... bench:           2 ns/iter (+/- 0)
test vec_deque::bench_try_fold                           ... bench:         607 ns/iter (+/- 194)

custom-bench vec_deque_append 238050 ns/iter

Oct 13 '22 22:10 Sp00ph

Alright, at this point I'm fairly confident in the correctness of the code, I just did a final thorough pass and looked over all of it. I'd still like to try and fuzz it against my original deque rewrite, which I had already fuzzed against the old VecDeque. Is there any way to link my locally compiled std into that program?

Oct 14 '22 14:10 Sp00ph

It would be good if you could clean up the git history (with git rebase -i). Commit messages such as "final changes hopefully" are not very helpful and can be merged into the commit that introduced the thing that needs to be fixed.

Is there any way to link my locally compiled std into that program?

./x build library/std --stage 1 will build the compiler and then use that to build std, which can then be added as rustup toolchain

Oct 14 '22 14:10 the8472

You'll also want to run the alloc tests under Miri, see #60076 and #99701 for an issue that was previously uncovered that way.

Oct 14 '22 16:10 the8472

I've been letting it fuzz for around an hour now with no errors. I can let it run for a bit longer, maybe overnight, but so far it's looking good. I'll take a look at running the tests in miri.

Oct 14 '22 16:10 Sp00ph

Is there any built-in way to run the tests under miri using some x.py command?

Oct 14 '22 16:10 Sp00ph

See #99701

Oct 14 '22 16:10 the8472

rust rust copied to clipboard

Update VecDeque implementation to use head+len instead of head+tail

rust
rust copied to clipboard