rav1e icon indicating copy to clipboard operation
rav1e copied to clipboard

doctests break on rust 1.57.0 (release build)

Open Mic92 opened this issue 3 years ago • 15 comments

Describe the bug

Two of 4 tests fail after upgrading from rust 1.56.0 to rust 1.57.0 in https://github.com/NixOS/nixpkgs/pull/148358#issuecomment-986081970

To Reproduce

$ cargo test --release
failures:
     src/api/context.rs - api::context::Context<T>::receive_packet (line 204)
     src/api/context.rs - api::context::Context<T>::receive_packet (line 229)
 test result: FAILED. 4 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out; finished in 14.40s

Required Information Toolchain (if is a build problem):

$ cargo --version -> 1.57.0
$ rustc --version -> 1.57.0
$ nasm --version # if on x86_64 -> 2.15.05

Version:

$ rav1e --version -> 0.5.0

Operating system:

$ uname -a
Linux turingmachine 5.15.6-zen2 #1-NixOS ZEN SMP Tue Jan 1 00:00:00 UTC 1980 x86_64 GNU/Linux

Mic92 avatar Dec 05 '21 14:12 Mic92

I've attempted this on Arch Linux, with the same rustc and nasm versions, and all tests pass in my case. So it seems like this is probably specific to NixOS.

shssoichiro avatar Dec 05 '21 21:12 shssoichiro

I cannot reproduce it on MacOS as well.

lu-zero avatar Dec 05 '21 21:12 lu-zero

For me (also NixOS) cargo test --release fails while cargo test works.

--release:

[nix-shell:~/dev/git/rav1e]$ cargo test --release --doc
    Finished release [optimized + debuginfo] target(s) in 0.04s
   Doc-tests rav1e

running 6 tests
test src/lib.rs - version (line 312) ... ok
test src/api/context.rs - api::context::Context<T>::new_frame (line 41) ... ok
test src/api/config/mod.rs - api::config::Config::new_context (line 260) ... ok
test src/api/context.rs - api::context::Context<T>::send_frame (line 79) ... FAILED
test src/api/context.rs - api::context::Context<T>::receive_packet (line 204) ... FAILED
test src/api/context.rs - api::context::Context<T>::receive_packet (line 229) ... FAILED

failures:

---- src/api/context.rs - api::context::Context<T>::send_frame (line 79) stdout ----
Test executable failed (terminated by signal).

---- src/api/context.rs - api::context::Context<T>::receive_packet (line 204) stdout ----
Test executable failed (terminated by signal).

---- src/api/context.rs - api::context::Context<T>::receive_packet (line 229) stdout ----
Test executable failed (terminated by signal).


failures:
    src/api/context.rs - api::context::Context<T>::receive_packet (line 204)
    src/api/context.rs - api::context::Context<T>::receive_packet (line 229)
    src/api/context.rs - api::context::Context<T>::send_frame (line 79)

test result: FAILED. 3 passed; 3 failed; 0 ignored; 0 measured; 0 filtered out; finished in 5.90s

error: test failed, to rerun pass '--doc'

no --release:

[nix-shell:~/dev/git/rav1e]$ cargo test --doc
    Finished test [optimized + debuginfo] target(s) in 0.04s
   Doc-tests rav1e

running 6 tests
test src/lib.rs - version (line 312) ... ok
test src/api/config/mod.rs - api::config::Config::new_context (line 260) ... ok
test src/api/context.rs - api::context::Context<T>::new_frame (line 41) ... ok
test src/api/context.rs - api::context::Context<T>::send_frame (line 79) ... ok
test src/api/context.rs - api::context::Context<T>::receive_packet (line 229) ... ok
test src/api/context.rs - api::context::Context<T>::receive_packet (line 204) ... ok

test result: ok. 6 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 12.95s

Do you know if I can extract a bit more info what test actually does?

trofi avatar Dec 05 '21 22:12 trofi

The test is supposed to be something more or less similar to #2852 but since the tests pass there, I assume rustdoc is doing something different.

lu-zero avatar Dec 06 '21 06:12 lu-zero

Confirmed to fail similarly on nightly.

lu-zero avatar Dec 06 '21 06:12 lu-zero

I see same behaivoir on Alpine Linux for rav1e 0.5.1:

test src/api/context.rs - api::context::Context<T>::send_frame (line 79) ... FAILED
856test src/api/context.rs - api::context::Context<T>::receive_packet (line 204) ... FAILED
857test src/api/context.rs - api::context::Context<T>::receive_packet (line 229) ... FAILED
858failures:
859---- src/api/context.rs - api::context::Context<T>::send_frame (line 79) stdout ----
860Test executable failed (terminated by signal).
861---- src/api/context.rs - api::context::Context<T>::receive_packet (line 204) stdout ----
862Test executable failed (terminated by signal).
863---- src/api/context.rs - api::context::Context<T>::receive_packet (line 229) stdout ----
864Test executable failed (terminated by signal).
865failures:
866    src/api/context.rs - api::context::Context<T>::receive_packet (line 204)
867    src/api/context.rs - api::context::Context<T>::receive_packet (line 229)
868    src/api/context.rs - api::context::Context<T>::send_frame (line 79)
869test result: FAILED. 3 passed; 3 failed; 0 ignored; 0 measured; 0 filtered out; finished in 18.75s
870error: test failed, to rerun pass '--doc'

https://gitlab.alpinelinux.org/otlabs/aports/-/jobs/563611#L855

OTLabs avatar Dec 15 '21 02:12 OTLabs

I also see the same behavior on GNU Guix:

test api::test::lookahead_size_properly_bounded_16 ... ok

test result: ok. 394 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.16s

     Running unittests (target/release/deps/rav1e-13a050c7fe75de8d)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests rav1e

running 6 tests
test src/lib.rs - version (line 312) ... ok
test src/api/config/mod.rs - api::config::Config::new_context (line 260) ... ok
test src/api/context.rs - api::context::Context<T>::new_frame (line 41) ... ok
test src/api/context.rs - api::context::Context<T>::send_frame (line 79) ... FAILED
test src/api/context.rs - api::context::Context<T>::receive_packet (line 229) ... FAILED
test src/api/context.rs - api::context::Context<T>::receive_packet (line 204) ... FAILED

failures:

---- src/api/context.rs - api::context::Context<T>::send_frame (line 79) stdout ----
Test executable failed (terminated by signal).

---- src/api/context.rs - api::context::Context<T>::receive_packet (line 229) stdout ----
Test executable failed (terminated by signal).

---- src/api/context.rs - api::context::Context<T>::receive_packet (line 204) stdout ----
Test executable failed (terminated by signal).


failures:
    src/api/context.rs - api::context::Context<T>::receive_packet (line 204)
    src/api/context.rs - api::context::Context<T>::receive_packet (line 229)
    src/api/context.rs - api::context::Context<T>::send_frame (line 79)

test result: FAILED. 3 passed; 3 failed; 0 ignored; 0 measured; 0 filtered out; finished in 7.77s

error: test failed, to rerun pass '--doc'
error: in phase 'check': uncaught exception:
%exception #<&invoke-error program: "cargo" arguments: ("test" "--release") exit-status: 101 term-signal: 
phase `check' failed after 46.1 seconds
command "cargo" "test" "--release" failed with status 101

Attached is the full build log.
j2dqq5fa664kv7qdibxxaxphm39vp3-rav1e-0.5.1.drv.txt.gz

Apteryks avatar Jan 02 '22 23:01 Apteryks

The problem seems somehow in rustdoc, I hadn't had time to dig further.

lu-zero avatar Jan 02 '22 23:01 lu-zero

From what I understand rustdoc builds standalone executables and runs them. I was not able to pass --persist-doctests . nicely and ended up plugging a delay to catch the test with gdb interactively:

--- a/src/api/context.rs
+++ b/src/api/context.rs
@@ -78,8 +78,10 @@ impl<T: Pixel> Context<T> {
   ///
   /// ```
   /// use rav1e::prelude::*;
+  /// use std::{thread, time};
   ///
   /// # fn main() -> Result<(), Box<dyn std::error::Error>> {
+  /// thread::sleep(time::Duration::from_millis(1000 * 30));
   /// let cfg = Config::default();
   /// let mut ctx: Context<u8> = cfg.new_context().unwrap();
   /// let f1 = ctx.new_frame();
$ RUSTDOCFLAGS='-C debuginfo=2' cargo test --release --doc --verbose 'api::context::Context<T>::send_frame'
...
$ gdb -p $pid
(gdb) continue
Continuing.
[New Thread 0x7f4cf799e640 (LWP 503344)]
[New Thread 0x7f4cf779a640 (LWP 503345)]

Thread 2 "rust_out" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f4cf799e640 (LWP 503344)]
0x00007f4cf799d9d0 in ?? ()
(gdb) bt
#0  0x00007f4cf799d9d0 in ?? ()
#1  0x00007f4cf799e5f8 in ?? ()
#2  0x00007f4cf799da80 in ?? ()
#3  0x000055b1d909d861 in rayon_core::registry::WorkerThread::set_current (thread=0x7f4cf799d600)
    at /home/slyfox/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-core-1.9.1/src/registry.rs:636
#4  rayon_core::registry::main_loop (registry=..., index=0, worker=...)
    at /home/slyfox/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-core-1.9.1/src/registry.rs:807
#5  rayon_core::registry::ThreadBuilder::run (self=...)
    at /home/slyfox/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-core-1.9.1/src/registry.rs:55
#6  0x000055b1d90a41dd in rayon_core::registry::{impl#2}::spawn::{closure#0} ()
    at /home/slyfox/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-core-1.9.1/src/registry.rs:100
#7  std::sys_common::backtrace::__rust_begin_short_backtrace<rayon_core::registry::{impl#2}::spawn::{closure#0}, ()> (f=...)
    at /build/rustc-1.57.0-src/library/std/src/sys_common/backtrace.rs:123
#8  0x000055b1d90975dc in std::thread::{impl#0}::spawn_unchecked::{closure#1}::{closure#0}<rayon_core::registry::{impl#2}::spawn::{closure#0}, ()> () at /build/rustc-1.57.0-src/library/std/src/thread/mod.rs:483
#9  core::panic::unwind_safe::{impl#23}::call_once<(), std::thread::{impl#0}::spawn_unchecked::{closure#1}::{closure#0}> (self=...,
    _args=<optimized out>) at /build/rustc-1.57.0-src/library/core/src/panic/unwind_safe.rs:271
#10 0x000055b1d90a11cf in std::panicking::try::do_call<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked::{closure#1}::{closure#0}>, ()> (data=<optimized out>) at /build/rustc-1.57.0-src/library/std/src/panicking.rs:403
#11 std::panicking::try<(), core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked::{closure#1}::{closure#0}>>
    (f=<error reading variable: Cannot access memory at address 0x0>) at /build/rustc-1.57.0-src/library/std/src/panicking.rs:367
#12 0x000055b1d90a42d0 in std::panic::catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked::{closure#1}::{closure#0}>, ()> (f=...) at /build/rustc-1.57.0-src/library/std/src/panic.rs:133
#13 0x000055b1d90960cb in std::thread::{impl#0}::spawn_unchecked::{closure#1}<rayon_core::registry::{impl#2}::spawn::{closure#0}, ()> ()
    at /build/rustc-1.57.0-src/library/std/src/thread/mod.rs:482
#14 core::ops::function::FnOnce::call_once<std::thread::{impl#0}::spawn_unchecked::{closure#1}, ()> ()
    at /build/rustc-1.57.0-src/library/core/src/ops/function.rs:227
#15 0x000055b1d91d65a5 in std::sys::unix::thread::Thread::new::thread_start ()
#16 0x00007f4cf83fdd40 in start_thread () from /nix/store/s9qbqh7gzacs7h68b2jfmn9l6q4jwfjz-glibc-2.33-59/lib/libpthread.so.0
#17 0x00007f4cf81e443f in clone () from /nix/store/s9qbqh7gzacs7h68b2jfmn9l6q4jwfjz-glibc-2.33-59/lib/libc.so.6

trofi avatar Jan 04 '22 23:01 trofi

Specifically rayon-core-1.9.1/src/registry.rs:

    /// Sets `self` as the worker thread index for the current thread.
    /// This is done during worker thread startup.
    unsafe fn set_current(thread: *const WorkerThread) {
        WORKER_THREAD_STATE.with(|t| {
            assert!(t.get().is_null());
            t.set(thread);
        });
    }
...
unsafe fn main_loop(worker: Worker<JobRef>, registry: Arc<Registry>, index: usize) {
    let worker_thread = &WorkerThread {
        worker,
        fifo: JobFifo::new(),
        index,
        rng: XorShift64Star::new(),
        registry: registry.clone(),
    };
    WorkerThread::set_current(worker_thread);
...

Does rust guarantee that raw worker_thread will point to fully constructed object? Or could actual initialization move downward a bit?

trofi avatar Jan 05 '22 00:01 trofi

Rust should guarantee that the worker thread is initialized before it is used. However, that is inside of an unsafe function, so many guarantees are no longer valid. One of the fields on the worker thread could be uninitialized.

I wonder if this is worth bringing over to the rayon devs... 🤔 I think we'll probably want to look a little closer just to verify if the bug is in their code before we do that.

shssoichiro avatar Jan 05 '22 04:01 shssoichiro

Are you willing to open an issue on their side? It is really fishy.

lu-zero avatar Jan 05 '22 10:01 lu-zero

Are you willing to open an issue on their side? It is really fishy.

Sure! Took me a while to extract minimal reproducer. I think it's hard to extract as is because rustdoc does not pass opt-level implied by --release, but does pass lto = "thin" specificed by Cargo.toml.

Upstream report: https://github.com/rayon-rs/rayon/issues/911

trofi avatar Jan 10 '22 19:01 trofi

Looks like this was fixed in https://github.com/rust-lang/rust/issues/92869 and will be released in rustc 1.60.

shssoichiro avatar Apr 02 '22 03:04 shssoichiro