redo-rs
Jobserver assertion failing
I've had an `EnsureTokenState::New` future with 2 tokens, triggering an assertion failure. I've added a little logging, and it paints the following picture:
- A task (`obj/lib/vblank.asm.o`) ends up collecting another task's “done” token(?), in this case `obj/main.asm.o`'s.
- The next iteration of the loop also picks up the task's own token, raising the number of tokens to 2.
- `ensure_token_or_cheat` is called (through polling, I think?), which calls `ensure_token`, creating a new `EnsureToken` that has 2 tokens (via `state.clone()`) but nonetheless the default `EnsureTokenState` of `New`, which trips the original assertion. (A sketch of this sequence follows the list.)
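For illustration, here's a minimal, self-contained sketch of that sequence. The names (`EnsureTokenState`, `EnsureToken`, `my_tokens`) mirror this report and the panic message below, but the structure is my reconstruction, not the actual redo-rs code:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical stand-ins for the real types; only the pieces needed to
// show the broken invariant are included.
#[derive(Debug, Default)]
enum EnsureTokenState {
    #[default]
    New,
    // ... other states elided ...
}

struct ServerState {
    my_tokens: usize,
}

struct EnsureToken {
    state: Rc<RefCell<ServerState>>,
    _my_state: EnsureTokenState,
}

fn main() {
    let state = Rc::new(RefCell::new(ServerState { my_tokens: 0 }));

    // Iteration 1: the select loop collects another task's "done" token.
    state.borrow_mut().my_tokens += 1;
    // Iteration 2: the loop also reads the task's own token.
    state.borrow_mut().my_tokens += 1;

    // ensure_token() now builds a fresh future that shares the same
    // counter (state.clone()) but starts in the default New state...
    let token = EnsureToken {
        state: state.clone(),
        _my_state: EnsureTokenState::default(),
    };

    // ...so the invariant the assertion checks is already violated:
    assert!(token.state.borrow().my_tokens <= 1); // panics, as in the report
}
```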
(And I mean, this reproduces somewhat inconsistently on the current master. This might be a race condition?)
I added a new assertion inside `ensure_token_or_cheat`, which is why the backtrace below lands halfway through the process described above.
```
job#28231: (obj/lib/vblank.asm.o) waiting for tokens...
job#28231: 0,0 token_fds=(100, 101); jfds=[54, 52, 60, 53, 56, 58, 50, 55, 59, 57]; token_wakers=[18]; r=[50, 52, 53, 54, 55, 56, 57, 58, 59, 60, 100]
job#28231: readable: [53, 100]
job#28231: done: obj/main.asm.o
job#28231: 1,0 -> release(0)
job#28231: done1: rv=0
job#28231: read a token ([116]), now at 2.
job#28231: (obj/lib/vblank.asm.o) got a token, now at 2.

thread 'main' panicked at /tmp/redo-rs/src/jobserver.rs:761:13:
assertion failed: self.state.borrow().my_tokens <= 1
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/panicking.rs:665:5
   1: core::panicking::panic_fmt
             at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/panicking.rs:76:14
   2: core::panicking::panic
             at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/panicking.rs:148:5
   3: redo::jobserver::JobServerHandle::ensure_token_or_cheat::{{closure}}
             at /tmp/redo-rs/src/jobserver.rs:761:13
```
...I just don't know enough about this program to be able to tell what the proper fix is.
Hm. Are you able to reproduce this with apenwarr's redo as well? I tried to follow that code as closely as I could, but it's very possible I messed something up.
(I would like to fix this, but I don't have much bandwidth to spend on this project at the moment, I'm afraid.)
I just tried version 0.42.d a couple of times, and, nope, it doesn't reproduce there.
I wonder if it could be related to the fact that some of the recipes use the jobserver directly (here, Cargo)? Neither of the two jobs involves it, but I believe the Cargo job is active when the bug triggers.
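For context, "using the jobserver directly" here means speaking the GNU make jobserver protocol: a client like Cargo inherits the token pipe and acquires/releases tokens by reading and writing single bytes on it. Below is a minimal sketch of the client side; the fd numbers are illustrative (echoing `token_fds=(100, 101)` from the log above), whereas a real client would parse them out of `--jobserver-auth=R,W` in `MAKEFLAGS`:

```rust
use std::fs::File;
use std::io::{Read, Write};
use std::os::unix::io::FromRawFd;

fn main() -> std::io::Result<()> {
    // SAFETY: assumes fds 100/101 are the jobserver pipe ends inherited
    // open from the parent process; illustrative only.
    let (mut read_end, mut write_end) =
        unsafe { (File::from_raw_fd(100), File::from_raw_fd(101)) };

    // Acquire: block until one token byte can be read off the pipe.
    let mut token = [0u8; 1];
    read_end.read_exact(&mut token)?;

    // ... run one job's worth of work while holding the token ...

    // Release: write the byte back so other clients can proceed.
    write_end.write_all(&token)?;
    Ok(())
}
```

If a concurrent Cargo build is acquiring and releasing tokens on the same pipe while redo-rs's select loop is watching the read end, extra wakeups at awkward moments seem at least plausible, which would fit the inconsistent reproduction.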