Jobserver assertion failing

Open ISSOtm opened this issue 9 months ago • 2 comments

I've hit an EnsureTokenState::New future holding 2 tokens, which triggers an assertion failure. I've added a little logging, and it paints the following picture:

  1. A task (obj/lib/vblank.asm.o) ends up collecting another task's “done” token(?), in this case obj/main.asm.o
  2. The next iteration of the loop also picks up the task's own token, raising the number of tokens to 2
  3. ensure_token_or_cheat is called (through polling, I think?), which calls ensure_token, creating a new EnsureToken that has 2 tokens (via state.clone()) but nonetheless the default EnsureTokenState of New, which trips the original assertion.
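To make the three steps above concrete, here is a minimal sketch of the suspected sequence. It is not the actual redo-rs code: the `State` struct, the `phase` field, the `Waiting` variant, and the `invariant_holds` and `simulate` helpers are all hypothetical names invented for illustration, and it assumes the shared state sits behind an `Rc<RefCell<...>>` (suggested by `self.state.borrow()` in the assertion). Only `EnsureToken`, `EnsureTokenState::New`, and `my_tokens` come from the report.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for the jobserver's shared bookkeeping.
#[derive(Debug, Default)]
struct State {
    my_tokens: u32,
}

/// Only `New` is named in the report; `Waiting` is an invented placeholder.
#[derive(Debug, Default, PartialEq)]
enum EnsureTokenState {
    #[default]
    New,
    #[allow(dead_code)]
    Waiting,
}

/// Hypothetical model of the future created by `ensure_token`.
struct EnsureToken {
    state: Rc<RefCell<State>>,
    phase: EnsureTokenState,
}

impl EnsureToken {
    /// Clones the shared handle (so the token count is inherited as-is)
    /// but always starts in the default `New` phase.
    fn new(state: &Rc<RefCell<State>>) -> Self {
        EnsureToken {
            state: Rc::clone(state),
            phase: EnsureTokenState::default(),
        }
    }

    /// Mirrors the failing check: a `New` future must hold at most one token.
    fn invariant_holds(&self) -> bool {
        self.phase != EnsureTokenState::New || self.state.borrow().my_tokens <= 1
    }
}

/// Replays steps 1 to 3 and reports whether the invariant survives.
fn simulate() -> bool {
    let state = Rc::new(RefCell::new(State::default()));
    state.borrow_mut().my_tokens += 1; // step 1: another task's token is collected
    state.borrow_mut().my_tokens += 1; // step 2: the task's own token is read
    let fut = EnsureToken::new(&state); // step 3: fresh future, phase is New
    fut.invariant_holds()
}
```

In this model `simulate()` returns `false`: the freshly created future is in the `New` phase yet already sees two tokens through the shared handle, which is the same condition that `assertion failed: self.state.borrow().my_tokens <= 1` reports. Whether the real code reaches that state the same way is exactly what I'm unsure about.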

(This only reproduces intermittently on the current master, so it might be a race condition?) I added a new assertion inside of ensure_token_or_cheat, which is why the backtrace below is captured partway through the process above.

job#28231: (obj/lib/vblank.asm.o) waiting for tokens...
job#28231: 0,0 token_fds=(100, 101); jfds=[54, 52, 60, 53, 56, 58, 50, 55, 59, 57]; token_wakers=[18]; r=[50, 52, 53, 54, 55, 56, 57, 58, 59, 60, 100]
job#28231: readable: [53, 100]
job#28231: done: obj/main.asm.o
job#28231: 1,0 -> release(0)
job#28231: done1: rv=0
job#28231: read a token ([116]), now at 2.
job#28231: (obj/lib/vblank.asm.o) got a token, now at 2.
thread 'main' panicked at /tmp/redo-rs/src/jobserver.rs:761:13:
assertion failed: self.state.borrow().my_tokens <= 1
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/panicking.rs:665:5
   1: core::panicking::panic_fmt
             at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/panicking.rs:76:14
   2: core::panicking::panic
             at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/panicking.rs:148:5
   3: redo::jobserver::JobServerHandle::ensure_token_or_cheat::{{closure}}
             at /tmp/redo-rs/src/jobserver.rs:761:13

...I just don't know enough about this program to be able to tell what the proper fix is.

ISSOtm, Feb 03 '25 19:02

Hm. Are you able to reproduce this with apenwarr's redo as well? I tried to follow that code as closely as I could, but it's very possible I messed something up.

(I would like to fix this, but I don't have much bandwidth to spend on this project at the moment, I'm afraid.)

zombiezen, Feb 17 '25 19:02

I just tried version 0.42.d a couple of times, and nope, it doesn't reproduce there.

I wonder if it could be related to the fact that some of the recipes use the jobserver directly (here, Cargo)? Neither of the two jobs involves it, but I believe the Cargo job is active when the bug triggers.

ISSOtm, Feb 17 '25 23:02