
std.Progress panic: reached unreachable code

Open andrewrk opened this issue 1 year ago • 4 comments

Zig Version

0.14.0-dev.1863+8d872b018

Steps to Reproduce and Observed Behavior

It's a race condition, so it's not easy to reproduce, but I have seen this happen:

[nix-shell:~/src/zig/build-release]$ stage4/bin/zig build test-link -fqemu
thread 1039492 panic: reached unreachable code
/home/andy/src/zig/lib/std/debug.zig:405:14: 0x14fa9ed in assert (build)
    if (!ok) unreachable; // assertion failure
             ^
/home/andy/src/zig/lib/std/Progress.zig:309:15: 0x153c89a in init (build)
        assert(parent_ptr.* == .unused);
              ^
/home/andy/src/zig/lib/std/Progress.zig:193:28: 0x150eb95 in start (build)
                return init(free_index, parent, name, estimated_total_items);
                           ^
/home/andy/src/zig/lib/compiler/build_runner.zig:1093:42: 0x1562c6b in workerMakeOneStep (build)
    const sub_prog_node = prog_node.start(s.name, 0);
                                         ^
/home/andy/src/zig/lib/std/Thread/Pool.zig:115:39: 0x15636a1 in runFn (build)
            @call(.auto, func, closure.arguments);
                                      ^
/home/andy/src/zig/lib/std/Thread/Pool.zig:291:32: 0x16658c5 in worker (build)
            run_node.data.runFn(&run_node.data, id);
                               ^
/home/andy/src/zig/lib/std/Thread.zig:486:13: 0x15c8e9d in callFn__anon_66147 (build)
            @call(.auto, f, args);
            ^
/home/andy/src/zig/lib/std/Thread.zig:1374:30: 0x15623a7 in entryFn (build)
                return callFn(f, self.fn_args);
                             ^
/home/andy/src/zig/lib/std/os/linux/x86_64.zig:104:5: 0x15c8f31 in clone (build)
    asm volatile (
    ^
???:?:?: 0x0 in ??? (???)
error: the following build command crashed:
/home/andy/src/zig/.zig-cache/o/8dbc51016672d1ad9f5e8a59f6871761/build /home/andy/src/zig/build-release/stage4/bin/zig /home/andy/src/zig/lib /home/andy/src/zig /home/andy/src/zig/.zig-cache /home/andy/.cache/zig --seed 0xb60c09df -Z138b0b345a660505 test-link -fqemu

Expected Behavior

Don't panic.

andrewrk avatar Oct 10 '24 21:10 andrewrk

zig version
0.14.0-dev.1887+008acd054

I tried to reproduce the problem on my computer. The error occurs when multiple test commands are running in shell tabs at the same time. (Screenshot: bug2)

dravenk avatar Oct 12 '24 07:10 dravenk

I've also just encountered this locally while running test-cases.

mlugg avatar Oct 14 '24 21:10 mlugg

Please try to use text instead of images of text!

andrewrk avatar Oct 14 '24 21:10 andrewrk

text of trace above
error: thread 5468577 panic: reached unreachable code
/Users/dk/src/zig/zig/src/link/Dwarf.zig:2930:33: 0x1091e371b in updateComptimeNav (zig)
/Users/dk/src/zig/zig/src/link/Elf/ZigObject.zig:1617:64: 0x1091c70a3 in updateNav (zig)
/Users/dk/src/zig/zig/src/link/Elf.zig:2723:43: 0x108f6322f in updateNav (zig)
/Users/dk/src/zig/zig/src/link.zig:650:81: 0x108d19183 in updateNav (zig)
/Users/dk/src/zig/zig/src/Zcu/PerThread.zig:2594:21: 0x108d1882f in linkerUpdateNav (zig)
/Users/dk/src/zig/zig/src/Compilation.zig:3955:35: 0x109149d3f in processOneCodegenJob (zig)
/Users/dk/src/zig/zig/src/Compilation.zig:3935:33: 0x109149013 in codegenThread (zig)
/Users/dk/src/zig/zig/lib/std/Thread/Pool.zig:178:50: 0x10914911b in runFn (zig)
/Users/dk/src/zig/zig/lib/std/Thread/Pool.zig:291:32: 0x1090f5d3f in worker (zig)
/Users/dk/src/zig/zig/lib/std/Thread.zig:486:13: 0x108ebb153 in callFn__anon_200013 (zig)
/Users/dk/src/zig/zig/lib/std/Thread.zig:755:30: 0x108cb5d3b in entryFn (zig)

nektro avatar Oct 14 '24 21:10 nektro

A clue:

WARNING: ThreadSanitizer: data race (pid=1180490)
  Write of size 1 at 0x556c60d0aa93 by thread T13:
    #0 Progress.Node.end /home/andy/src/zig/lib/std/Progress.zig:252 (zig+0xa084b24)
    #1 Compilation.workerAstGenFile /home/andy/src/zig/src/Compilation.zig:4245 (zig+0xa7cd330)
    #2 Thread.Pool.spawnWgId__anon_331201.Closure.runFn /home/andy/src/zig/lib/std/Thread/Pool.zig:178 (zig+0xb1dc915)
    #3 Thread.Pool.worker /home/andy/src/zig/lib/std/Thread/Pool.zig:291 (zig+0xa775769)
    #4 Thread.callFn__anon_201324 /home/andy/src/zig/lib/std/Thread.zig:486 (zig+0xa49ac64)
    #5 Thread.PosixThreadImpl.spawn__anon_136182.Instance.entryFn /home/andy/src/zig/lib/std/Thread.zig:755 (zig+0xa1ffe0f)

  Previous read of size 1 at 0x556c60d0aa93 by thread T16:
    #0 Progress.Node.start /home/andy/src/zig/lib/std/Progress.zig:191 (zig+0xa084458)
    #1 Compilation.workerAstGenFile /home/andy/src/zig/src/Compilation.zig:4244 (zig+0xa7cd036)
    #2 Thread.Pool.spawnWgId__anon_331201.Closure.runFn /home/andy/src/zig/lib/std/Thread/Pool.zig:178 (zig+0xb1dc915)
    #3 Thread.Pool.worker /home/andy/src/zig/lib/std/Thread/Pool.zig:291 (zig+0xa775769)
    #4 Thread.callFn__anon_201324 /home/andy/src/zig/lib/std/Thread.zig:486 (zig+0xa49ac64)
    #5 Thread.PosixThreadImpl.spawn__anon_136182.Instance.entryFn /home/andy/src/zig/lib/std/Thread.zig:755 (zig+0xa1ffe0f)

  Thread T13 (tid=1180636, running) created by main thread at:
    #0 pthread_create /home/andy/src/zig/lib/tsan/tsan_interceptors_posix.cpp:1023 (zig+0xd0fb9a3)
    #1 Thread.PosixThreadImpl.spawn__anon_136182 /home/andy/src/zig/lib/std/Thread.zig:773 (zig+0xa1ff90b)
    #2 Thread.spawn__anon_50660 /home/andy/src/zig/lib/std/Thread.zig:419 (zig+0x9fbde89)
    #3 Thread.Pool.init /home/andy/src/zig/lib/std/Thread/Pool.zig:57 (zig+0x9fbd9ce)
    #4 main.buildOutputType /home/andy/src/zig/src/main.zig:3205 (zig+0xa13ff8c)
    #5 main.mainArgs /home/andy/src/zig/src/main.zig:258 (zig+0x9f71cd3)
    #6 main.main /home/andy/src/zig/src/main.zig:199 (zig+0x9f6e625)
    #7 start.callMain /home/andy/src/zig/lib/std/start.zig:618 (zig+0x9f6e017)
    #8 start.callMainWithArgs /home/andy/src/zig/lib/std/start.zig:578 (zig+0x9f6e017)
    #9 start.main /home/andy/src/zig/lib/std/start.zig:593 (zig+0x9f6e017)
    #10 __libc_start_call_main ??:? (libc.so.6+0x2a10d) (BuildId: ddc3651a0e729783c4bc0a1ea2d04fa976c0b246)

  Thread T16 (tid=1180649, running) created by main thread at:
    #0 pthread_create /home/andy/src/zig/lib/tsan/tsan_interceptors_posix.cpp:1023 (zig+0xd0fb9a3)
    #1 Thread.PosixThreadImpl.spawn__anon_136182 /home/andy/src/zig/lib/std/Thread.zig:773 (zig+0xa1ff90b)
    #2 Thread.spawn__anon_50660 /home/andy/src/zig/lib/std/Thread.zig:419 (zig+0x9fbde89)
    #3 Thread.Pool.init /home/andy/src/zig/lib/std/Thread/Pool.zig:57 (zig+0x9fbd9ce)
    #4 main.buildOutputType /home/andy/src/zig/src/main.zig:3205 (zig+0xa13ff8c)
    #5 main.mainArgs /home/andy/src/zig/src/main.zig:258 (zig+0x9f71cd3)
    #6 main.main /home/andy/src/zig/src/main.zig:199 (zig+0x9f6e625)
    #7 start.callMain /home/andy/src/zig/lib/std/start.zig:618 (zig+0x9f6e017)
    #8 start.callMainWithArgs /home/andy/src/zig/lib/std/start.zig:578 (zig+0x9f6e017)
    #9 start.main /home/andy/src/zig/lib/std/start.zig:593 (zig+0x9f6e017)
    #10 __libc_start_call_main ??:? (libc.so.6+0x2a10d) (BuildId: ddc3651a0e729783c4bc0a1ea2d04fa976c0b246)

andrewrk avatar Oct 23 '24 19:10 andrewrk

Commit message said "I don't think this fixes [the issue]"

andrewrk avatar Oct 23 '24 23:10 andrewrk

I think I won the data race jackpot.
test-behavior
└─ run test behavior-x86_64-linux.6.12.7...6.12.7-gnu.2.39-znver4-ReleaseFast-libc
   └─ zig test ReleaseFast native failure
error: thread 248303 panic: reached unreachable code
Unwind error at address `:0x255f4c5` (error.AddressOutOfRange), trace may be incomplete

lib/std/debug.zig:518:14: 0x17fcebc in assert (std.zig)
    if (!ok) unreachable; // assertion failure
             ^
lib/std/Progress.zig:310:15: 0x1b1d6b2 in init (std.zig)
        assert(parent_ptr.* == .unused);
              ^
lib/std/Progress.zig:194:28: 0x19af787 in start (std.zig)
                return init(free_index, parent, name, estimated_total_items);
                           ^
src/Compilation.zig:4275:44: 0x2122f09 in workerAstGenFile (main.zig)
    const child_prog_node = prog_node.start(file.sub_file_path, 0);
                                           ^
lib/std/Thread/Pool.zig:182:50: 0x2b08a4b in runFn (std.zig)
            @call(.auto, func, .{id.?} ++ closure.arguments);
                                                 ^
lib/std/Thread/Pool.zig:295:32: 0x20b393d in worker (std.zig)
            run_node.data.runFn(&run_node.data, id);
                               ^
lib/std/Thread.zig:488:13: 0x1d1e274 in callFn__anon_174524 (std.zig)
            @call(.auto, f, args);
            ^
lib/std/Thread.zig:757:30: 0x1ad4345 in entryFn (std.zig)
                return callFn(f, args_ptr.*);
                             ^
???:?:?: 0x7f297806e508 in ??? (libc.so.6)
thread 248305 panic: reached unreachable code
Unwind error at address `:0x255f4c5` (error.AddressOutOfRange), trace may be incomplete

lib/std/debug.zig:518:14: 0x17fcebc in assert (std.zig)
    if (!ok) unreachable; // assertion failure
             ^
lib/std/Progress.zig:310:15: 0x1b1d6b2 in init (std.zig)
        assert(parent_ptr.* == .unused);
              ^
lib/std/Progress.zig:194:28: 0x19af787 in start (std.zig)
                return init(free_index, parent, name, estimated_total_items);
                           ^
src/Compilation.zig:4275:44: 0x2122f09 in workerAstGenFile (main.zig)
    const child_prog_node = prog_node.start(file.sub_file_path, 0);
                                           ^
lib/std/Thread/Pool.zig:182:50: 0x2b08a4b in runFn (std.zig)
            @call(.auto, func, .{id.?} ++ closure.arguments);
                                                 ^
lib/std/Thread/Pool.zig:295:32: 0x20b393d in worker (std.zig)
            run_node.data.runFn(&run_node.data, id);
                               ^
lib/std/Thread.zig:488:13: 0x1d1e274 in callFn__anon_174524 (std.zig)
            @call(.auto, f, args);
            ^
lib/std/Thread.zig:757:30: 0x1ad4345 in entryFn (std.zig)
                return callFn(f, args_ptr.*);
                             ^
???:?:?: 0x7f297806e508 in ??? (libc.so.6)
thread 248296 panic: reached unreachable code
Unwind error at address `:0x255f4c5` (error.AddressOutOfRange), trace may be incomplete

lib/std/debug.zig:518:14: 0x17fcebc in assert (std.zig)
    if (!ok) unreachable; // assertion failure
             ^
lib/std/Progress.zig:310:15: 0x1b1d6b2 in init (std.zig)
        assert(parent_ptr.* == .unused);
              ^
lib/std/Progress.zig:194:28: 0x19af787 in start (std.zig)
                return init(free_index, parent, name, estimated_total_items);
                           ^
src/Compilation.zig:4275:44: 0x2122f09 in workerAstGenFile (main.zig)
    const child_prog_node = prog_node.start(file.sub_file_path, 0);
                                           ^
lib/std/Thread/Pool.zig:182:50: 0x2b08a4b in runFn (std.zig)
            @call(.auto, func, .{id.?} ++ closure.arguments);
                                                 ^
lib/std/Thread/Pool.zig:295:32: 0x20b393d in worker (std.zig)
            run_node.data.runFn(&run_node.data, id);
                               ^
lib/std/Thread.zig:488:13: 0x1d1e274 in callFn__anon_174524 (std.zig)
            @call(.auto, f, args);
            ^
lib/std/Thread.zig:757:30: 0x1ad4345 in entryFn (std.zig)
                return callFn(f, args_ptr.*);
                             ^
???:?:?: 0x7f297806e508 in ??? (libc.so.6)
thread 248304 panic: reached unreachable code
Unwind error at address `:0x255f4c5` (error.AddressOutOfRange), trace may be incomplete

lib/std/debug.zig:518:14: 0x17fcebc in assert (std.zig)
    if (!ok) unreachable; // assertion failure
             ^
lib/std/Progress.zig:310:15: 0x1b1d6b2 in init (std.zig)
        assert(parent_ptr.* == .unused);
              ^
lib/std/Progress.zig:194:28: 0x19af787 in start (std.zig)
                return init(free_index, parent, name, estimated_total_items);
                           ^
src/Compilation.zig:4275:44: 0x2122f09 in workerAstGenFile (main.zig)
    const child_prog_node = prog_node.start(file.sub_file_path, 0);
                                           ^
lib/std/Thread/Pool.zig:182:50: 0x2b08a4b in runFn (std.zig)
            @call(.auto, func, .{id.?} ++ closure.arguments);
                                                 ^
lib/std/Thread/Pool.zig:295:32: 0x20b393d in worker (std.zig)
            run_node.data.runFn(&run_node.data, id);
                               ^
lib/std/Thread.zig:488:13: 0x1d1e274 in callFn__anon_174524 (std.zig)
            @call(.auto, f, args);
            ^
lib/std/Thread.zig:757:30: 0x1ad4345 in entryFn (std.zig)
                return callFn(f, args_ptr.*);
                             ^
???:?:?: 0x7f297806e508 in ??? (libc.so.6)
thread 248298 panic: reached unreachable code
Unwind error at address `:0x255f4c5` (error.AddressOutOfRange), trace may be incomplete

lib/std/debug.zig:518:14: 0x17fcebc in assert (std.zig)
    if (!ok) unreachable; // assertion failure
             ^
lib/std/Progress.zig:310:15: 0x1b1d6b2 in init (std.zig)
        assert(parent_ptr.* == .unused);
              ^
lib/std/Progress.zig:194:28: 0x19af787 in start (std.zig)
                return init(free_index, parent, name, estimated_total_items);
                           ^
src/Compilation.zig:4275:44: 0x2122f09 in workerAstGenFile (main.zig)
    const child_prog_node = prog_node.start(file.sub_file_path, 0);
                                           ^
lib/std/Thread/Pool.zig:182:50: 0x2b08a4b in runFn (std.zig)
            @call(.auto, func, .{id.?} ++ closure.arguments);
                                                 ^
lib/std/Thread/Pool.zig:295:32: 0x20b393d in worker (std.zig)
            run_node.data.runFn(&run_node.data, id);
                               ^
lib/std/Thread.zig:488:13: 0x1d1e274 in callFn__anon_174524 (std.zig)
            @call(.auto, f, args);
            ^
lib/std/Thread.zig:757:30: 0x1ad4345 in entryFn (std.zig)
                return callFn(f, args_ptr.*);
                             ^
???:?:?: 0x7f297806e508 in ??? (libc.so.6)
thread 248291 panic: reached unreachable code
Unwind error at address `:0x255f4c5` (error.AddressOutOfRange), trace may be incomplete

lib/std/debug.zig:518:14: 0x17fcebc in assert (std.zig)
    if (!ok) unreachable; // assertion failure
             ^
lib/std/Progress.zig:310:15: 0x1b1d6b2 in init (std.zig)
        assert(parent_ptr.* == .unused);
              ^
lib/std/Progress.zig:194:28: 0x19af787 in start (std.zig)
                return init(free_index, parent, name, estimated_total_items);
                           ^
src/Compilation.zig:4275:44: 0x2122f09 in workerAstGenFile (main.zig)
    const child_prog_node = prog_node.start(file.sub_file_path, 0);
                                           ^
lib/std/Thread/Pool.zig:182:50: 0x2b08a4b in runFn (std.zig)
            @call(.auto, func, .{id.?} ++ closure.arguments);
                                                 ^
lib/std/Thread/Pool.zig:295:32: 0x20b393d in worker (std.zig)
            run_node.data.runFn(&run_node.data, id);
                               ^
lib/std/Thread.zig:488:13: 0x1d1e274 in callFn__anon_174524 (std.zig)
            @call(.auto, f, args);
            ^
lib/std/Thread.zig:757:30: 0x1ad4345 in entryFn (std.zig)
                return callFn(f, args_ptr.*);
                             ^
???:?:?: 0x7f297806e508 in ??? (libc.so.6)
thread 248302 panic: reached unreachable code
Unwind error at address `:0x255f4c5` (error.AddressOutOfRange), trace may be incomplete

lib/std/debug.zig:518:14: 0x17fcebc in assert (std.zig)
    if (!ok) unreachable; // assertion failure
             ^
lib/std/Progress.zig:310:15: 0x1b1d6b2 in init (std.zig)
        assert(parent_ptr.* == .unused);
              ^
lib/std/Progress.zig:194:28: 0x19af787 in start (std.zig)
                return init(free_index, parent, name, estimated_total_items);
                           ^
src/Compilation.zig:4275:44: 0x2122f09 in workerAstGenFile (main.zig)
    const child_prog_node = prog_node.start(file.sub_file_path, 0);
                                           ^
lib/std/Thread/Pool.zig:182:50: 0x2b08a4b in runFn (std.zig)
            @call(.auto, func, .{id.?} ++ closure.arguments);
                                                 ^
lib/std/Thread/Pool.zig:295:32: 0x20b393d in worker (std.zig)
            run_node.data.runFn(&run_node.data, id);
                               ^
lib/std/Thread.zig:488:13: 0x1d1e274 in callFn__anon_174524 (std.zig)
            @call(.auto, f, args);
            ^
lib/std/Thread.zig:757:30: 0x1ad4345 in entryFn (std.zig)
                return callFn(f, args_ptr.*);
                             ^
???:?:?: 0x7f297806e508 in ??? (libc.so.6)
thread 248300 panic: reached unreachable code
Unwind error at address `:0x255f4c5` (error.AddressOutOfRange), trace may be incomplete

lib/std/debug.zig:518:14: 0x17fcebc in assert (std.zig)
    if (!ok) unreachable; // assertion failure
             ^
lib/std/Progress.zig:310:15: 0x1b1d6b2 in init (std.zig)
        assert(parent_ptr.* == .unused);
              ^
lib/std/Progress.zig:194:28: 0x19af787 in start (std.zig)
                return init(free_index, parent, name, estimated_total_items);
                           ^
src/Compilation.zig:4275:44: 0x2122f09 in workerAstGenFile (main.zig)
    const child_prog_node = prog_node.start(file.sub_file_path, 0);
                                           ^
lib/std/Thread/Pool.zig:182:50: 0x2b08a4b in runFn (std.zig)
            @call(.auto, func, .{id.?} ++ closure.arguments);
                                                 ^
lib/std/Thread/Pool.zig:295:32: 0x20b393d in worker (std.zig)
            run_node.data.runFn(&run_node.data, id);
                               ^
lib/std/Thread.zig:488:13: 0x1d1e274 in callFn__anon_174524 (std.zig)
            @call(.auto, f, args);
            ^
lib/std/Thread.zig:757:30: 0x1ad4345 in entryFn (std.zig)
                return callFn(f, args_ptr.*);
                             ^
???:?:?: 0x7f297806e508 in ??? (libc.so.6)
thread 248308 panic: reached unreachable code
Unwind error at address `:0x255f4c5` (error.AddressOutOfRange), trace may be incomplete

lib/std/debug.zig:518:14: 0x17fcebc in assert (std.zig)
    if (!ok) unreachable; // assertion failure
             ^
lib/std/Progress.zig:310:15: 0x1b1d6b2 in init (std.zig)
        assert(parent_ptr.* == .unused);
              ^
lib/std/Progress.zig:194:28: 0x19af787 in start (std.zig)
                return init(free_index, parent, name, estimated_total_items);
                           ^
src/Compilation.zig:4275:44: 0x2122f09 in workerAstGenFile (main.zig)
    const child_prog_node = prog_node.start(file.sub_file_path, 0);
                                           ^
lib/std/Thread/Pool.zig:182:50: 0x2b08a4b in runFn (std.zig)
            @call(.auto, func, .{id.?} ++ closure.arguments);
                                                 ^
lib/std/Thread/Pool.zig:295:32: 0x20b393d in worker (std.zig)
            run_node.data.runFn(&run_node.data, id);
                               ^
lib/std/Thread.zig:488:13: 0x1d1e274 in callFn__anon_174524 (std.zig)
            @call(.auto, f, args);
            ^
lib/std/Thread.zig:757:30: 0x1ad4345 in entryFn (std.zig)
                return callFn(f, args_ptr.*);
                             ^
???:?:?: 0x7f297806e508 in ??? (libc.so.6)
thread 248297 panic: reached unreachable code
Unwind error at address `:0x255f4c5` (error.AddressOutOfRange), trace may be incomplete

lib/std/debug.zig:518:14: 0x17fcebc in assert (std.zig)
    if (!ok) unreachable; // assertion failure
             ^
lib/std/Progress.zig:310:15: 0x1b1d6b2 in init (std.zig)
        assert(parent_ptr.* == .unused);
              ^
lib/std/Progress.zig:194:28: 0x19af787 in start (std.zig)
                return init(free_index, parent, name, estimated_total_items);
                           ^
src/Compilation.zig:4275:44: 0x2122f09 in workerAstGenFile (main.zig)
    const child_prog_node = prog_node.start(file.sub_file_path, 0);
                                           ^
lib/std/Thread/Pool.zig:182:50: 0x2b08a4b in runFn (std.zig)
            @call(.auto, func, .{id.?} ++ closure.arguments);
                                                 ^
lib/std/Thread/Pool.zig:295:32: 0x20b393d in worker (std.zig)
            run_node.data.runFn(&run_node.data, id);
                               ^
lib/std/Thread.zig:488:13: 0x1d1e274 in callFn__anon_174524 (std.zig)
            @call(.auto, f, args);
            ^
lib/std/Thread.zig:757:30: 0x1ad4345 in entryFn (std.zig)
                return callFn(f, args_ptr.*);
                             ^
???:?:?: 0x7f297806e508 in ??? (libc.so.6)
thread 248312 panic: reached unreachable code
Unwind error at address `:0x255f4c5` (error.AddressOutOfRange), trace may be incomplete

lib/std/debug.zig:518:14: 0x17fcebc in assert (std.zig)
    if (!ok) unreachable; // assertion failure
             ^
lib/std/Progress.zig:310:15: 0x1b1d6b2 in init (std.zig)
        assert(parent_ptr.* == .unused);
              ^
lib/std/Progress.zig:194:28: 0x19af787 in start (std.zig)
                return init(free_index, parent, name, estimated_total_items);
                           ^
src/Compilation.zig:4275:44: 0x2122f09 in workerAstGenFile (main.zig)
    const child_prog_node = prog_node.start(file.sub_file_path, 0);
                                           ^
lib/std/Thread/Pool.zig:182:50: 0x2b08a4b in runFn (std.zig)
            @call(.auto, func, .{id.?} ++ closure.arguments);
                                                 ^
lib/std/Thread/Pool.zig:295:32: 0x20b393d in worker (std.zig)
            run_node.data.runFn(&run_node.data, id);
                               ^
lib/std/Thread.zig:488:13: 0x1d1e274 in callFn__anon_174524 (std.zig)
            @call(.auto, f, args);
            ^
lib/std/Thread.zig:757:30: 0x1ad4345 in entryFn (std.zig)
                return callFn(f, args_ptr.*);
                             ^
???:?:?: 0x7f297806e508 in ??? (libc.so.6)
thread 248287 panic: reached unreachable code
Unwind error at address `:0x255f4c5` (error.AddressOutOfRange), trace may be incomplete

lib/std/debug.zig:518:14: 0x17fcebc in assert (std.zig)
    if (!ok) unreachable; // assertion failure
             ^
lib/std/Progress.zig:310:15: 0x1b1d6b2 in init (std.zig)
        assert(parent_ptr.* == .unused);
              ^
lib/std/Progress.zig:194:28: 0x19af787 in start (std.zig)
                return init(free_index, parent, name, estimated_total_items);
                           ^
src/Compilation.zig:4275:44: 0x2122f09 in workerAstGenFile (main.zig)
    const child_prog_node = prog_node.start(file.sub_file_path, 0);
                                           ^
lib/std/Thread/Pool.zig:182:50: 0x2b08a4b in runFn (std.zig)
            @call(.auto, func, .{id.?} ++ closure.arguments);
                                                 ^
lib/std/Thread/Pool.zig:295:32: 0x20b393d in worker (std.zig)
            run_node.data.runFn(&run_node.data, id);
                               ^
lib/std/Thread.zig:488:13: 0x1d1e274 in callFn__anon_174524 (std.zig)
            @call(.auto, f, args);
            ^
lib/std/Thread.zig:757:30: 0x1ad4345 in entryFn (std.zig)
                return callFn(f, args_ptr.*);
                             ^
???:?:?: 0x7f297806e508 in ??? (libc.so.6)
thread 248310 panic: reached unreachable code
Unwind error at address `:0x255f4c5` (error.AddressOutOfRange), trace may be incomplete

lib/std/debug.zig:518:14: 0x17fcebc in assert (std.zig)
    if (!ok) unreachable; // assertion failure
             ^
lib/std/Progress.zig:310:15: 0x1b1d6b2 in init (std.zig)
        assert(parent_ptr.* == .unused);
              ^
lib/std/Progress.zig:194:28: 0x19af787 in start (std.zig)
                return init(free_index, parent, name, estimated_total_items);
                           ^
src/Compilation.zig:4275:44: 0x2122f09 in workerAstGenFile (main.zig)
    const child_prog_node = prog_node.start(file.sub_file_path, 0);
                                           ^
lib/std/Thread/Pool.zig:182:50: 0x2b08a4b in runFn (std.zig)
            @call(.auto, func, .{id.?} ++ closure.arguments);
                                                 ^
lib/std/Thread/Pool.zig:295:32: 0x20b393d in worker (std.zig)
            run_node.data.runFn(&run_node.data, id);
                               ^
lib/std/Thread.zig:488:13: 0x1d1e274 in callFn__anon_174524 (std.zig)
            @call(.auto, f, args);
            ^
lib/std/Thread.zig:757:30: 0x1ad4345 in entryFn (std.zig)
                return callFn(f, args_ptr.*);
                             ^
???:?:?: 0x7f297806e508 in ??? (libc.so.6)
thread 248283 panic: reached unreachable code
Unwind error at address `:0x255f4c5` (error.AddressOutOfRange), trace may be incomplete

lib/std/debug.zig:518:14: 0x17fcebc in assert (std.zig)
    if (!ok) unreachable; // assertion failure
             ^
lib/std/Progress.zig:310:15: 0x1b1d6b2 in init (std.zig)
        assert(parent_ptr.* == .unused);
              ^
lib/std/Progress.zig:194:28: 0x19af787 in start (std.zig)
                return init(free_index, parent, name, estimated_total_items);
                           ^
src/Compilation.zig:4275:44: 0x2122f09 in workerAstGenFile (main.zig)
    const child_prog_node = prog_node.start(file.sub_file_path, 0);
                                           ^
lib/std/Thread/Pool.zig:182:50: 0x2b08a4b in runFn (std.zig)
            @call(.auto, func, .{id.?} ++ closure.arguments);
                                                 ^
lib/std/Thread/Pool.zig:295:32: 0x20b393d in worker (std.zig)
            run_node.data.runFn(&run_node.data, id);
                               ^
lib/std/Thread.zig:488:13: 0x1d1e274 in callFn__anon_174524 (std.zig)
            @call(.auto, f, args);
            ^
lib/std/Thread.zig:757:30: 0x1ad4345 in entryFn (std.zig)
                return callFn(f, args_ptr.*);
                             ^
???:?:?: 0x7f297806e508 in ??? (libc.so.6)
thread 248307 panic: reached unreachable code
Unwind error at address `:0x255f4c5` (error.AddressOutOfRange), trace may be incomplete

lib/std/debug.zig:518:14: 0x17fcebc in assert (std.zig)
    if (!ok) unreachable; // assertion failure
             ^
lib/std/Progress.zig:310:15: 0x1b1d6b2 in init (std.zig)
        assert(parent_ptr.* == .unused);
              ^
lib/std/Progress.zig:194:28: 0x19af787 in start (std.zig)
                return init(free_index, parent, name, estimated_total_items);
                           ^
src/Compilation.zig:4275:44: 0x2122f09 in workerAstGenFile (main.zig)
    const child_prog_node = prog_node.start(file.sub_file_path, 0);
                                           ^
lib/std/Thread/Pool.zig:182:50: 0x2b08a4b in runFn (std.zig)
            @call(.auto, func, .{id.?} ++ closure.arguments);
                                                 ^
lib/std/Thread/Pool.zig:295:32: 0x20b393d in worker (std.zig)
            run_node.data.runFn(&run_node.data, id);
                               ^
lib/std/Thread.zig:488:13: 0x1d1e274 in callFn__anon_174524 (std.zig)
            @call(.auto, f, args);
            ^
lib/std/Thread.zig:757:30: 0x1ad4345 in entryFn (std.zig)
                return callFn(f, args_ptr.*);
                             ^
???:?:?: 0x7f297806e508 in ??? (libc.so.6)
thread 248306 panic: reached unreachable code
Unwind error at address `:0x255f4c5` (error.AddressOutOfRange), trace may be incomplete

lib/std/debug.zig:518:14: 0x17fcebc in assert (std.zig)
    if (!ok) unreachable; // assertion failure
             ^
lib/std/Progress.zig:310:15: 0x1b1d6b2 in init (std.zig)
        assert(parent_ptr.* == .unused);
              ^
lib/std/Progress.zig:194:28: 0x19af787 in start (std.zig)
                return init(free_index, parent, name, estimated_total_items);
                           ^
src/Compilation.zig:4275:44: 0x2122f09 in workerAstGenFile (main.zig)
    const child_prog_node = prog_node.start(file.sub_file_path, 0);
                                           ^
lib/std/Thread/Pool.zig:182:50: 0x2b08a4b in runFn (std.zig)
            @call(.auto, func, .{id.?} ++ closure.arguments);
                                                 ^
lib/std/Thread/Pool.zig:295:32: 0x20b393d in worker (std.zig)
            run_node.data.runFn(&run_node.data, id);
                               ^
lib/std/Thread.zig:488:13: 0x1d1e274 in callFn__anon_174524 (std.zig)
            @call(.auto, f, args);
            ^
lib/std/Thread.zig:757:30: 0x1ad4345 in entryFn (std.zig)
                return callFn(f, args_ptr.*);
                             ^
???:?:?: 0x7f297806e508 in ??? (libc.so.6)

error: the following command terminated unexpectedly:
self/bin/zig test -OReleaseFast -Mroot=test/behavior.zig -lc --cache-dir .zig-cache --global-cache-dir ~/.cache/zig --name test --zig-lib-dir lib --listen=-

jacobly0 avatar Jan 26 '25 14:01 jacobly0

Got this stack trace from our CI at work:

WARNING: ThreadSanitizer: data race (pid=1385)
  Write of size 4 at 0x55e0a0949958 by main thread:
    #0 Progress.Node.init /home/circleci/project/workspace/zig/lib/std/Progress.zig:288:56 (test+0x305ab1d)
    #1 Progress.Node.start /home/circleci/project/workspace/zig/lib/std/Progress.zig:191:28 (test+0x2dc1fdc)
    #2 test_runner.mainTerminal /home/circleci/project/workspace/zig/lib/compiler/test_runner.zig:153:42 (test+0x2ba45b1)
    #3 test_runner.main /home/circleci/project/workspace/zig/lib/compiler/test_runner.zig:37:28 (test+0x27e7cd4)
    #4 start.callMain /home/circleci/project/workspace/zig/lib/std/start.zig:514:22 (test+0x27e76fb)
    #5 start.callMainWithArgs /home/circleci/project/workspace/zig/lib/std/start.zig:482:20 (test+0x27e76fb)
    #6 main /home/circleci/project/workspace/zig/lib/std/start.zig:497:75 (test+0x27e76fb)

  Previous atomic read of size 4 at 0x55e0a0949958 by thread T1:
    #0 Progress.serialize /home/circleci/project/workspace/zig/lib/std/Progress.zig:763:79 (test+0x3707b18)
    #1 Progress.computeRedraw /home/circleci/project/workspace/zig/lib/std/Progress.zig:1039:33 (test+0x3708a7f)
    #2 Progress.updateThreadRun /home/circleci/project/workspace/zig/lib/std/Progress.zig:463:40 (test+0x357e22f)
    #3 Thread.callFn__anon_70269 /home/circleci/project/workspace/zig/lib/std/Thread.zig:408:13 (test+0x33b78f0)
    #4 Thread.PosixThreadImpl.spawn__anon_67929.Instance.entryFn /home/circleci/project/workspace/zig/lib/std/Thread.zig:674:30 (test+0x3230ab5)

  Location is global 'Progress.node_storage_buffer' of size 3984 at 0x55e0a0949924 (test+0x5a5d958)

  Thread T1 (tid=1705, running) created by main thread at:
    #0 pthread_create /home/circleci/project/workspace/zig/lib/tsan/tsan_interceptors_posix.cpp:1020:3 (test+0x586c62f)
    #1 Thread.PosixThreadImpl.spawn__anon_67929 /home/circleci/project/workspace/zig/lib/std/Thread.zig:692:33 (test+0x32307b3)
    #2 Thread.spawn__anon_64641 /home/circleci/project/workspace/zig/lib/std/Thread.zig:341:32 (test+0x305b171)
    #3 Progress.start /home/circleci/project/workspace/zig/lib/std/Progress.zig:405:55 (test+0x2dc1c33)
    #4 test_runner.mainTerminal /home/circleci/project/workspace/zig/lib/compiler/test_runner.zig:132:41 (test+0x2ba4389)
    #5 test_runner.main /home/circleci/project/workspace/zig/lib/compiler/test_runner.zig:37:28 (test+0x27e7cd4)
    #6 start.callMain /home/circleci/project/workspace/zig/lib/std/start.zig:514:22 (test+0x27e76fb)
    #7 start.callMainWithArgs /home/circleci/project/workspace/zig/lib/std/start.zig:482:20 (test+0x27e76fb)
    #8 main /home/circleci/project/workspace/zig/lib/std/start.zig:497:75 (test+0x27e76fb)

SUMMARY: ThreadSanitizer: data race /home/circleci/project/workspace/zig/lib/std/Progress.zig:288:56 in Progress.Node.init

It's triggering 100% consistently on the CircleCI machines.

Rexicon226 avatar Feb 04 '25 13:02 Rexicon226

It's trivial to reproduce the issue by rapidly starting and ending nodes in parallel.

This example code crashes immediately on my desktop.
const std = @import("std");
const Progress = std.Progress;
const Thread = std.Thread;

// Three threads seems to be the minimum necessary to invoke the bug.
const num_threads = 3;
const steps_per_node = 3;

pub fn main() !void {
    var root_progress = Progress.start(.{ .root_name = "root" });
    var threads: [num_threads]Thread = undefined;
    for (0..num_threads) |i| {
        threads[i] = try Thread.spawn(.{}, spam, .{ &root_progress, i });
    }
    threads[0].join();
}

fn spam(root_progress: *Progress.Node, threadnum: usize) void {
    var name: [2]u8 = undefined;
    const name_len = std.fmt.formatIntBuf(name[0..name.len], threadnum, 10, .lower, .{ .width = 2 });
    while (true) {
        const node = root_progress.start(name[0..name_len], steps_per_node);
        for (0..steps_per_node) |_| {
            node.completeOne();
        }
        node.end();
    }
}

Re the assert(parent_ptr.* == .unused) failure: ThreadSanitizer no longer complains once the read is behind an @atomicLoad, but the assertion still fails. I believe the free list isn't being maintained correctly, but I haven't confirmed that or figured out why yet. I'll keep digging (need to sort out a better debugger).

achan1989 avatar Mar 12 '25 20:03 achan1989

I found and fixed a bug in the maintenance of node_end_index which can cause serialize() to crash, but I doubt it's reachable under normal conditions. It doesn't seem to be related to the main issue.

As a first test, the issue isn't reproducible with my test code if I surround the freelist manipulation code in Node.start() and Node.end() with a recursive mutex, even if I bump num_threads up to 50.

But I can't work out where the logic error or race condition is just by staring at the code, so I've removed the mutex and have been trying to use the rr debugger for its time-travelling magic. Unfortunately, rr forces execution onto a single core, which prevents my example code from reproducing the issue without some questionable workarounds. To cut a long story short, I've seen evidence that two threads can call Node.start() concurrently and end up with the same free_index, though I'm not 100% certain, since rr seems to be unreliable in this configuration.

So I still can't work out why. I'd like to log the decisions leading to the choice of every free_index -- I'll experiment tomorrow.
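
For illustration, the mutex experiment described above might look roughly like the sketch below. It is not the std.Progress code: `LockedFreeList`, `owners`, and `worker` are hypothetical names, the list is a plain index-linked stack, and it uses an ordinary `std.Thread.Mutex` rather than a recursive one. The `owners` counter stands in for the `assert(parent_ptr.* == .unused)` check by asserting that no slot is ever handed to two threads at once.

```zig
const std = @import("std");

// Hypothetical stand-in for the node free list; not the std.Progress code.
const capacity = 4;
const none: u32 = std.math.maxInt(u32);

const LockedFreeList = struct {
    mutex: std.Thread.Mutex = .{},
    first: u32 = none,
    next: [capacity]u32 = [_]u32{none} ** capacity,

    /// Return a slot to the list.
    fn release(self: *LockedFreeList, index: u32) void {
        self.mutex.lock();
        defer self.mutex.unlock();
        self.next[index] = self.first;
        self.first = index;
    }

    /// Take a slot, or null if none is free.
    fn acquire(self: *LockedFreeList) ?u32 {
        self.mutex.lock();
        defer self.mutex.unlock();
        if (self.first == none) return null;
        const index = self.first;
        self.first = self.next[index];
        return index;
    }
};

var list: LockedFreeList = .{};
// Number of threads currently holding each slot; must never exceed 1.
var owners = [_]u32{0} ** capacity;

fn worker() void {
    for (0..10_000) |_| {
        const index = list.acquire() orelse continue;
        // Plays the role of assert(parent_ptr.* == .unused).
        const previous = @atomicRmw(u32, &owners[index], .Add, 1, .seq_cst);
        std.debug.assert(previous == 0);
        _ = @atomicRmw(u32, &owners[index], .Sub, 1, .seq_cst);
        list.release(index);
    }
}

pub fn main() !void {
    for (0..capacity) |i| list.release(@intCast(i));
    var threads: [3]std.Thread = undefined;
    for (&threads) |*t| t.* = try std.Thread.spawn(.{}, worker, .{});
    for (threads) |t| t.join();
}
```

Because the whole take-or-return step runs under one lock, no thread can act on a stale view of the list head. That fits the observation that the crash disappears with the mutex in place, but it does not say which interleaving breaks the lock-free version.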

achan1989 avatar Mar 15 '25 21:03 achan1989

I've banged my head on this too much and have to drop it -- I clearly don't understand memory ordering etc. well enough.

If anyone is interested in the lesser bugs I did manage to find and fix, see https://github.com/achan1989/zig/tree/progress_clean_slate

achan1989 avatar Mar 21 '25 19:03 achan1989

It just occurred to me: I think we have an ABA problem in the freelist. I'm going to try converting it to a generational index.
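
To make the idea concrete, here is a minimal sketch of a generation-tagged free list head, assuming a 64-bit target where a `u64` compare-and-swap is available. It is not the actual std.Progress implementation or the eventual fix: `FreeList`, `pack`, `indexOf`, and `generationOf` are hypothetical names. The head word holds a slot index in the low 32 bits and a counter in the high 32 bits, and the counter is bumped on every successful push or pop.

```zig
const std = @import("std");

// Hypothetical, simplified free list; not the std.Progress code.
const capacity = 8;
const none: u32 = std.math.maxInt(u32);

/// Pack a slot index (low 32 bits) and a generation (high 32 bits).
fn pack(index: u32, generation: u32) u64 {
    return (@as(u64, generation) << 32) | index;
}
fn indexOf(word: u64) u32 {
    return @truncate(word);
}
fn generationOf(word: u64) u32 {
    return @truncate(word >> 32);
}

const FreeList = struct {
    head: u64 = pack(none, 0),
    next: [capacity]u32 = [_]u32{none} ** capacity,

    fn push(self: *FreeList, index: u32) void {
        var old = @atomicLoad(u64, &self.head, .seq_cst);
        while (true) {
            @atomicStore(u32, &self.next[index], indexOf(old), .seq_cst);
            const new = pack(index, generationOf(old) +% 1);
            // On failure cmpxchg returns the current head; retry with it.
            old = @cmpxchgWeak(u64, &self.head, old, new, .seq_cst, .seq_cst) orelse return;
        }
    }

    fn pop(self: *FreeList) ?u32 {
        var old = @atomicLoad(u64, &self.head, .seq_cst);
        while (indexOf(old) != none) {
            const index = indexOf(old);
            const next_index = @atomicLoad(u32, &self.next[index], .seq_cst);
            const new = pack(next_index, generationOf(old) +% 1);
            old = @cmpxchgWeak(u64, &self.head, old, new, .seq_cst, .seq_cst) orelse return index;
        }
        return null;
    }
};

pub fn main() void {
    var list: FreeList = .{};
    list.push(0);
    list.push(1); // list is now 1 -> 0

    // "Thread A" reads the head (and would read next[1] == 0), then stalls.
    const stale = @atomicLoad(u64, &list.head, .seq_cst);

    // "Thread B" pops 1, pops 0, and pushes 1 back. Slot 0 is now in use.
    _ = list.pop();
    _ = list.pop();
    list.push(1);

    const current = @atomicLoad(u64, &list.head, .seq_cst);
    std.debug.assert(indexOf(stale) == indexOf(current)); // same head index
    std.debug.assert(stale != current); // but a different generation

    // Thread A resumes: its swap to "head = 0" is rejected because the
    // generation moved on. An index-only compare would have succeeded and
    // put the in-use slot 0 back at the head of the list.
    const stale_new = pack(0, generationOf(stale) +% 1);
    const result = @cmpxchgStrong(u64, &list.head, stale, stale_new, .seq_cst, .seq_cst);
    std.debug.assert(result != null);
}
```

The single-threaded `main` only replays one bad interleaving by hand. A 32-bit generation can still wrap around in principle, so this narrows the ABA window rather than proving it closed.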

achan1989 avatar Apr 26 '25 07:04 achan1989