Linux futex (v1 and v2) API fixes, tests and Ziggification

Open rootbeer opened this issue 7 months ago • 1 comments

linux: futex v1 API cleanup

  • Use Ziggish packed struct for flags arguments. Old: linux.FUTEX.WAIT vs new: .{ .cmd = .WAIT, .private = false }.

  • Rename futex_wait and futex_wake, which didn't actually specify wait/wake, to futex_3arg and futex_4arg (it's the number of parameters that differs; the actual op is the second parameter).

  • Provide the full six-arg flavor of the syscall (for some of the advanced ops), and add packed structs for the flag-ish parameters.

  • Use a packed union for the 4th parameter, which is sometimes a timespec pointer and sometimes a 32-bit value.

  • Add tests that make sure the structure layout is correct and that the basic argument passing is working (no actual futexes are contended).
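To make the flag encoding in the first bullet concrete, here is a minimal sketch of how a packed struct can mirror the futex v1 op word. The names `FutexCmd` and `FutexOp` are hypothetical and not necessarily what the PR declares; the bit positions follow the Linux UAPI values FUTEX_PRIVATE_FLAG (128) and FUTEX_CLOCK_REALTIME (256).

```zig
const std = @import("std");

// Hypothetical names; the PR's real declarations may differ.
// The command occupies the low 7 bits of the op word.
const FutexCmd = enum(u7) {
    WAIT = 0,
    WAKE = 1,
    REQUEUE = 3,
    _,
};

// Bit 7 is FUTEX_PRIVATE_FLAG (128), bit 8 is FUTEX_CLOCK_REALTIME (256).
const FutexOp = packed struct(u32) {
    cmd: FutexCmd,
    private: bool,
    realtime: bool = false,
    _reserved: u23 = 0,
};

test "FutexOp matches the integer encoding" {
    const wait_shared: u32 = @bitCast(FutexOp{ .cmd = .WAIT, .private = false });
    const wake_private: u32 = @bitCast(FutexOp{ .cmd = .WAKE, .private = true });
    try std.testing.expectEqual(@as(u32, 0), wait_shared);
    try std.testing.expectEqual(@as(u32, 1 | 128), wake_private);
}
```

A layout test of this kind is what the last bullet refers to: it checks the encoding without ever contending a futex.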

linux: futex v2 API cleanup

  • futex2_waitv always takes a 64-bit timespec. Perhaps the kernel_timespec should be renamed timespec64 now? It's used in io_uring, too.

  • Add Ziggish packed struct encoding for futex v2 flag parameters.

  • Add very basic "tests" for the futex v2 syscalls (these found the 64-bit timespec bug).

  • Update the stale or broken comments. (I could also just delete these; they're not really documenting Zig-specific behavior.)
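As an illustration of the 64-bit timespec point above, the following sketch shows a layout with two signed 64-bit fields whose size does not depend on the target's word size. `Timespec64` and `timespec64FromNanos` are hypothetical names used only for this example, not the declarations in Zig's standard library.

```zig
const std = @import("std");

// Hypothetical stand-in for the 64-bit timespec described above:
// both fields are 64 bits wide on every target.
const Timespec64 = extern struct {
    sec: i64,
    nsec: i64,
};

// Hypothetical helper converting a nanosecond count into that layout.
fn timespec64FromNanos(ns: u64) Timespec64 {
    return .{
        .sec = @intCast(ns / std.time.ns_per_s),
        .nsec = @intCast(ns % std.time.ns_per_s),
    };
}

test "Timespec64 is 16 bytes and splits nanoseconds" {
    try std.testing.expectEqual(@as(usize, 16), @sizeOf(Timespec64));
    const ts = timespec64FromNanos(1_500_000_000);
    try std.testing.expectEqual(@as(i64, 1), ts.sec);
    try std.testing.expectEqual(@as(i64, 500_000_000), ts.nsec);
}
```

Passing a narrower, word-sized timespec on a 32-bit target is the sort of mismatch a size assertion like this would flag.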

Given that the futex2 APIs are not used by Zig's library (they're a bit too new), and the fact that these are very specialized syscalls, and they currently provide no strong benefits over the existing v1 API, it might be prudent to just delete them entirely. If you're fancy enough to build stuff on the futex API, you're more than capable of writing your own syscall wrappers ...

rootbeer avatar Apr 04 '25 05:04 rootbeer

Hmm, not clear what's going on with aarch64-linux-release here. Maybe OOM killer victim?

alexrp avatar Apr 05 '25 18:04 alexrp

This change seems to be causing tests to fail in zig v0.15.1 for me?

test
+- test-modules
   +- test-std
      +- run test std-native-znver3-ReleaseSmall-libc 2915/2944 passed, 3 failed, 26 skipped
error: 'os.linux.test.test.futex2_wait' failed: expected .AGAIN, found .PERM
error: 'os.linux.test.test.futex2_wake' failed: expected 0, found 18446744073709551615
error: 'os.linux.test.test.futex2_requeue' failed: expected 0, found 18446744073709551615
error: while executing test 'zig.system.darwin.macos.test.detect', the following test command failed:
./.zig-cache/o/78908d7a3430ac8c56ce5b8b8b9ab4d6/test --cache-dir=./.zig-cache --seed=0x981d8403 --listen=-
test
+- test-modules
   +- test-std
      +- run test std-native-znver3-ReleaseSmall-single 2885/2942 passed, 3 failed, 54 skipped
error: 'os.linux.test.test.futex2_wait' failed: expected .AGAIN, found .PERM
error: 'os.linux.test.test.futex2_wake' failed: expected 0, found 18446744073709551615
error: 'os.linux.test.test.futex2_requeue' failed: expected 0, found 18446744073709551615
error: while executing test 'zig.system.darwin.macos.test.detect', the following test command failed:
./.zig-cache/o/22f9d114fc81d6681d809ceacf4d4868/test --cache-dir=./.zig-cache --seed=0x981d8403 --listen=-
test
+- test-modules
   +- test-std
      +- run test std-native-znver3-ReleaseSmall 2916/2944 passed, 3 failed, 25 skipped
error: 'os.linux.test.test.futex2_wait' failed: expected .AGAIN, found .PERM
error: 'os.linux.test.test.futex2_wake' failed: expected 0, found 18446744073709551615
error: 'os.linux.test.test.futex2_requeue' failed: expected 0, found 18446744073709551615
error: while executing test 'zig.system.darwin.macos.test.detect', the following test command failed:
./.zig-cache/o/8fa39f020ebf81a5b51a8cfb5ce00fcf/test --cache-dir=./.zig-cache --seed=0x981d8403 --listen=-
test
+- test-modules
   +- test-std
      +- run test std-native-znver3-Debug-libc 2916/2944 passed, 3 failed, 25 skipped
error: 'os.linux.test.test.futex2_wait' failed: expected .AGAIN, found .PERM
/build/zig/src/zig-0.15.1/lib/std/testing.zig:110:17: 0x396ed1a in expectEqualInner__anon_1287536 (std.zig)
                return error.TestExpectedEqual;
                ^
/build/zig/src/zig-0.15.1/lib/std/os/linux/test.zig:371:5: 0x397078e in test.futex2_wait (std.zig)
    try expectEqual(.AGAIN, linux.E.init(rc));
    ^
error: 'os.linux.test.test.futex2_wake' failed: expected 0, found 18446744073709551615
/build/zig/src/zig-0.15.1/lib/std/testing.zig:110:17: 0x1378c99 in expectEqualInner__anon_42895 (std.zig)
                return error.TestExpectedEqual;
                ^
/build/zig/src/zig-0.15.1/lib/std/os/linux/test.zig:402:5: 0x39725d0 in test.futex2_wake (std.zig)
    try expectEqual(0, rc);
    ^
error: 'os.linux.test.test.futex2_requeue' failed: expected 0, found 18446744073709551615
/build/zig/src/zig-0.15.1/lib/std/testing.zig:110:17: 0x1378c99 in expectEqualInner__anon_42895 (std.zig)
                return error.TestExpectedEqual;
                ^
/build/zig/src/zig-0.15.1/lib/std/os/linux/test.zig:427:5: 0x397292e in test.futex2_requeue (std.zig)
    try expectEqual(0, rc);
    ^
error: while executing test 'zig.system.darwin.macos.test.detect', the following test command failed:
./.zig-cache/o/3aed4d36a684aa3a18c4bf51abf424cc/test --cache-dir=./.zig-cache --seed=0x981d8403 --listen=-
test

daurnimator avatar Sep 12 '25 02:09 daurnimator

uname -r?

alexrp avatar Sep 12 '25 02:09 alexrp

uname -r?

6.16.0-arch2-1

daurnimator avatar Sep 12 '25 03:09 daurnimator

Huh, the EPERM failure (where EAGAIN is expected) is suspicious. The documentation says that EPERM should only happen with PI (priority-inheriting) futexes, and the test is not intentionally testing those. Are you running in an environment that might be restricting syscalls somehow?

On the other hand, the futex2 syscalls aren't supported on many systems, so they haven't been getting all that much testing coverage.

I believe the specific test line that is failing is:

rc = linux.futex2_wait(&lock.raw, 2, mask, flags, null, .MONOTONIC);

Given that the lock is initialized to 1, this futex operation should return immediately (as it's expecting the lock to be 2), so even if there are any flags set on the lock, I'd be a bit surprised if any of them are checked. So my suspicion is the EPERM is coming from some other layer.
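The reasoning above can be sketched as a tiny userspace model. It makes no syscall and only captures the value comparison that decides between sleeping and an immediate EAGAIN; `Outcome` and `modelWait` are hypothetical names for illustration.

```zig
const std = @import("std");

// Userspace model only: no syscall is made here.
const Outcome = enum { would_sleep, again };

// A futex wait sleeps only if the word still holds the expected value;
// otherwise it returns immediately with EAGAIN.
fn modelWait(current: u32, expected: u32) Outcome {
    return if (current == expected) .would_sleep else .again;
}

test "lock holds 1 while the wait expects 2" {
    try std.testing.expectEqual(Outcome.again, modelWait(1, 2));
}
```

Under this model the test in question can only observe EAGAIN, which is why an EPERM points at something outside the futex logic itself.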

I'm running kernel v6.16.4, so I don't think it's a recent kernel futex2 change of any sort.

Probably unrelated, but why is the Zig test failure message calling out "zig.system.darwin.macos.test.detect"?

Oh, and I should learn to recognize "18446744073709551615". That is "0xffffffffffffffff". Which is -1, which is errno 1, which is ... EPERM.
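That decoding can be written out as a short sketch. `errnoFromRaw` is a hypothetical helper that follows the Linux raw-syscall convention, in which return values from -4095 to -1 carry a negated errno; it uses fixed 64-bit types so the arithmetic matches the number in the log.

```zig
const std = @import("std");

// Hypothetical helper: a raw return value in [-4095, -1] is a negated
// errno; anything else is a successful result, reported here as 0.
fn errnoFromRaw(rc: u64) u16 {
    const signed: i64 = @bitCast(rc);
    if (signed < 0 and signed >= -4095) return @intCast(-signed);
    return 0;
}

test "18446744073709551615 decodes to errno 1" {
    try std.testing.expectEqual(@as(u64, 0xffff_ffff_ffff_ffff), @as(u64, 18446744073709551615));
    try std.testing.expectEqual(@as(u16, 1), errnoFromRaw(18446744073709551615));
    try std.testing.expectEqual(@as(u16, 0), errnoFromRaw(0));
}
```

So the futex2_wake and futex2_requeue failures report the same errno as the futex2_wait failure; the tests just compare the raw value rather than the decoded one.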

I'm curious to know what's going on here, but one option is to just remove all the futex2 support and tests from Zig. As I noted in the PR, anyone sophisticated enough to build software on the futex2 API is more than capable of invoking a couple of syscalls directly ...

rootbeer avatar Sep 12 '25 04:09 rootbeer

I'm curious to know what's going on here, but one option is to just remove all the futex2 support and tests from Zig.

Worst case we can disable the tests, or make them skip under whatever conditions are going on here. Removing the support entirely seems excessive.
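One way the skip option could look, as a sketch: Zig tests report a skip by returning `error.SkipZigTest`. The `Errno` enum and `skipIfUnavailable` helper below are hypothetical, and the errno numbers assume the x86_64 numbering.

```zig
const std = @import("std");

// Hypothetical local errno subset using the x86_64 numbering (an
// assumption; some architectures number ENOSYS differently).
const Errno = enum(u16) {
    SUCCESS = 0,
    PERM = 1,
    AGAIN = 11,
    NOSYS = 38,
    _,
};

// Hypothetical helper: report a skip instead of a failure when the
// environment rejects the syscall outright.
fn skipIfUnavailable(e: Errno) error{SkipZigTest}!void {
    switch (e) {
        .PERM, .NOSYS => return error.SkipZigTest,
        else => {},
    }
}

test "blocked syscalls skip, ordinary results do not" {
    try std.testing.expectError(error.SkipZigTest, skipIfUnavailable(.PERM));
    try std.testing.expectError(error.SkipZigTest, skipIfUnavailable(.NOSYS));
    try skipIfUnavailable(.AGAIN);
}
```

The trade-off is that treating EPERM as a skip would also hide a genuine EPERM regression, so a narrower condition may be preferable once the cause is known.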

But in any case, we need to understand why this is happening; this is the first I've ever heard of it. My suspicion would be an overzealous seccomp filter or something along those lines.
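A quick way to test the seccomp hypothesis is to look at the `Seccomp:` field in /proc/self/status, which the kernel reports as 0 (disabled), 1 (strict) or 2 (filter). The sketch below only parses text that has already been read from that file; `seccompMode` is a hypothetical helper, and file reading is left out because the relevant std APIs differ between Zig versions.

```zig
const std = @import("std");

// Hypothetical helper: extract the "Seccomp:" field from the contents
// of /proc/self/status. Returns null if the field is missing or malformed.
fn seccompMode(status_text: []const u8) ?u8 {
    const prefix = "Seccomp:";
    var lines = std.mem.splitScalar(u8, status_text, '\n');
    while (lines.next()) |line| {
        if (!std.mem.startsWith(u8, line, prefix)) continue;
        const value = std.mem.trim(u8, line[prefix.len..], " \t");
        return std.fmt.parseInt(u8, value, 10) catch null;
    }
    return null;
}

test "seccompMode reads the Seccomp field" {
    const sample = "Name:\ttest\nSeccomp:\t2\nSeccomp_filters:\t1\n";
    try std.testing.expectEqual(@as(?u8, 2), seccompMode(sample));
    try std.testing.expectEqual(@as(?u8, null), seccompMode("Name:\ttest\n"));
}
```

A value of 2 only shows that some filter is installed; it does not by itself prove that the futex2 syscalls are the ones being rejected.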

alexrp avatar Sep 12 '25 04:09 alexrp