zig Various posix.[sg]etsockopt improvements

Some of these may require some debate or correction, so I split this up into several distinct commits and will mark as draft initially.

Aug 30 '25 14:08 blblack

CC @alexrp and @rootbeer - Thoughts on whether this is a reasonable approach? My aim here is Zig software that wants to use these kinds of system interfaces shouldn't have to directly rely on std.c and be forced to link libc on Linux or other targets that don't otherwise require libc.

There are other alternative pathways to consider here, perhaps.

We could make some minor breaking changes to the current set and get interfaces to unlock at least some of these edge cases. For example, the get call could return a slice of the optval buffer instead of void and assert a returned length less than or equal to the input optval len, for the case that the returned length is less than the buffer input length and the caller needs to know it. And the set call could accept an optional slice as input to handle a case like SO_ACCEPTFILTER where one needs to be able to set null.

Even if we did both of those things, Linux SO_PEERSEC would still be unusable. It needs the caller to be able to both dynamically adapt the buffer size upwards and receive the final data length which may be smaller than the provided buffer. To abstract that, at bare minimum you need a getsockopt that takes an allocator argument so that it could do the resizing loop internally, which is kinda ewww. I assume SO_PEERSEC is probably not the lone outlier like this in the universe of platforms, socket types, and levels (esp if we want the interface to be stable into the future, where more options may appear).

We could also try to do something way-fancier with comptime and/or runtime switching on the optname itself, for varying the internal behavior per-option for all the known edge cases, and return some complex struct or tuple that can represent all the odd cases, and even specialize the error return outcomes to the optname as well? I suspect if we tried to go down this path, this would (a) get very unwieldy over time with a ton of nested switches and a ton of @hasDecl() or target-condition lists for the SO_FOO that exist on various subsets of platforms, and (b) end up being an un-ergonomic interface for the majority cases where the option is just a simple boolean or integer setting.

Aug 31 '25 13:08 blblack

To codify one of the alternative paths - if we're willing to change/break the existing interfaces a bit, and lose a little bit of ergonomics in the get-case, it could look something like this:

diff --git lib/std/posix.zig lib/std/posix.zig
index 1bce41c2e9..127d2caaaa 100644
--- lib/std/posix.zig
+++ lib/std/posix.zig
@@ -4353,22 +4353,33 @@ pub const GetSockOptError = error{
 
     /// Insufficient resources are available in the system to complete the call.
     SystemResources,
+
+    /// The supplied opt buffer was too small for the data to be returned.
+    BufferTooSmall,
+
+    /// This can mean different things depending on the platform, the option
+    /// name, and conditions.  One known oddball example is SO_ACCEPTFILTER on
+    /// BSDs, which returns this error to indicate that no filter is currently
+    /// installed.
+    InvalidOption,
 } || UnexpectedError;
 
-pub fn getsockopt(fd: socket_t, level: i32, optname: u32, opt: []u8) GetSockOptError!void {
+pub fn getsockopt(fd: socket_t, level: i32, optname: u32, opt: []u8) GetSockOptError![]const u8 {
     var len: socklen_t = @intCast(opt.len);
     switch (errno(system.getsockopt(fd, level, optname, opt.ptr, &len))) {
         .SUCCESS => {
-            std.debug.assert(len == opt.len);
+            std.debug.assert(len <= opt.len);
+            return opt[0..len];
         },
         .BADF => unreachable,
         .NOTSOCK => unreachable,
-        .INVAL => unreachable,
         .FAULT => unreachable,
+        .INVAL => return error.InvalidOption,
         .NOPROTOOPT => return error.InvalidProtocolOption,
         .NOMEM => return error.SystemResources,
         .NOBUFS => return error.SystemResources,
         .ACCES => return error.AccessDenied,
+        .RANGE => return error.BufferTooSmall,
         else => |err| return unexpectedErrno(err),
     }
 }
@@ -6721,6 +6733,14 @@ pub const SetSockOptError = error{
     /// Setting the socket option requires more elevated permissions.
     PermissionDenied,
 
+    /// This can mean many different things depending on the platform, the
+    /// option name, and conditions.  For example:
+    /// - That the socket was not listen()ing when setting SO_ACCEPTFILTER on BSDs.
+    /// - That an non-multicast IP address was set for IP_ADD_MEMBERSHIP on Linux.
+    /// - That the socket is shut down and the option requires otherwise
+    /// - etc...
+    InvalidOption,
+
     OperationNotSupported,
     NetworkSubsystemFailed,
     FileDescriptorNotASocket,
@@ -6729,9 +6749,11 @@ pub const SetSockOptError = error{
 } || UnexpectedError;
 
 /// Set a socket's options.
-pub fn setsockopt(fd: socket_t, level: i32, optname: u32, opt: []const u8) SetSockOptError!void {
+pub fn setsockopt(fd: socket_t, level: i32, optname: u32, opt: ?[]const u8) SetSockOptError!void {
+    const opt_ptr: ?[*]const u8 = if (opt) |o| o.ptr else null;
+    const opt_len: usize = if (opt) |o| o.len else 0;
     if (native_os == .windows) {
-        const rc = windows.ws2_32.setsockopt(fd, level, @intCast(optname), opt.ptr, @intCast(opt.len));
+        const rc = windows.ws2_32.setsockopt(fd, level, @intCast(optname), @ptrCast(opt_ptr), @intCast(opt_len));
         if (rc == windows.ws2_32.SOCKET_ERROR) {
             switch (windows.ws2_32.WSAGetLastError()) {
                 .WSANOTINITIALISED => unreachable,
@@ -6744,12 +6766,12 @@ pub fn setsockopt(fd: socket_t, level: i32, optname: u32, opt: []const u8) SetSo
         }
         return;
     } else {
-        switch (errno(system.setsockopt(fd, level, optname, opt.ptr, @intCast(opt.len)))) {
+        switch (errno(system.setsockopt(fd, level, optname, @ptrCast(opt_ptr), @intCast(opt_len)))) {
             .SUCCESS => {},
             .BADF => unreachable, // always a race condition
             .NOTSOCK => unreachable, // always a race condition
-            .INVAL => unreachable,
             .FAULT => unreachable,
+            .INVAL => return error.InvalidOption,
             .DOM => return error.TimeoutTooBig,
             .ISCONN => return error.AlreadyConnected,
             .NOPROTOOPT => return error.InvalidProtocolOption,

The tradeoffs with this approach:

Slight negatives:
- posix.getsockopt() callers would need to update their simple-fixed-case calls to throw away the result, e.g. _ = try posix.getsockopt(sock, posix.SOL.FOO, posix.SO.BAR, std.mem.asBytes(&some_i32));
- For the common fixed-length cases, the assert on the returned length in getsockopt() is weakened to match the general case (<=). For example, in the point above, nothing would catch that getsockopt() claimed to only write 2/4 of the i32's bytes.
- (or, I guess, they could not throw away the result and assert on its length for themselves, but that looks pretty messy at the call site)
- setsockopt()'s change to an optional opt would lose some ability to catch cases where sending a null was not intended (e.g. because the caller forgot to unwrap an optional earlier on, and thus sent an optional when they didn't mean to).
Upsides:
- Correctly handles almost all the odd cases I've found so far with reasonable ergonomics
- Doesn't require new separate xRaw() interfaces.

Linux SO_PEERSEC is the only case (that I personally have found so far, anyways!) that doesn't quite work out optimally even with this variant, but at least it is usable for this case at all. The caller would have to possibly loop multiple times on error.BufferTooSmall while growing their buffer size until they got a success with a buffer that was >= the required len. With a fully-raw interface, they would call at most twice, as they'd know an exact size that will definitely-work after the first failure. If that turns out to be the only such case in practice, then anyone that cares enough about this one suboptimal pattern could always drop to std.os.linux.getsockopt() to handle it optimally.

This could also be blended with @rootbeer 's Fixed/Simple suffix as well in the get case, just to clean up some of the negatives above. Basically, on top of the changed get interface above, have a posix.getsockoptFixed() variant that returns void and asserts equal lengths. That could perhaps be the best of all worlds here? The only real downside I see on this third path is that it assumes we won't find (or OSes won't in the future add) enough other odd cases that we have to go back and revisit all this again later.

Sep 02 '25 11:09 blblack

I recommend adding some test cases if you can. Even simple cases that will get an error at runtime are fine, as otherwise the unreachable code won't even get compiled.

Yeah, I tend to think I want this as well. It is a little tricky, as any decent test will inevitably have consequences. But we could, perhaps, find very-trivial cases that are fairly portable? Maybe create a temporary tcp socket (just socket creation, no binding), get SO_REUSEADDR on it, then set it to the same value that was returned from get?

Sep 02 '25 12:09 blblack

Well, I was going to add a test, but then I realized that even the most-basic test of the existing getsockopt() on the master branch fails on Windows targets because it's already broken there. At least, it fails to compile for me when testing from Linux with -fwine.

New force-push first adds a basic test of set+get for socket options (which fails on Windows), then fixes up the Windows calls, then applies the various fixes I had here before, and then has a final commit tacked on which represents the alternative "best of all worlds" mentioned earlier. I could go either way on that alternative outcome, personally. I think it's a little nicer than the leaving in the Raw() interfaces, but it is a minor breaking change for posix.getsockopt() users, and it is possible we'll find counter-examples in the future that still won't quite work right with these somewhat-less-limited interfaces.

Sep 03 '25 19:09 blblack

Also, probably fine to un-draft this now?

Sep 03 '25 21:09 rootbeer

After another night to sleep on this: I think since the getsockoptFixed() case is going to be the most-common one by far, it's silly to break the existing callers and make them use some alternate function name they have to discover. I'm gonna clean up the patch chain a bit, and flip it so getsockoptVariable() is the one that returns a slice.

[also cleaned up what I could from earlier comments, and fixed failing test on macos]

Sep 04 '25 12:09 blblack

LGTM! Hopefully the Zig team likes it too. :)

Sep 04 '25 15:09 rootbeer

FYI - I think the CI failure here was some kind of general limit/timeout, but I can't restart just that one.

Sep 08 '25 20:09 blblack

Shameless bump for feedback. If the last two commits (optional setopt value + additional getsockoptVariable() interface) are too icky, we could dump those commits out of this PR and opt to just let users fall back to platform-specific stuff via raw std.c and/or std.os.linux interfaces for those odd edge cases.

Sep 23 '25 14:09 blblack

Hmm... maybe I'm missing something, but why don't we just make getsockopt return the value of len after the syscall? Seems like that'd solve the SO_PEERSEC case, and be no worse in terms of breakage than changing getsockopt to return a slice.

Sep 24 '25 01:09 alexrp

Hmm... maybe I'm missing something, but why don't we just make getsockopt return the value of len after the syscall? Seems like that'd solve the SO_PEERSEC case, and be no worse in terms of breakage than changing getsockopt to return a slice.

The current patchset, as it stands, doesn't make any breaking change to the existing posix.getsockopt() interface. It just adds a new interface getsockoptVariable() to handle some edge cases (and honestly, I think maybe your getsockoptSlice() suggestion actually is a better name for that interface! I'll switch it in the next push, but let's figure out the rest of this first).

I don't think any reasonable generic interface (other than a custom function just for this one option, or us building a higher-level interface in std.net somewhere that understands platform-specific and option-specific details and data types, which might be a nice future idea!) is really going to fully solve the SO_PEERSEC case optimally.

Recapping a bit, but: In all my searching of code/docs about socket options around the Internet on the various *nixes, Linux SO_PEERSEC is singularly "extra", with no other examples like it (that I could find, anyways!). Variable-length data. No documented upper limit on required buffer size. If you happen to pick a large-enough buffer size, you get a success, and the data you wanted written to the buffer, and the data's length returned via optlen. The data has no sentinel termination, so you have to know the optlen return value to parse the data. Additionally, if you pick a buffer size that's too small, it returns an ERANGE error while also returning an appropriate length via optlen, and doesn't touch the too-small buffer.

If we provided an interface that returned a length as a way to solve that case, the heart of it would have to work something like:

switch (errno(system.getsockopt(fd, level, optname, opt.ptr, &len))) {
    .SUCCESS => return len,
    .BADF => unreachable,
    .INVAL => return error.InvalidOption,
    // [.. other error cases like above ..],
    .RANGE => return len,
}

... and even then, the caller would have to do something like:

const len = try posix.getsockopt(fd, posix.SO.PEERSEC, mybufslice);
if (len > mybufslice.len) {
    // .. realloc mybufslice to len bytes long 
    const len2 = try posix.getsockopt(fd, posix.SO.PEERSEC, mybufslice);
    assert(len2 <= mybufslice.len);
    process_peer_sec(mybufslice[0..len2]);
} else {
    process_peer_sec(mybufslice[0..len]);
}

But since we can't be sure the universe of all present and future options on all platforms would only ever use ERANGE for this kind of purpose, we'd really want to special case that and only return len for that case if the option was SO_PEERSEC, so maybe it would be more like:

    .RANGE => return if (native_os == .linux and level == SO.PEERSEC) len else error.Something,

... at which point we've sort of set the stage for everyone to start customizing these functions per-platform/option, which will quickly grow unwieldy and probably belongs in a future higher-level interface, IMHO.

Also, if the general getsockopt() interface is returning a length and handling cases like these, we can't assert anything meaningful about the returned optlen (that it's == or <= the provided buffer len), unless we again special-case this and maybe some others.

If we ignore that one oddball annoying SO_PEERSEC case, though: the vast majority of socket options out there in the world, and especially the ones that most users are most likely to ever use, or ever see in common "how to do networking" examples, operate according to a fairly simple and restrictive set of rules:

There is a known, fixed data type and thus buffer size for the given option (commonly i32 interpreted either as an integer or a boolean, but also things like struct linger, struct in_addr, struct timespec, etc).
In successful getsockopt() cases, the caller doesn't really need to look at the returned optlen, unless it's just for an assertion. It should be unchanged from the input size, if the input buffer was precisely sized correctly. Ideally you'd assert that the returned optlen matched the input buffer size, like our current getsockopt() does.

And then there are some rare (in both senses: tiny amount of the space of options, and tiny likelihood of use across all programs) options that return optlen <= the buffer size (which has a known, fixed upper limit) and the caller would want to know the returned length: e.g. IP_OPTIONS on most platforms, and SO_BINDTODEVICE on Linux. The getsockoptVariable() (or ...Slice()) interface works fine for these cases, and it at least mostly-works for SO_PEERSEC. If someone cares to use SO_PEERSEC more-optimally, they could choose to just call straight into std.os.linux.getsockopt and write their own specific wrapper for this one option, too.

Sep 24 '25 12:09 blblack

... to be complete earlier, I should've given an example of "mostly-works" there at the bottom (for getsockoptSlice() + SO_PEERSEC), too, which would be something ~equivalent to:

while (true) {
    if (posix.getsockoptSlice(fd, posix.SO.PEERSEC, mybufslice)) |peer_sec| {
        process_peer_sec(peer_sec);
        break;
    } else |err| {
        if (err != error.BufferTooSmall)
            return err;
    }
    // ... grow mybufslice super-linearly or whatever
}

Sep 24 '25 15:09 blblack