zig icon indicating copy to clipboard operation
zig copied to clipboard

don't do feature detection with `@hasDecl`

Open andrewrk opened this issue 3 months ago • 13 comments

Violations:

../lib/std/os.zig:pub const have_sigpipe_support = @hasDecl(@This(), "SIG") and @hasDecl(SIG, "PIPE");
../lib/std/os.zig:    if (@hasDecl(system, "pipe2")) {
../lib/std/debug.zig:pub const have_ucontext = @hasDecl(os.system, "ucontext_t") and
../lib/std/debug.zig:pub const have_getcontext = @hasDecl(os.system, "getcontext") and
../lib/std/debug.zig:        } else if (@hasDecl(os.system, "msync") and native_os != .wasi and native_os != .emscripten) {
../lib/std/debug.zig:        if (@hasDecl(root, "os") and @hasDecl(root.os, "debug") and @hasDecl(root.os.debug, "openSelfDebugInfo")) {
../lib/std/debug.zig:    .freebsd, .openbsd => @hasDecl(os.system, "ucontext_t"),
../lib/std/debug.zig:    if (@hasDecl(windows, "CONTEXT")) {
../lib/std/os/test.zig:const dl_phdr_info = if (@hasDecl(os.system, "dl_phdr_info")) os.dl_phdr_info else anyopaque;
../lib/std/os/test.zig:    if (!@hasDecl(os.system, "rlimit")) {
../lib/std/os/uefi/tables/boot_services.zig:        if (!@hasDecl(protocol, "guid"))
../lib/std/os/linux.zig:    if (@hasDecl(VDSO, "CGT_SYM")) {
../lib/std/crypto/tlcsprng.zig:const os_has_arc4random = builtin.link_libc and @hasDecl(std.c, "arc4random_buf");
../lib/std/crypto/tlcsprng.zig:    if (builtin.link_libc and @hasDecl(std.c, "arc4random_buf")) {
../lib/std/heap.zig:    pub const malloc_size = if (@hasDecl(c, "malloc_size"))
../lib/std/heap.zig:    else if (@hasDecl(c, "malloc_usable_size"))
../lib/std/heap.zig:    else if (@hasDecl(c, "_msize"))
../lib/std/heap.zig:    pub const supports_posix_memalign = @hasDecl(c, "posix_memalign");
../lib/std/fs/File.zig:        if (!@hasDecl(@TypeOf(self.stat), "birthtime")) return null;
../lib/std/fs/Dir.zig:                const next_index = self.index + if (@hasDecl(posix.system.dirent, "reclen")) bsd_entry.reclen() else bsd_entry.reclen;
../lib/std/fs/Dir.zig:    if (@hasDecl(posix.system, "LOCK")) {
../lib/std/builtin.zig:    if (@hasDecl(builtin, "explicit_subsystem")) break :blk builtin.explicit_subsystem;

Marking as bug because there's probably a bug among these instances. This is a brittle way to do feature detection. As a general rule: don't do it.

andrewrk avatar Mar 18 '24 23:03 andrewrk

This is a brittle way to do feature detection. As a general rule: don't do it.

What alternative mechanism(s) is(/are) recommended? I guess directly accessing a structured/typed configuration value (such as std.Options introduced in https://github.com/ziglang/zig/pull/18712), one for each usage site?

rohlem avatar Mar 18 '24 23:03 rohlem

Let's look at the very first result:

pub const have_sigpipe_support = @hasDecl(@This(), "SIG") and @hasDecl(SIG, "PIPE");

The first condition is already bitrotted to always returning true without anyone noticing. The second condition will start returning the wrong value when type safety is added by converting the struct namespace to an enum.

Whether the operating system has a pipe signal is determined by the operating system. So that's what the value should be based on - the operating system.

Should emscripten be marked as supporting SIGPIPE? Probably not. It's currently set to true.

andrewrk avatar Mar 19 '24 01:03 andrewrk

You missed a funny match:

src/Builtin.zig:        \\/// feature detection (i.e. with `@hasDecl` or `@hasField`) over version checks.

jacobly0 avatar Mar 20 '24 00:03 jacobly0

Indeed. That text was introduced in a0ad2dee6a0b3bb7fd04032f6113206f7b4e73eb

andrewrk avatar Mar 20 '24 01:03 andrewrk

I'm struggling a bit with related issues trying to add a syscall to std.(c|posix|linux) that only exists on some POSIXy platforms, and making sense of all the recent improvements in these areas, and the desire to reduce/eliminate both the usingnamespace and @hasDecl patterns for setting up and/or detecting platform support for a given call.

My example case I'm working through is mlockall(), latest imperfect iteration shown here: https://github.com/ziglang/zig/commit/94ef2a69301d6a67431e48bc473a3cbb4db66fb6 . I can probably imagine (and perhaps find existing examples in std of!) 10+ other distinct ways to set these up, but they all have varying tradeoffs and it's kind of a minefield unless we've adopted a standard way of doing these things that we're happy with.

The main problem with the variant linked above is that it's relying on both @hasDecl and a typeof check against void in the preamble of its std.posix wrapper (which seem out of fashion, at the very least!). These checks are to ensure that it does an @compileError() when called on a non-supporting platform (a libc without the function, or a platform like plan9 which doesn't use libc and also lacks the syscall), and the same basic check in the test of the posix function, so that we don't try to test it on such platforms.

I could imagine copying the list of supporting libcs/platforms that was used in lib/std/c.zig over to lib/std/posix.zig as well, but then you've got the same list in 2-3 different files where they might drift out of sync and bitrot from each other.

I could also imagine just moving the libc declarations (slightly-redundantly, but seems kinda ok to me tbh) down to all the individual lib/std/c/foo.zig instead of the native_os switching and private.foo magic, and then relying on just @hasDecl without the void typeof check. Still uses @hasDecl, but to me seems at least a little cleaner.

You could also argue to just unconditionally put these C decls in lib/std/c.zig without caring about which platforms support it, and/or similarly for the lib/std/posix.zig interface on top of it. The problem with that approach is that when someone does call it on an unsupported platform, they're now going to end up with an undefined symbol at link time instead of an earlier and clearer error. On top of that, there's no good way to exercise it in lib/std/posix/test.zig at that point, because it will fail on random platforms without some platform-list guard at that layer.

I don't have any stellar ideas for the One True Pattern that seems to make the best design sense for all similar cases. So far, though, various approaches that end up involving @hasDecl seem cleaner than others I can come up with.

blblack avatar Mar 20 '24 20:03 blblack

I could imagine copying the list [...], but then you've got the same list in 2-3 different files [...] .

@blblack Just my 2c, but generally for deduplication of common logic I'd recommend writing a common function that is accessed at every usage site. (To allow for extra logic each usage site can also additionally have its own function which calls the one common logic/implementation.) This function should probably mostly be evaluated at comptime; marking it as inline fn might help with that.

In the particular case of mlockall you linked, I would personally avoid a conditionally-typed declaration, since it doesn't seem necessary:

  • Provide a const mlockall_supported: bool for checking whether the current system supports the function.
    • If querying for other systems should be supported, the logic can be provided as fn mlockallSupported(os: Os) bool, otherwise it can just be written inline as initializer.
  • Provide a single fn mlockall which checks mlockall_supported with @compileError on unsupported systems before delegating to the particular implementation.

With this, user code can reliably use mlockall_supported to query native support, or just call mlockall directly and get an appropriate compile error if it isn't supported (instead of a strange type mismatch).

rohlem avatar Mar 20 '24 21:03 rohlem

@rohlem - I get what you're driving at, and I'll try to spin that around in my head a bit and think on it. I'm not a big fan of the type mismatch either, it was just the latest of a few different avenues suggested/attempted (after we got rid of relying on usingnamespace to solve part of the problem, the way many such things did until recently). The hard part is maybe finding the right layers to put some bits of this in (e.g. std.c vs std.c.foo vs std.os.foo vs std.posix) while keeping things as minimal and elegant and future-proof as possible (and preferably avoiding having any missing syscall get "diagnosed" by an eventual link-time missing symbol error, which seems like the worst way to find out). Anyways, I'll ponder your suggestion overnight and make a new attempt at the mlockall patch and see how it plays out, thank you!

blblack avatar Mar 21 '24 04:03 blblack

Whether the operating system has a pipe signal is determined by the operating system. So that's what the value should be based on - the operating system.

Using switch (builtin.os.tag) for feature detection should be thought of the same way as browser sniffing. Consider the recent change:

const have_rlimit = switch (builtin.os.tag) {
    .windows, .wasi => false,
    else => true,
};
if (!have_rlimit) return;

Defaulting to true isn't correct for freestanding or other since nothing can be assumed about the capabilities of those targets. It likely isn't correct for many other esoteric targets as well. You could invert the logic so that false is the default, but then you've got to remember to include xyzbsd in your long laundry list of operating systems that have the feature.

Maybe @hasDecl isn't cutting it and there's a better pattern to adopt, but I don't think using OS tag in more places is going to be a satisfying alternative that scales well with either contributors or operating system support.

jayschwa avatar Mar 21 '24 19:03 jayschwa

The problem here is that there is no precise enough way to specify intent. The OS is not precise enough and whether a particular namespace has a declaration is also not precise enough. The code needs to be able to ask the correct question.

andrewrk avatar Mar 21 '24 19:03 andrewrk

I think for a lot of basic cases (does system X have a reasonable implementation in Zig of posix func Y), the OS conditionals may have to be the best thing we've got. I think the pattern I've now pushed in the mlockall PR seems reasonably elegant across the layers, in https://github.com/ziglang/zig/pull/19203/commits . This doesn't use @hasDecl, doesn't use the "private struct" hack or voidness thing, doesn't create libc declarations on platforms which don't actually have the underlying call, etc. It simply puts the libc decls in the per-OS libc interface files, and has the OS-conditional support logic up in the std.posix wrapper, where unsupported platforms simply return an error.NotImplemented (which I think of kinda like ENOSYS).

This is testable at the std.posix layer, and if the same pattern were applied to the rlimit case, you could just delete the have_rlimit check and let the error.NotImplemented bail you out from the existing line:

var lim = posix.getrlimit(.NOFILE) catch return; // Oh well; we tried.

blblack avatar Mar 21 '24 20:03 blblack

(Hmmm, except, that's not really true about the existing catch return being enough, because probably .NOFILE isn't defined on those other platforms. So maybe there's another layer of issues to go here.)

blblack avatar Mar 21 '24 20:03 blblack

One could imagine a solution at the std.posix layer where it defines all the enums/structs for the constant arguments there, with just the bog-standard "POSIX standards" names available and some broken default values, and then overrides them with the system-provided ones where those exist. This allows avoiding a straight-up compilation error in simple usage of mostly-portable constructs. For example, in lib/std/posix.zig, we could have things like (and sorry this uses @hasDecl for now, go figure!):

pub const MCL = if (@hasDecl(system, "MCL")) system.MCL else struct {
    pub const CURRENT = 0xAA,
    pub const FUTURE = 0xAB,
};

pub const rlimit_resource = if (@hasDecl(system, "rlimit_resource")) system.rlimit_resource else enum(i32) {
    CPU = 0xAA,
    FSIZE = 0xAB,
   // ... at a minimum, all the ones specified in https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_resource.h.html , although perhaps in some cases we might add an extremely common extension one here just as user convenience.
};

Then the caller, if they use standard/portable flag names, at least doesn't get a compilation error and instead gets the error.NotImplemented return instead. If they use some other extended/non-portable flag that would otherwise be offered by a subset of OSes, they're on their own at that point. It fits in the POSIX bailiwick, as POSIX defines these standard flags/enums/whatever, but usually leaves their values as implementation-defined for OSes to possible vary on in their own header implementations.

blblack avatar Mar 21 '24 21:03 blblack

(or we could be super-lazy about the above approach and just fall back to the linux definitions when "system" doesn't have one, even on other platforms. It's probably the most-common case anyways, it will cover the POSIXy basics (and then some!), and doesn't require the voluminous work of explicitly defining broken alternate versions of all of them, and you get the same desired "no compile failure" effect).

blblack avatar Mar 21 '24 21:03 blblack

[...] but I don't think using OS tag in more places is going to be a satisfying alternative that scales well with either contributors or operating system support.

I think, when we're talking about code inside the std.(c|os|posix) namespaces themselves, OS switching is probably the best we'll get. The best we can hope for is that these layers make it easy for upper layers and application code to stop caring so much about operating systems, at least in common cases (there will always be some cases where upper-layer code knows it needs to do OS-specific things, of course).

blblack avatar Mar 25 '24 12:03 blblack