footgun: std.os.environ will be undefined when using zig build-lib on linux and not linking libc
std.os.environ is initialized to undefined and then populated in start code before main:
https://github.com/ziglang/zig/blob/0cd89e9176ab36fc5e267120dc4d75cb79d32684/lib/std/os.zig#L72-L73
https://github.com/ziglang/zig/blob/0cd89e9176ab36fc5e267120dc4d75cb79d32684/lib/std/start.zig#L181
When linking libc, code can use the environ external variable and the getenv libc call. However, when both (1) not linking libc and (2) making a library rather than an application, zig code does not actually have access to environment variables on Linux. The only place they are available is in start code, and in this case zig is not in control of the start code, and there is no guarantee that it will be libc start code. All other systems either require linking libc or, as on Windows, always provide access to the environment variables through the ProcessEnvironmentBlock.
This is also a problem when linking libc, and making a library rather than an application. In this case, the solution is to go through libc for environment variables. But as it stands, there is a footgun, because std.os.environ will be undefined.
Ideally this situation can be resolved with a compile error happening rather than runtime detection of the problem. At the very least it should not be an undefined footgun.
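For the case where libc is linked, "going through libc" can be sketched in C as below. This is only an illustration of the libc side, not Zig standard library code; the helper names env_count and lib_getenv are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Maintained by libc; valid in a shared library as long as libc is linked. */
extern char **environ;

/* Hypothetical helper: count the entries currently in libc's environment. */
size_t env_count(void) {
    size_t n = 0;
    if (environ == NULL) return 0;
    while (environ[n] != NULL) n++;
    return n;
}

/* Hypothetical helper: look up one variable through libc.
 * Returns NULL when the variable is not set. */
const char *lib_getenv(const char *name) {
    assert(name != NULL);
    return getenv(name);
}
```

Because the lookup goes through libc, it also reflects later setenv/unsetenv calls made by the host application.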
Here's an example of a wounded foot: #4521
Also related: std.os.argv and the auxval.
Related: #3511
Can't you use /proc//environ?
That's a nice idea, I hadn't thought of it. One thing to consider however (pointed out by @MaskRay) is that it would not reflect putenv/setenv/clearenv/unsetenv changes by libc.
Edit: This is OK though, because in this case we are not linking libc, and so putenv/setenv/clearenv/unsetenv are irrelevant.
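A rough C sketch of the procfs idea, assuming /proc/self/environ is readable: the file holds NUL-separated KEY=value entries describing the environment the process was started with. The function names read_proc_environ and env_block_find are hypothetical, and a libc-free library would need raw syscalls instead of stdio.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Find `name` in a block of NUL-separated "KEY=value" entries.
 * The caller must guarantee block[len] == '\0'.
 * Returns a pointer to the value inside the block, or NULL if absent. */
const char *env_block_find(const char *block, size_t len, const char *name) {
    assert(block != NULL && name != NULL);
    size_t name_len = strlen(name);
    size_t pos = 0;
    while (pos < len) {
        const char *entry = block + pos;
        const char *end = memchr(entry, '\0', len - pos);
        size_t entry_len = end ? (size_t)(end - entry) : len - pos;
        if (entry_len > name_len && entry[name_len] == '=' &&
            memcmp(entry, name, name_len) == 0) {
            return entry + name_len + 1;
        }
        pos += entry_len + 1; /* skip the entry and its terminator */
    }
    return NULL;
}

/* Read all of /proc/self/environ into a malloc'd, NUL-terminated buffer.
 * Returns NULL if procfs is unavailable or memory runs out. */
char *read_proc_environ(size_t *out_len) {
    assert(out_len != NULL);
    FILE *f = fopen("/proc/self/environ", "rb");
    if (f == NULL) return NULL;
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap + 1);
    while (buf != NULL) {
        len += fread(buf + len, 1, cap - len, f);
        if (len < cap) break; /* EOF or read error */
        cap *= 2;
        char *grown = realloc(buf, cap + 1);
        if (grown == NULL) { free(buf); buf = NULL; }
        else buf = grown;
    }
    fclose(f);
    if (buf != NULL) { buf[len] = '\0'; *out_len = len; }
    return buf;
}
```

A caller would read the block once, call env_block_find on it as needed, and free the buffer when done.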
Hm. I think if both application and library statically link CRT result would be the same.
There is no problem if the library links libc. In this case the only thing to fix is to move std.os.environ somewhere less likely to get called by accident. The situation is when the library is not told that libc will be linked.
No, my point is, that in C/C++ (forget Zig for a second), at least on Windows/MSVC, one can have two CRTs running side by side within one process, one for application and one for a dynamic library, totally separate and not knowing of each other. In that case setenv in application will not affect environ value for a library. It is actually a supported (but not recommended) scenario. Even CRT versions may be different.
We can test to be sure, but I'm pretty sure in that scenario, you would get a link error because environ would be declared twice. Or maybe it is declared with "weak" linkage, in which case, one of them wins, and both CRTs will work in harmony, using the same environ symbol.
one can have two CRTs running side by side within one process
Even CRT versions may be different.
Very interesting, also disturbing
@andrewrk "dynamic library". I mean DLL.
Stumbled today upon the missing auxval when a panic is raised in a shared library:
/// use.c - compiled with 'clang -o use -ldl -g use.c'
#include <dlfcn.h>
typedef char (*addfn)(char, char);
int main() {
    void *h = dlopen("libadd.so", RTLD_LAZY);
    addfn zigadd = (addfn) dlsym(h, "add");
    (*zigadd)(100, 100);
    return 0;
}
/// add.zig - compiled with 'zig build-lib -dynamic add.zig'
export fn add(a: i8, b: i8) i8 {
    return a + b;
}
Trying to run it:
$ LD_LIBRARY_PATH="$(pwd)" ./use
thread 37396 panic: integer overflow
Panicked during a panic. Aborting.
Aborted (core dumped)
gdb stacktrace:
#0 0x00007ffff7d917b2 in std.os.linux.x86_64.syscall4 (number=rt_sigprocmask, arg1=2, arg2=140737488343936, arg3=0, arg4=8) at /usr/lib/std/os/linux/x86_64.zig:47
#1 0x00007ffff7d78860 in std.os.linux.sigprocmask (flags=2, set=0x7fffffffd380, oldset=0x0) at /usr/lib/std/os/linux.zig:890
#2 0x00007ffff7d773f7 in std.os.raise (sig=6 '\006') at /usr/lib/std/os.zig:255
#3 0x00007ffff7d764d0 in std.os.abort () at /usr/lib/std/os.zig:218
#4 0x00007ffff7d762fb in std.debug.panicExtra (trace=0x0, first_trace_addr=..., args=...) at /usr/lib/std/debug.zig:302
#5 0x00007ffff7d758bb in std.builtin.default_panic (msg=..., error_return_trace=0x0) at /usr/lib/std/builtin.zig:688
#6 0x00007ffff7d7bf52 in std.process.getBaseAddress () at /usr/lib/std/process.zig:713
#7 0x00007ffff7d7a1de in std.os.dl_iterate_phdr (context=0x7fffffffdba0) at /usr/lib/std/os.zig:4470
#8 0x00007ffff7d79d67 in std.debug.DebugInfo.lookupModuleDl (self=0x7ffff7dac608 <self_debug_info>, address=140737351476021) at /usr/lib/std/debug.zig:1290
#9 0x00007ffff7d7923c in std.debug.DebugInfo.getModuleForAddress (self=0x7ffff7dac608 <self_debug_info>, address=140737351476021) at /usr/lib/std/debug.zig:1138
#10 0x00007ffff7d78faf in std.debug.printSourceAtAddress (debug_info=0x7ffff7dac608 <self_debug_info>, out_stream=..., address=140737351476021, tty_config=escape_codes) at /usr/lib/std/debug.zig:593
#11 0x00007ffff7d77cde in std.debug.writeCurrentStackTrace (out_stream=..., debug_info=0x7ffff7dac608 <self_debug_info>, tty_config=escape_codes, start_addr=...) at /usr/lib/std/debug.zig:431
#12 0x00007ffff7d7691b in std.debug.dumpCurrentStackTrace (start_addr=...) at /usr/lib/std/debug.zig:114
#13 0x00007ffff7d7624f in std.debug.panicExtra (trace=0x0, first_trace_addr=..., args=...) at /usr/lib/std/debug.zig:275
#14 0x00007ffff7d758bb in std.builtin.default_panic (msg=..., error_return_trace=0x0) at /usr/lib/std/builtin.zig:688
#15 0x00007ffff7d76336 in add (a=100 'd', b=100 'd') at /tmp/add.zig:2
#16 0x0000555555555194 in main () at use.c:8
tested with zig version 0.8.0-dev.2065+bc06e1982
Since you're not using the startup code, there's no one filling in the aux vector in elf_aux_maybe. The error is an integer underflow in std.process.getBaseAddress. A quick workaround is adding -lc and linking against libc.
(@LemonBoy's comments on this issue.)
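To illustrate why linking libc helps here: with libc, the auxiliary vector can be queried through getauxval even from a shared library, and on Linux the same data is normally exposed in /proc/self/auxv as pairs of native words. The C sketch below shows both routes. The names proc_auxval and lookup_auxval are made up for this example, and the procfs fallback assumes procfs is mounted.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <sys/auxv.h>

/* Hypothetical helper: look up one auxiliary vector entry by reading
 * /proc/self/auxv. Returns 1 and stores the value on success, 0 if the
 * entry is missing or procfs is unavailable. */
int proc_auxval(unsigned long type, unsigned long *value) {
    assert(value != NULL);
    FILE *f = fopen("/proc/self/auxv", "rb");
    if (f == NULL) return 0;
    unsigned long pair[2]; /* { type, value } */
    int found = 0;
    while (fread(pair, sizeof pair, 1, f) == 1) {
        if (pair[0] == AT_NULL) break; /* end of vector */
        if (pair[0] == type) {
            *value = pair[1];
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}

/* Hypothetical helper: prefer libc's getauxval, fall back to procfs.
 * Returns 0 when neither source has the entry. */
unsigned long lookup_auxval(unsigned long type) {
    unsigned long v = getauxval(type); /* 0 when not found */
    if (v == 0 && !proc_auxval(type, &v)) return 0;
    return v;
}
```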
If the library is compiled in safe release mode (zig build-lib -dynamic -O ReleaseSafe add.zig) the second panic isn't triggered (maybe some unwanted optimization?).
$ LD_LIBRARY_PATH="$(pwd)" ./use
thread 37859 panic: integer overflow
Aborted (core dumped)
(note the missing Panicked during a panic. Aborting. line)
gdb trace:
(gdb) b std.process.getBaseAddress
Function "std.process.getBaseAddress" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (std.process.getBaseAddress) pending.
(gdb) r
Starting program: /tmp/use
thread 37757 panic: integer overflow
Program received signal SIGABRT, Aborted.
std.os.raise (sig=<optimized out>) at /usr/lib/std/os.zig:257
257 switch (errno(rc)) {
(gdb) bt
#0 std.os.raise (sig=<optimized out>) at /usr/lib/std/os.zig:257
#1 0x00007ffff7fc36d6 in std.os.abort () at /usr/lib/std/os.zig:218
#2 0x00007ffff7fc3505 in std.debug.panicExtra (trace=<optimized out>, first_trace_addr=..., args=...) at /usr/lib/std/debug.zig:302
#3 0x00007ffff7fc2e6f in std.builtin.default_panic (msg=..., error_return_trace=0x2) at /usr/lib/std/builtin.zig:688
#4 0x00007ffff7fc1ca7 in add (a=<optimized out>, b=<optimized out>) at /tmp/add.zig:2
#5 0x0000555555555194 in main () at use.c:8
To summarize, the idea so far is, ideally, to:
- "add an always inlined at caller function with weak linkage" that uses /proc/environ on POSIX and the Windows counterpart
- remove it in the linker once we figure out it's not needed
Problems with approach
- For performance we want to inline all that stuff and ideally not do post-processing in the linker etc. Dedicating a function runs counter to that.
- Some users also may not need environment variables at all or they are not allowed (security, safety standards etc).
- No support for custom start code to use that feature for debugging (although this is a very weak argument).
Personal opinion on better approach:
- comptime option in build.zig that overwrites start code for testing + functions to do the linker tests/queries etc
- this boils down to "emit a dedicated section/function" and "check if required sections for writing variables are existing".
I don't think it's viable to fix this at the object code level without being able to control what object code is generated (without perf loss).
Blockers for this are: 1. functions to check whether sections exist in the object code. ~~and 2. annotating the necessary knowledge in the build system to nicely interact with C.~~
So getting environment variables is quite easy via libc. (I also spent a couple of hours on this today.) Perhaps a naive question: can argv be retrieved from a shared library, assuming we link libc? ELF.
Edit: reading /proc/self/cmdline is an option. That's a bit messy. Also, IIRC, older kernels don't have /proc/self, which makes it even more messy. Other options?
@motiejus I have yet to see a kernel in use with /proc/self not being available. /proc/[pid]/environ should be accessible, so we could always try /proc/self/ first and then fall back to /proc/[pid]. If speed is a concern when always accessing that file, the variables could be cached. Maybe there could also be a way to override that functionality or have a different function for "direct access".
Also about argv, /proc/self/cmdline contains that.
While it's true that it's relatively uncommon, there are modern, real world cases in which /proc/self and /proc/<pid> (or in fact any of procfs at all) will not be available. This includes some rare containerized contexts, and some less rare confidential computing / secure enclave contexts (which I anticipate will become more common in the coming years as TEE support becomes more widespread). It is not safe to assume that anything under /proc is guaranteed to be present.
@goonzoid Then the function should return null, as it would if the environment variable isn't set. The only ways I am aware of to get environment variables are through the entry point and through procfs. If all three methods (procfs, libc, and the entry point) fail, then the function should return null and not error out.
@RossComputerGuy Yes, I think I agree. If a function literally can't access the environment variables, then obviously there's not much it can do! I'm not saying "never use procfs", I just wanted to point out that it can't be depended on 100%.
@goonzoid Yeah my thought was that there would be multiple solutions just in case one fails.
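The "try /proc/self first, then /proc/[pid], otherwise give up quietly" idea could look roughly like this in C. open_proc_entry is a hypothetical name, and the sketch simply returns NULL when procfs is not available, so the caller can treat that the same way as an unset variable.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: open a per-process procfs entry such as "environ"
 * or "cmdline". Tries /proc/self/<leaf> first, then /proc/<pid>/<leaf>.
 * Returns NULL if neither can be opened (for example, no procfs at all). */
FILE *open_proc_entry(const char *leaf) {
    assert(leaf != NULL);
    char path[256];

    int n = snprintf(path, sizeof path, "/proc/self/%s", leaf);
    if (n > 0 && (size_t)n < sizeof path) {
        FILE *f = fopen(path, "rb");
        if (f != NULL) return f;
    }

    n = snprintf(path, sizeof path, "/proc/%ld/%s", (long)getpid(), leaf);
    if (n > 0 && (size_t)n < sizeof path) {
        return fopen(path, "rb");
    }
    return NULL;
}
```

Returning NULL instead of failing hard keeps procfs as just one of several strategies, which matches the point that it cannot be depended on 100%.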
The following works for us without using /proc/self/cmdline. The idea is to leverage ELF's .init_array section:
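const std = @import("std");
const builtin = @import("builtin");

// Entries in .init_array run at load time. This relies on the loader passing
// (argc, argv, envp) to them, as glibc does.
const _fix_argv linksection(".init_array") = &struct {
    pub fn call(argc: c_int, argv: [*c][*:0]u8, envp: [*:null]?[*:0]u8) callconv(.C) void {
        std.os.argv = argv[0..@intCast(argc)];
        std.os.environ = @ptrCast(envp[0..std.mem.len(envp)]);
    }
}.call;

comptime {
    // Only needed when Zig's own start code is not in control.
    if (builtin.object_format == .elf and builtin.output_mode != .Exe) {
        _ = _fix_argv;
    }
}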
Now with macho as well:
const std = @import("std");
const builtin = @import("builtin");

// Section that holds the initializer function pointers for each object format.
const init_array_section = switch (builtin.object_format) {
    .macho => "__DATA,__init_array",
    .elf => ".init_array",
    else => "",
};

const fix_argv linksection(init_array_section) = &struct {
    pub fn call(argc: c_int, argv: [*c][*:0]u8, envp: [*:null]?[*:0]u8) callconv(.C) void {
        std.os.argv = argv[0..@intCast(argc)];
        std.os.environ = @ptrCast(envp[0..std.mem.len(envp)]);
    }
}.call;

comptime {
    // Only reference the hook when building a library for a supported format.
    if (builtin.output_mode != .Exe) {
        switch (builtin.object_format) {
            .elf, .macho => _ = fix_argv,
            else => {},
        }
    }
}
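For readers more familiar with C, the same trick can be sketched with a constructor function, which GCC and Clang place in .init_array on ELF targets. This assumes glibc, which calls initializers with (argc, argv, envp); other libcs, musl for example, do not pass these arguments, so treat it as an illustration only. All names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static int saved_argc;
static char **saved_argv;
static char **saved_envp;

/* Runs at load time, before main or before dlopen returns.
 * Assumption: the loader passes (argc, argv, envp), as glibc does. */
__attribute__((constructor))
static void capture_args(int argc, char **argv, char **envp) {
    saved_argc = argc;
    saved_argv = argv;
    saved_envp = envp;
}

/* Number of captured command-line arguments. */
int captured_argc(void) {
    return saved_argc;
}

/* Returns argument i, or NULL when i is past the end. */
const char *captured_arg(int i) {
    assert(i >= 0);
    if (saved_argv == NULL || i >= saved_argc) return NULL;
    return saved_argv[i];
}

/* Returns the captured environment block (NULL-terminated array). */
char **captured_envp(void) {
    return saved_envp;
}
```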
Updated original post with accepted solution.