Big executable file when big variables (PathSpace) on stack

Open shimamura-sakura opened this issue 1 year ago • 6 comments

Zig Version

0.12.0-dev.2619+5cf138e51

Steps to Reproduce and Observed Behavior

Build command: zig build-exe -OReleaseSmall -flto -fstrip -fsingle-threaded -target x86_64-windows main.zig

Code:

const std = @import("std");

pub fn main() void {
    _ = std.fs.cwd().openFile("test", .{}) catch {};
    _ = std.fs.cwd().createFile("test", .{}) catch {};
}

If I comment out either one of the two lines, I get a 12KB exe file. However, if I keep both lines, I get a 140KB exe file. I also see very long runs of zero bytes in the EXE file.

big-small.zip

big.exe: both lines. small.exe: only the "openFile" line.

Expected Behavior

The exe should be smaller, maybe 10-20KB.

shimamura-sakura avatar Feb 07 '24 10:02 shimamura-sakura

I would expect this to be because of PE format alignment requirements but someone else will have to confirm.

Vexu avatar Feb 07 '24 11:02 Vexu

The problem is that the compiler is trying to initialize two structs on the stack that are ~64k in size using weak_memcpy_default__alloca, which necessitates placing a similarly sized buffer in the .rdata section.

/// > The maximum path of 32,767 characters is approximate, because the "\\?\"
/// > prefix may be expanded to a longer string by the system at run time, and
/// > this expansion applies to the total length.
/// from https://docs.microsoft.com/en-us/windows/desktop/FileIO/naming-a-file#maximum-path-length-limitation
pub const PATH_MAX_WIDE = 32767;

pub const PathSpace = struct {
    data: [PATH_MAX_WIDE:0]u16,
    len: usize,

    pub fn span(self: *const PathSpace) [:0]const u16 {
        return self.data[0..self.len :0];
    }
};

We see initialization points in sliceToPrefixedFileW and wToPrefixedFileW:

pub fn sliceToPrefixedFileW(dir: ?HANDLE, path: []const u8) !PathSpace {
    var temp_path: PathSpace = undefined;
    temp_path.len = try std.unicode.utf8ToUtf16Le(&temp_path.data, path);
    temp_path.data[temp_path.len] = 0;
    return wToPrefixedFileW(dir, temp_path.span());
}

In wToPrefixedFileW, the undefined PathSpace is initialized depending on a branch. Comparing source code with decompilation, for example:

var path_space: PathSpace = undefined;
// ...
const path_byte_len = ntdll.RtlGetFullPathName_U(
    path_to_get.ptr,
    buf_len * 2,
    path_space.data[path_buf_offset..].ptr,
    null,
);
if (path_byte_len == 0) {
    // TODO: This may not be the right error
    return error.BadPathName;
} else if (path_byte_len / 2 > buf_len) {
    return error.NameTooLong;
}

This becomes:

    v32 = v133;
    v33 = RtlGetFullPathName_U_0(FileName, 2 * v30, (PWSTR)&dest[2 * v29], 0i64);
    if ( !v33 )
    {
      weak_memcpy_default__alloca(v163, (unsigned __int8 *)&byte_42F0D0, 0x10008ui64);
      v131 = 0;
      v132 = 0;
      v35 = 8;
      goto exit;
    }
    v34 = v33 >> 1;
    if ( v34 > v30 )
    {
      weak_memcpy_default__alloca(v163, (unsigned __int8 *)&::src, 0x10008ui64);
      v131 = 0;
      v132 = 0;
      v35 = 6;
      goto exit;
    }

Both ::src and byte_42F0D0 point to separate buffers with a length of 0x10008, which is the size of PathSpace on x64, and are entirely composed of zeroes.
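The 0x10008 figure can be checked against the struct definition quoted earlier. The following is a minimal sketch, not the std code itself: `PathSpaceLike` is a hypothetical stand-in that copies only the two fields of PathSpace, and the expected size assumes no padding beyond the two fields, which should hold on common targets.

```zig
const std = @import("std");

// Same constant as in the quoted std source.
const PATH_MAX_WIDE = 32767;

// Hypothetical stand-in for PathSpace with the same two fields.
const PathSpaceLike = struct {
    data: [PATH_MAX_WIDE:0]u16,
    len: usize,
};

comptime {
    // The sentinel is stored too: (32767 + 1) code units * 2 bytes each.
    std.debug.assert(@sizeOf([PATH_MAX_WIDE:0]u16) == 65536);
    // With an 8-byte usize this is 65544 = 0x10008, the length passed to
    // weak_memcpy_default__alloca in the decompilation.
    std.debug.assert(@sizeOf(PathSpaceLike) == 65536 + @sizeOf(usize));
}

pub fn main() void {
    const size: usize = @sizeOf(PathSpaceLike);
    std.debug.print("PathSpaceLike is 0x{x} bytes\n", .{size});
}
```

Two such zero-filled templates account for roughly 128 KiB, which is in line with the size difference between small.exe and big.exe.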

I'm not sure why the compiler wants to do this, but it's probably related to the fact that PathSpace is so large that the compiler needs to call __chkstk in the function prologue:

image

As an aside, I would really like if PathSpace wasn't 64 kilobytes so that I don't have to worry about extreme stack usage when trying to just open a file. This is especially bad if the function is inlined but rarely called.

drew-gpf avatar Feb 08 '24 22:02 drew-gpf

> As an aside, I would really like if PathSpace wasn't 64 kilobytes so that I don't have to worry about extreme stack usage when trying to just open a file. This is especially bad if the function is inlined but rarely called.

#225 can eliminate this problem in release mode.

notcancername avatar Feb 14 '24 06:02 notcancername

> As an aside, I would really like if PathSpace wasn't 64 kilobytes so that I don't have to worry about extreme stack usage when trying to just open a file. This is especially bad if the function is inlined but rarely called.

This is literally happening to me: I'm writing a library that calls std.fs.createFileAbsoluteZ, and depending on the process that uses the library, a stack overflow can occur because of PathSpace.

playday3008 avatar Jun 13 '25 22:06 playday3008

I am working on a DLL plugin that is limited in stack space due to the main executable managing the stack. This results in a stack overflow exception on any and all path actions, like openDir or the absolute versions. That's a big problem, as this basically means I can't do any file or directory I/O at all.

I can work around it in a very ugly way, which might help others here too:

const std = @import("std");
const builtin = @import("builtin");

fn openDirAbsolute(path: []const u8) !std.fs.Dir {
    return switch (builtin.os.tag) {
        .windows => {
            // Heap-allocate the wide path instead of going through PathSpace on the stack.
            const allocator = std.heap.page_allocator;

            const dir_path_w = try std.unicode.wtf8ToWtf16LeAllocZ(allocator, path);
            defer allocator.free(dir_path_w);

            // NT namespace prefix `\??\`; the slice below drops the explicit trailing 0.
            const nt_prefix = &[_:0]u16{ '\\', '?', '?', '\\', 0 };
            const nt_path = try std.mem.concatWithSentinel(allocator, u16, &.{ nt_prefix[0 .. nt_prefix.len - 1], dir_path_w }, 0);
            defer allocator.free(nt_path);

            return std.fs.openDirAbsoluteW(nt_path, .{});
        },
        else => std.fs.openDirAbsolute(path, .{}),
    };
}

This does basically the same thing as the std.fs implementation, but without guardrails and without allocating the PathSpace on the stack (note the const allocator = ... line). Don't use this unless you intend to remove it again as soon as possible :-) An alternative, maybe better, workaround could be to spin up a new thread and do file I/O there, as the DLL can typically choose that thread's stack size.
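The thread-based alternative could look roughly like the sketch below. It is untested in a real plugin host and rests on several assumptions: `Job` and `worker` are hypothetical names, the 1 MiB stack size is an arbitrary guess that merely leaves room for a few 64 KiB PathSpace values, and the std.fs and std.Thread APIs are taken to be those of the Zig versions discussed in this thread. It also cannot be built with -fsingle-threaded.

```zig
const std = @import("std");

// Hypothetical job record shared between the caller and the worker thread.
const Job = struct {
    path: []const u8,
    err: ?anyerror = null,
};

// Runs on the worker thread, so the path helpers in std.fs use the
// worker's stack rather than the host-managed one.
fn worker(job: *Job) void {
    var dir = std.fs.cwd().openDir(job.path, .{}) catch |e| {
        job.err = e;
        return;
    };
    dir.close();
}

pub fn main() !void {
    var job = Job{ .path = "." };

    // Request a stack size explicitly instead of inheriting the host's limit.
    const thread = try std.Thread.spawn(.{ .stack_size = 1024 * 1024 }, worker, .{&job});
    thread.join();

    if (job.err) |e| return e;
}
```

Joining immediately keeps the call synchronous from the caller's point of view; a long-lived worker with a queue would avoid paying the thread start-up cost on every file operation.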

The real solution would be to fix the stack allocation of the PathSpace object. I think it's a design error that we avoid a heap allocation by reserving the maximum path size on the stack. We should probably communicate this limitation clearly and provide an ..Alloc alternative or something similar. I understand that this might clash with other OS implementations that don't need this, so further discussion and investigation might be necessary.
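As a rough illustration of what an ..Alloc-style variant might do, the sketch below places the path buffer on the heap. Nothing here is an existing std API: `PathSpaceLike` and `pathSpaceAlloc` are hypothetical names, and the conversion mirrors only the first step of sliceToPrefixedFileW as quoted above, without the prefixing logic of wToPrefixedFileW.

```zig
const std = @import("std");

const PATH_MAX_WIDE = 32767;

// Hypothetical stand-in for PathSpace with the same layout and helper.
const PathSpaceLike = struct {
    data: [PATH_MAX_WIDE:0]u16,
    len: usize,

    pub fn span(self: *const PathSpaceLike) [:0]const u16 {
        return self.data[0..self.len :0];
    }
};

// Hypothetical helper: converts a UTF-8 path into a heap-allocated buffer,
// so the caller's stack frame holds only a pointer. The caller owns the
// result and must release it with allocator.destroy.
fn pathSpaceAlloc(allocator: std.mem.Allocator, path: []const u8) !*PathSpaceLike {
    const ps = try allocator.create(PathSpaceLike);
    errdefer allocator.destroy(ps);
    ps.len = try std.unicode.utf8ToUtf16Le(&ps.data, path);
    ps.data[ps.len] = 0;
    return ps;
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    const ps = try pathSpaceAlloc(allocator, "test");
    defer allocator.destroy(ps);
    std.debug.print("converted {d} UTF-16 code units\n", .{ps.span().len});
}
```

This trades one allocation per call for a small stack frame; allocating only as many code units as the input needs, as the workaround above does with wtf8ToWtf16LeAllocZ, would reduce heap usage further.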

basdp avatar Sep 01 '25 11:09 basdp

Do I correctly recall that in either the TEB or the PEB there was some sort of PATH_MAX_WIDE sized buffer for use cases like this?


Sort of. StaticUnicodeBuffer. From https://www.geoffchappell.com/studies/windows/km/ntoskrnl/inc/api/pebteb/teb/index.htm

Image

However, it appears to be only 0x1472-0x1268=0x20a=522 bytes long, so it can't fit a whole NT path, only a maximum-length DOS/Win32 path (where the maximum is 260 characters; adding a null terminator and converting to UTF-16 code units gives (260+1)*2=522 bytes).
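The arithmetic can be verified at compile time. This is only a sketch of the numbers in this comment: the two offsets are the ones quoted above from the linked TEB layout, and the constant names are made up for illustration.

```zig
const std = @import("std");

// Byte span between the two TEB offsets quoted in the comment above.
const teb_buffer_bytes = 0x1472 - 0x1268;

// 260 characters plus a null terminator, as 2-byte UTF-16 code units.
const max_path_bytes = (260 + 1) * @sizeOf(u16);

// A full PATH_MAX_WIDE path plus terminator, as stored in PathSpace.data.
const nt_path_bytes = (32767 + 1) * @sizeOf(u16);

comptime {
    std.debug.assert(teb_buffer_bytes == 0x20a);
    std.debug.assert(teb_buffer_bytes == 522);
    std.debug.assert(max_path_bytes == teb_buffer_bytes);
    // The static buffer is far too small for a maximum-length NT path.
    std.debug.assert(nt_path_bytes == 65536);
    std.debug.assert(nt_path_bytes > teb_buffer_bytes);
}

pub fn main() void {}
```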

daurnimator avatar Nov 06 '25 05:11 daurnimator