zig icon indicating copy to clipboard operation
zig copied to clipboard

Improve code generation of zig.fmt

Open N00byEdge opened this issue 4 years ago • 6 comments
trafficstars

Today the code generated by zig.fmt is far from desirable.

Here is a sketch that I made up in a afternoon. Of course it's not a complete implementation, so it can't print everything yet, and most importantly it doesn't handle a runtime known writer properly yet, but it shows what the main issues are.

In my opinion, doing log("Hello, {d}!", .{asm("rdrand %[reg]" : [reg] "=r" (->u64))}); should be entirely inlined, and instead just call a couple of functions, and be equivalent to the following code (and this is what I achieved)

    call get_some_log_lock
    push rax
    lea rdi, "Hello, "
    call print_string
    rdrand rdi
    call print_runtime_decimal_value
    lea rdi, "!\n"
    call print_string
    pop rdi
    call release_log_lock

Today, you get the worst of both worlds. You get one function for each call site, which parses the string at runtime, causing a lot of code bloat. Let's get some numbers on this. Replacing the std.fmt usage in the Florence OS kernel resulted in the following change:

$ bloaty --debug-vmaddr=0xF zig-out/bin/Flork_stivale2_x86_64_newlog -- zig-out/bin/Flork_stivale2_x86_64
    FILE SIZE        VM SIZE
 --------------  --------------
  +0.1%     +14  [ = ]       0    .debug_abbrev
   +22%      +4  [ = ]       0    [Unmapped]
  [ = ]       0  [DEL]      -8    [LOAD #2 [RW]]
  [ = ]       0  -0.0%     -24    .bss
  -2.3% -1.64Ki  [ = ]       0    .debug_pubnames
 -10.0% -3.03Ki  [ = ]       0    .debug_frame
 -12.6% -5.08Ki  [ = ]       0    .debug_pubtypes
  -5.1% -6.32Ki  [ = ]       0    .debug_str
 -68.4% -7.01Ki -68.8% -7.01Ki    .data
 -10.8% -7.68Ki -10.8% -7.68Ki    .rodata
 -24.3% -11.6Ki  [ = ]       0    .strtab
 -43.1% -17.1Ki  [ = ]       0    .symtab
 -17.4% -35.3Ki  [ = ]       0    .debug_line
 -25.9% -38.7Ki  [ = ]       0    .debug_ranges
 -21.7% -72.8Ki -21.7% -72.8Ki    .text
 -24.3%  -133Ki  [ = ]       0    .debug_info
 -23.7%  -136Ki  [ = ]       0    .debug_loc
 -21.1%  -476Ki  -0.5% -87.5Ki    TOTAL

As we can see, over 20% of the entire kernel executable was just unnecessary code generated by std.fmt, and the same goes for the executable in large, half a meg just disappeared.

I think the main issues with std.fmt today is that not enough functions are callconv(.Inline) nor noinline. I achieved this by going the extra mile to say that everything that takes comptime fmt: []const u8 is callconv(.Inline), and anything that doesn't is noinline. Just doing this gets you very far. That way the format string code generation is as specialized to the format string as possible, and it instead tries to break it down into elementary components.

I also took extra caution to always append comptime known values to the format string itself and, by doing that, combine as many value writes as possible.

N00byEdge avatar Aug 27 '21 09:08 N00byEdge

For LLVM this should work fine. Probably this should be done after the planned zig.fmt overhaul in #1358 and sufficient testing.

Would this break DWARF info in self-hosting (without LLVM) for the inlined functions?

matu3ba avatar Aug 27 '21 12:08 matu3ba

When you use RdRand or RdSeed you should check the cary flag status which tells you whether or not you got a random number. So just inlining the RdRand instruction and not checking the flag is not a good plan. It'll work providing the silicon is healthy or not under physical attack, but in a security context, it's important to check the carry flag.

dj-on-github avatar Feb 04 '22 05:02 dj-on-github

@dj-on-github I think your post belongs into #10755 or #10758 as this issue does not deal with random numbers.

matu3ba avatar Feb 04 '22 08:02 matu3ba

@dj-on-github providing a securely random number was not my intention, just to show what I think the code generation should be for this part of the standard library.

N00byEdge avatar Feb 04 '22 12:02 N00byEdge

not sure when exactly it regressed but I looked into this recently and the initial suggestion of marking std.fmt.format as inline and seeing how that affects codegen and compilation time is currently blocked on https://github.com/ziglang/zig/issues/15668

nektro avatar Feb 23 '24 03:02 nektro

#15668 is fixed; anything in particular blocking this now other than just available dev time?

Khitiara avatar Apr 21 '24 06:04 Khitiara