zig
zig copied to clipboard
Improve code generation of zig.fmt
Today the code generated by zig.fmt is far from desirable.
Here is a sketch that I made up in a afternoon. Of course it's not a complete implementation, so it can't print everything yet, and most importantly it doesn't handle a runtime known writer properly yet, but it shows what the main issues are.
In my opinion, doing log("Hello, {d}!", .{asm("rdrand %[reg]" : [reg] "=r" (->u64))}); should be entirely inlined, and instead just call a couple of functions, and be equivalent to the following code (and this is what I achieved)
call get_some_log_lock
push rax
lea rdi, "Hello, "
call print_string
rdrand rdi
call print_runtime_decimal_value
lea rdi, "!\n"
call print_string
pop rdi
call release_log_lock
Today, you get the worst of both worlds. You get one function for each call site, which parses the string at runtime, causing a lot of code bloat. Let's get some numbers on this. Replacing the std.fmt usage in the Florence OS kernel resulted in the following change:
$ bloaty --debug-vmaddr=0xF zig-out/bin/Flork_stivale2_x86_64_newlog -- zig-out/bin/Flork_stivale2_x86_64
FILE SIZE VM SIZE
-------------- --------------
+0.1% +14 [ = ] 0 .debug_abbrev
+22% +4 [ = ] 0 [Unmapped]
[ = ] 0 [DEL] -8 [LOAD #2 [RW]]
[ = ] 0 -0.0% -24 .bss
-2.3% -1.64Ki [ = ] 0 .debug_pubnames
-10.0% -3.03Ki [ = ] 0 .debug_frame
-12.6% -5.08Ki [ = ] 0 .debug_pubtypes
-5.1% -6.32Ki [ = ] 0 .debug_str
-68.4% -7.01Ki -68.8% -7.01Ki .data
-10.8% -7.68Ki -10.8% -7.68Ki .rodata
-24.3% -11.6Ki [ = ] 0 .strtab
-43.1% -17.1Ki [ = ] 0 .symtab
-17.4% -35.3Ki [ = ] 0 .debug_line
-25.9% -38.7Ki [ = ] 0 .debug_ranges
-21.7% -72.8Ki -21.7% -72.8Ki .text
-24.3% -133Ki [ = ] 0 .debug_info
-23.7% -136Ki [ = ] 0 .debug_loc
-21.1% -476Ki -0.5% -87.5Ki TOTAL
As we can see, over 20% of the entire kernel executable was just unnecessary code generated by std.fmt, and the same goes for the executable in large, half a meg just disappeared.
I think the main issues with std.fmt today is that not enough functions are callconv(.Inline) nor noinline. I achieved this by going the extra mile to say that everything that takes comptime fmt: []const u8 is callconv(.Inline), and anything that doesn't is noinline. Just doing this gets you very far. That way the format string code generation is as specialized to the format string as possible, and it instead tries to break it down into elementary components.
I also took extra caution to always append comptime known values to the format string itself and, by doing that, combine as many value writes as possible.
For LLVM this should work fine. Probably this should be done after the planned zig.fmt overhaul in #1358 and sufficient testing.
Would this break DWARF info in self-hosting (without LLVM) for the inlined functions?
When you use RdRand or RdSeed you should check the cary flag status which tells you whether or not you got a random number. So just inlining the RdRand instruction and not checking the flag is not a good plan. It'll work providing the silicon is healthy or not under physical attack, but in a security context, it's important to check the carry flag.
@dj-on-github I think your post belongs into #10755 or #10758 as this issue does not deal with random numbers.
@dj-on-github providing a securely random number was not my intention, just to show what I think the code generation should be for this part of the standard library.
not sure when exactly it regressed but I looked into this recently and the initial suggestion of marking std.fmt.format as inline and seeing how that affects codegen and compilation time is currently blocked on https://github.com/ziglang/zig/issues/15668
#15668 is fixed; anything in particular blocking this now other than just available dev time?