Self-hosted: @memset causes failure in debug build (freestanding)
Zig Version
0.10.0
Steps to Reproduce and Observed Behavior
I have a GPIO blink program written for the RPi Zero/A+ which I know successfully blinks the light. Since this is a bare-metal environment, and a building block for my kernel, I followed the age-old wisdom of zeroing the .bss section of my program in zigMain. zigMain is the first function called after my assembly entry point, which only sets up the stack pointer and turns off interrupts. With the stage1 compiler (0.9.1) all is well, and I call this code at the start of zigMain:
extern var __bss_start: u8;
extern var __bss_end: u8;
@memset(@as(*volatile [1]u8, &__bss_start), 0, @ptrToInt(&__bss_end) - @ptrToInt(&__bss_start));
This seemingly worked fine with the stage1 compiler; with the self-hosted compiler, however, it causes the program not to blink my LED. If I compile with optimizations (ReleaseFast or ReleaseSmall), then the program works, likely because the memset is optimized out, since my binary never actually had anything in the .bss section. (Does Zig even use that section?)
Expected Behavior
Calling @memset to zero the .bss section should NOT change program correctness.
#13421 has the project files for this.
Make sure the resulting binary does have (valid) __bss_start and __bss_end symbols (llvm-objdump -t will show the symbol table); otherwise you might be memsetting who knows what :)
Hm. I see I included a version where I had commented out my use of the __bss_start and __bss_end symbols. In digging into your suggestion, though, I noticed what I think is the culprit. With Zig 0.9, nothing was ever put in my .bss section, so the memset call just happened to have size 0 and was likely optimized out.
However, with the self-hosted compiler, it seems that the <os.main> symbol is placed in the .bss section, so .bss now has size > 1, and my memset causes some weird behavior. I am including the project zip this time so anyone who wants to tinker can, but I still find it weird that the memset would be wrong. If the __bss_start/__bss_end symbols aren't getting included in the output binary for my Zig code to use, then that is also a problem, is it not?
For reference, I am using llvm-objdump from LLVM 15.0.2 for this.
project: pi-zig.zip
I'm thinking this might be due to the broken memset and memclr helpers referenced in #13530. I did a quick disassembly of your executable and found this:
0000c7a8 <__aeabi_memset>:
c7a8: e3510000 cmp r1, #0
c7ac: 012fff1e bxeq lr
c7b0: e92d4800 push {fp, lr}
c7b4: e1a0b00d mov fp, sp
c7b8: ebfffffa bl c7a8 <__aeabi_memset>
c7bc: e8bd8800 pop {fp, pc}
0000c7c0 <__aeabi_memclr>:
c7c0: e3510000 cmp r1, #0
c7c4: 012fff1e bxeq lr
c7c8: e92d4800 push {fp, lr}
c7cc: e1a0b00d mov fp, sp
c7d0: ebfffffa bl c7c0 <__aeabi_memclr>
c7d4: e8bd8800 pop {fp, pc}
I think you are right, @DanB91. Note the bl at c7b8 and c7d0: each helper just calls its own entry point, so it recurses until the stack is exhausted instead of storing anything. It appears to be the same issue as yours, because if I define my own implementation in assembly, it works fine.
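A minimal sketch of such an override, assuming a freestanding ARM EABI target (names and details are illustrative, not the exact code used). Note that the AEABI helpers take (dest, n, c), unlike C memset; the volatile stores are there so the optimizer cannot recognize the loop and turn it back into a memset call, which is the recursion bug itself:
// Zig's compiler_rt exports these symbols weakly, so strong exports
// like these should take priority at link time.
export fn __aeabi_memset(dest: [*]u8, n: usize, c: i32) void {
    const p = @as([*]volatile u8, dest); // volatile: keep LLVM from re-emitting memset
    var i: usize = 0;
    while (i < n) : (i += 1) p[i] = @truncate(u8, @bitCast(u32, c));
}
export fn __aeabi_memclr(dest: [*]u8, n: usize) void {
    __aeabi_memset(dest, n, 0);
}
export fn __aeabi_memclr4(dest: [*]u8, n: usize) void {
    __aeabi_memset(dest, n, 0);
}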
I believe I've run into this before on RISC-V: https://github.com/im-tomu/fomu-workshop/blob/1018c590028b1fef578023360952a2b84d1809ec/riscv-zig-blink/src/fomu/start.zig#L29-L42
The answer I found at the time was that there is nothing to say that the implementation of memset doesn't itself use symbols in .bss, and hence the clearing of .bss during initialisation should always be done with inline assembly.
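For illustration, a minimal sketch of that approach on ARM, assuming __bss_start and __bss_end are provided by the linker script (the function name is illustrative):
// Clear .bss using only inline assembly: no runtime helper is called,
// so nothing that might itself live in .bss is involved.
export fn zeroBss() void {
    asm volatile (
        \\    ldr r0, =__bss_start
        \\    ldr r1, =__bss_end
        \\    mov r2, #0
        \\0:
        \\    cmp r0, r1
        \\    bhs 1f              @ done once r0 >= r1 (unsigned)
        \\    strb r2, [r0], #1   @ *r0++ = 0
        \\    b 0b
        \\1:
        ::: "r0", "r1", "r2", "cc", "memory");
}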
Please provide a minimal reduction plus instructions to reproduce, including the assembly (an extern function which can be linked in to reproduce the problem). If only an example with start code is viable, then please add proper bss zeroing and stack setup to the _start function.
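Something along these lines (a sketch, assuming a linker-provided __stack_top symbol and the zeroBss/zigMain names used earlier in the thread, both exported so the assembly can reference them):
// All startup work happens before any normal Zig code runs.
export fn _start() callconv(.Naked) noreturn {
    asm volatile (
        \\    ldr sp, =__stack_top  @ stack must be valid before any bl
        \\    bl zeroBss            @ zero .bss first
        \\    bl zigMain            @ then enter the Zig entry point
        \\    b .                   @ hang if it ever returns
    );
    unreachable;
}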
Hi, I'm not sure if this is the same bug or not, but I found a very small repro for a @memset crash: #15634
Can confirm it is still broken. The old std.mem.set worked fine, but now that has been dropped.
Not really any new info here but:
// Symbols set by the linker script; only their addresses are meaningful.
extern var __text_end: u32;
extern var __data_start: u32;
extern var __data_size: u32;
extern var __bss_start: u32;
extern var __bss_size: u32;
export fn Reset_Handler() void {
    // Copy .data from its load address (after .text) into RAM.
    const data_size = @ptrToInt(&__data_size);
    const data = @ptrCast([*]u8, &__data_start);
    const text_end = @ptrCast([*]u8, &__text_end);
    @memcpy(data[0..data_size], text_end[0..data_size]);
    // Zero .bss.
    const bss_size = @ptrToInt(&__bss_size);
    const bss = @ptrCast([*]u8, &__bss_start);
    // This works fine:
    // for (0..bss_size) |i| {
    //     bss[i] = 0;
    // }
    // This never returns (see the disassembly below):
    @memset(bss[0..bss_size], 0);
    while (true) {}
}
The bss pointer and size are as follows:
(gdb) p bss
$6 = (u8 *) 0x20000010 <builtin.os> "\030"
(gdb) p bss_size
$7 = 164
At the start of @memset, it loads the bss pointer into r0 and the length into r1, then branches to __aeabi_memclr4:
│ 0x5f2 <Reset_Handler+326> movw r0, #16
│ 0x5f6 <Reset_Handler+330> movt r0, #8192 ; 0x2000
│ 0x5fa <Reset_Handler+334> movw r1, #164 ; 0xa4
│ 0x5fe <Reset_Handler+338> movt r1, #0
│ > 0x602 <Reset_Handler+342> bl 0x5d4c <__aeabi_memclr4>
Registers are
r0 0x20000010 536870928
r1 0xa4 164
sp 0x2000ffe0 0x2000ffe0
__aeabi_memclr4 jumps to __aeabi_memclr:
│ >0x5d4c <__aeabi_memclr4> b.w 0x5d3c <__aeabi_memclr>
Then it sits in a recursive loop, comparing the length in r1 to 0 and calling itself, until the stack overflows and it hard faults:
│ > 0x5d3c <__aeabi_memclr> cmp r1, #0
│ 0x5d3e <__aeabi_memclr+2> it eq
│ 0x5d40 <__aeabi_memclr+4> bxeq lr
│ 0x5d42 <__aeabi_memclr+6> push {r7, lr}
│ 0x5d44 <__aeabi_memclr+8> mov r7, sp
│ 0x5d46 <__aeabi_memclr+10> bl 0x5d3c <__aeabi_memclr>
Not sure why it replaces @memset with memclr; presumably, since the fill value is zero, the compiler emits the AEABI zero-fill helper instead of __aeabi_memset. Edit: it doesn't work with memset either.
I think this issue can be closed, since the ARM compiler_rt function has been fixed in #15655.