zig icon indicating copy to clipboard operation
zig copied to clipboard

Self hosted: @memset causes failure in debug build (freestanding)

Open NerdySouth opened this issue 1 year ago • 3 comments

Zig Version

0.10.0

Steps to Reproduce and Observed Behavior

I have a GPIO blink program written for the RPI zero/A+ which I know successfully blinks the light. Due to this being a bare metal environment, and a building block for my kernel, I followed the age old wisdom to zero the bss section of my program in my zigMain. zigMain is the first function called after my asm entry point, which only sets up the sp and turns off interrupts. With the stage 1 compiler (0.9.1) all is well, and I call this code at the start of zigMain:

extern var __bss_start: u8; extern var __bss_end: u8; @memset(@as(*volatile [1]u8, &__bss_start), 0, @ptrToInt(&__bss_end) - @ptrToInt(&__bss_start));

This seemingly worked fine with the stage1 compiler, however, with the self hosted, it causes the program not to blink my LED. If I compile with optimizations (fast or small) then the program will work (likely optimizing out this memset since my binary never actually had anything in the bss section. Does Zig even use that section?)

Expected Behavior

Should call @memset to zero the bss section and NOT change program correctness.

NerdySouth avatar Nov 03 '22 01:11 NerdySouth

#13421 Has the project files for this.

NerdySouth avatar Nov 03 '22 01:11 NerdySouth

Make sure the resulting binary does have (valid) __bss_start and __bss_end otherwise you might be memsetting who knows what :)

llogick avatar Nov 03 '22 13:11 llogick

Hm. I see I included a version where I had commented out my use of the bss_start and bss__end symbols. In digging into your suggestion though I did notice what I think to be the culprit. In zig9, nothing was ever being put in my bss section, and so a memset call to it just so happened to be a memset on size 0, and was likely optimized out.

However, with the self hosted compiler, it seems that the <os.main> symbol is being put into the bss section, and now the bss has size > 1, and thus my memset is causing some weird behavior. I am including the project zip this time so if anyone wants to tinker they can, but I still find it weird that the memset would be wrong. If the bss__start/bss__end symbols aren't getting included in the output binary for my zig code to use, then that is also a problem is it not?

For reference, I am using llvm-objdump with llvm 15.0.2 for this.

project: pi-zig.zip

NerdySouth avatar Nov 04 '22 03:11 NerdySouth

I'm thinking that this might be due to broken memset and memclr referenced in #13530. I did a quick disassembly of your executable and found this:

0000c7a8 <__aeabi_memset>:
    c7a8:       e3510000        cmp     r1, #0
    c7ac:       012fff1e        bxeq    lr
    c7b0:       e92d4800        push    {fp, lr}
    c7b4:       e1a0b00d        mov     fp, sp
    c7b8:       ebfffffa        bl      c7a8 <__aeabi_memset>
    c7bc:       e8bd8800        pop     {fp, pc}

0000c7c0 <__aeabi_memclr>:
    c7c0:       e3510000        cmp     r1, #0
    c7c4:       012fff1e        bxeq    lr
    c7c8:       e92d4800        push    {fp, lr}
    c7cc:       e1a0b00d        mov     fp, sp
    c7d0:       ebfffffa        bl      c7c0 <__aeabi_memclr>
    c7d4:       e8bd8800        pop     {fp, pc}

DanB91 avatar Nov 12 '22 02:11 DanB91

I think you are right @DanB91. It appears to be the same issue as yours because if i define my own implementation in assembly it works fine.

NerdySouth avatar Nov 13 '22 02:11 NerdySouth

Expected Behavior

Should call @Memset to zero the bss section and NOT change program correctness.

I believe I've run into this before on riscv. https://github.com/im-tomu/fomu-workshop/blob/1018c590028b1fef578023360952a2b84d1809ec/riscv-zig-blink/src/fomu/start.zig#L29-L42

The answer I found at the time was that there is nothing to say that the implementation of memset doesn't use symbols in bss itself, and hence the clearing of bss during initialisation should always be done with inline assembly.

daurnimator avatar Nov 14 '22 04:11 daurnimator

Please provide a minimal reduction + instruction to reproduce including the assembly (extern function, which can be linked to reproduce the problem). If only an example with start code is viable, then please add proper bss zeroing and stack setup into the _start function.

matu3ba avatar May 04 '23 18:05 matu3ba

Hi, I'm not sure if this is the same bug or not but I found a very small repro for a @memset crash: #15634

Srekel avatar May 09 '23 19:05 Srekel

Can confirm it is still broken. The old std.mem.set worked fine but not that was dropped.

Not really any new info here but:

// Set by linker 
extern var __text_end: u32;
extern var __data_start: u32;
extern var __data_size: u32;
extern var __bss_start: u32;
extern var __bss_size: u32;

export fn Reset_Handler() void {
    const data_size = @ptrToInt(&__data_size);
    const data = @ptrCast([*]u8, &__data_start);
    const text_end = @ptrCast([*]u8, &__text_end);
    @memcpy(data[0..data_size], text_end[0..data_size]);

    const bss_size = @ptrToInt(&__bss_size);
    const bss = @ptrCast([*]u8, &__bss_start);
    // This works fine
    // for (0..bss_size) |i| {
    //     bss[i] = 0;
    // }
    @memset(bss[0..bss_size], 0);
    while (true) {}
}

Bss and size are as follows

(gdb) p bss
$6 = (u8 *) 0x20000010 <builtin.os> "\030"
(gdb) p bss_size
$7 = 164

At start of @memset It pushes bss into r0, and the length into r1, then jumps to __aeabi_memclr4

 |    0x5f2 <Reset_Handler+326>                       movw    r0, #16                                                                                                                                        
│   0x5f6 <Reset_Handler+330>                       movt    r0, #8192       ; 0x2000                                                                                                                       
│   0x5fa <Reset_Handler+334>                       movw    r1, #164        ; 0xa4                                                                                                                         
│   0x5fe <Reset_Handler+338>                       movt    r1, #0                                                                                                                                         
│ > 0x602 <Reset_Handler+342>                       bl      0x5d4c <__aeabi_memclr4>           

Registers are

r0             0x20000010          536870928
r1             0xa4                164
sp             0x2000ffe0          0x2000ffe0

memclr4 jumps to memclr

│  >0x5d4c <__aeabi_memclr4>        b.w     0x5d3c <__aeabi_memclr>                                                                                                                                       

Then sits in a recursive loop comparing the bss ptr to 0 until it hard faults.

|    > 0x5d3c <__aeabi_memclr>         cmp     r1, #0                                                                                                                                                         
│   0x5d3e <__aeabi_memclr+2>       it      eq                                                                                                                                                             
│   0x5d40 <__aeabi_memclr+4>       bxeq    lr                                                                                                                                                             
│   0x5d42 <__aeabi_memclr+6>       push    {r7, lr}                                                                                                                                                       
│   0x5d44 <__aeabi_memclr+8>       mov     r7, sp                                                                                                                                                        
│   0x5d46 <__aeabi_memclr+10>      bl      0x5d3c <__aeabi_memclr>  

Not sure why it replaces @memset with memclr, but maybe is optimizing memset out? Edit: Doesn't work with memset either.

frmdstryr avatar May 10 '23 20:05 frmdstryr

I think this issue can be closed, since the ARM compiler_rt function has been fixed in #15655.

matu3ba avatar Jun 27 '23 22:06 matu3ba