ch32v003fun
more optimized startup code
Hi, you could consider using all or parts of my more optimized startup code, available here.
There are also other size optimizations, like having gp cover the whole SRAM, placing .rodata before .text so it can be addressed at an offset from zero, or placing .srodata before .rodata (embedded toolchains are usually preconfigured to disable small data to conserve SRAM).
A bare while(1) build is 202 bytes (188 without static initializers).
With a 1 KiB bootloader it should grow by 4 bytes; above that it's more complicated.
Hmm... there's a few trade-offs that I've worked through.
- It should be in a .c file, so that ch32v003fun can be included with only one .c file. Simplifies inclusion for people on other build systems.
- HPE should not be enabled by default. Users should choose if they want it or not. It is not a clear win for many situations.
- What about BSS?
- What about _data initialization?

Other than that, this is great. Would you consider making some changes to make the default ch32v003fun implementation a little tighter?
I don't understand how .rodata being before .text is helpful, but that would be interesting.
I would like to integrate several of the principles.
Would you consider making a PR for ch32v003fun.c?
cruuuuuud... I guess I was turning HPE on in my code. I really should not.
The first thing to resolve is the desired naming for linker symbols, as there is literally no standardization. There are also instances that rely on .srodata (aka small rodata) being in SRAM.
> I don't understand how .rodata being before .text is helpful, but that would be interesting.

Anything within the bottom 2 KiB can be addressed by a single addi (or lw) instruction instead of c.lui + addi.
Code uses PC-relative offsets, and taking a function pointer is less common than referencing something in .rodata, so .rodata is the section that benefits more from sitting at the low addresses.
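To make the 2 KiB figure concrete, here is a minimal sketch of the range check behind it. RISC-V I-type instructions (addi, lw) carry a signed 12-bit immediate, so an address is reachable in one instruction if it lies within -2048..+2047 of the base register. The function name `reachable_imm12` is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr can be formed as base + imm, where imm is a signed
 * 12-bit immediate (-2048..2047), as used by addi and lw.
 * Hypothetical helper, not part of ch32v003fun. */
static bool reachable_imm12(uint32_t base, uint32_t addr)
{
    int32_t off = (int32_t)(addr - base);
    return off >= -2048 && off <= 2047;
}
```

With base 0 (the x0 register), `reachable_imm12(0, 0x7FF)` holds and `reachable_imm12(0, 0x800)` does not, which is why only data in the first 2 KiB of the image gets the short form. The same check with base set to gp is what the "gp covering whole sram" idea relies on.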
> - What about BSS?
> - What about _data initialization?

Do you mean the .data and .bss sections? Those are handled in the L_51 and L_113 loops.
> cruuuuuud... I guess I was turning HPE on in my code. I really should not.

BTW, only bit 0 of 0x804 is HPE; bit 1 enables interrupt nesting.
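As a small sketch of that bit layout, the helper below builds a value for CSR 0x804 from two independent choices. The macro and function names are invented here; only the bit positions come from the comment above.

```c
#include <stdint.h>

/* Bit assignments of CSR 0x804 as described in this thread.
 * The names are hypothetical, chosen only for this example. */
#define CSR804_HPE   (1u << 0)  /* hardware prologue/epilogue */
#define CSR804_NEST  (1u << 1)  /* interrupt nesting */

/* Compose the CSR value so HPE and nesting are opted into separately. */
static uint32_t csr804_value(int enable_hpe, int enable_nesting)
{
    uint32_t v = 0;
    if (enable_hpe)
        v |= CSR804_HPE;
    if (enable_nesting)
        v |= CSR804_NEST;
    return v;
}
```

On target, the result would be written with something like `asm volatile("csrw 0x804, %0" :: "r"(v));`. Writing 0x3 turns both features on, so code that only wants nesting should write 0x2.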
By assuming that mtvec is always initialized to the starting address (the bootloader sets mtvec to the app address without mode bits), there should be no size difference caused by the app offset.
That saves another 2 bytes if relying on the reset state of mtvec.
Wait, are you sure that this area at 0x1ffff000 can actually be used? The datasheet only says it's a "factory-cured bootloader", and those areas tend to be read-only (or write-once).
> By assuming that mtvec is always initialized to the starting address (the bootloader sets mtvec to the app address without mode bits), there should be no size difference caused by the app offset.
> That saves another 2 bytes if relying on the reset state of mtvec.

That makes me anxxxxiousssss
> Wait, are you sure that this area at 0x1ffff000 can actually be used? The datasheet only says it's a "factory-cured bootloader", and those areas tend to be read-only (or write-once).

Absolutely! That's what I have been using on my 1920-byte-USB thing. I sometimes use it, sometimes flash, but I can definitely reprogram the bootloader.
Looking at the openwch repo, reprogramming the bootloader is a documented use case.
The expected way of entering the bootloader is a system reset after setting bit 14 of FLASH->STATR:
https://github.com/openwch/ch32v003/blob/main/EVT/EXAM/IAP/V00x_APP/User/main.c#LL32C5-L32C18
This way the bootloader sees a full reset and the remap to 0x00000000.
Exiting to the app probably requires the same procedure (though some of the keying should already be done for flash programming). Because of the system reset, there seems to be no need to clean up peripheral registers.
That's quite different from STM32 "system" bootloaders.
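The boot-target selection described above can be sketched as follows. This only shows the bit manipulation: the flash key unlock sequence and the system reset that must follow are left out, and the register is passed in as a pointer so the logic can be tried off-target. `BOOT_MODE_BIT` and `select_boot_target` are names invented for this example; on hardware the pointer would be `&FLASH->STATR`.

```c
#include <stdint.h>

/* Bit 14 of FLASH->STATR, per the description above (name is hypothetical). */
#define BOOT_MODE_BIT (1u << 14)

/* Choose what the next system reset starts: the bootloader (non-zero)
 * or the user application (zero). Other bits in the register are kept.
 * Assumes the caller has already done any required key unlock and
 * triggers the system reset afterwards. */
static void select_boot_target(volatile uint32_t *statr, int to_bootloader)
{
    uint32_t v = *statr & ~BOOT_MODE_BIT;
    if (to_bootloader)
        v |= BOOT_MODE_BIT;
    *statr = v;
}
```

Clearing the bit and resetting would be the matching way back to the app, if exiting really does mirror entry as guessed above.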
> That makes me anxxxxiousssss

If the bootloader is in a separate flash bank that is entered and exited by system reset, I think this -2 bytes is safe.
> Absolutely! That's what I have been using on my 1920-byte-USB thing. I sometimes use it, sometimes flash, but I can definitely reprogram the bootloader.

minichlink offsets the binaries from 0x00000000 to 0x1ffff000 or 0x08000000 based on a command-line parameter.
I'll try to figure out how this works in the cursed openocd fork.
BTW: https://github.com/cnlohr/ch32v003fun/blob/master/ch32v003fun/ch32v003fun.c#L817
This may break if the compiler decides to load an address or immediate into a register before the assembly blocks. GCC tends to allocate from a5 downward (so nothing has broken yet, by accident), but LLVM allocates from a0 upward.
Those inline asm blocks need register clobbers to be safe.
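A minimal sketch of what a clobber-safe block could look like, using a word-copy loop as a stand-in (this is not the code at the linked line). Pointers and the count are passed as named operands so the compiler picks their registers, and the one hard-coded scratch register, t0, is listed as a clobber. The non-RISC-V branch is only there so the function can be compiled and tried on a host.

```c
#include <stdint.h>

/* Copy n 32-bit words from src to dst.
 * Illustrative only; not taken from ch32v003fun.c. */
static void copy_words(uint32_t *dst, const uint32_t *src, uint32_t n)
{
#if defined(__riscv)
    __asm__ volatile(
        "   beqz %[n], 2f\n"
        "1: lw   t0, 0(%[src])\n"
        "   sw   t0, 0(%[dst])\n"
        "   addi %[src], %[src], 4\n"
        "   addi %[dst], %[dst], 4\n"
        "   addi %[n], %[n], -1\n"
        "   bnez %[n], 1b\n"
        "2:\n"
        /* "+r": read and written by the asm, register chosen by the compiler */
        : [dst] "+r"(dst), [src] "+r"(src), [n] "+r"(n)
        :
        /* t0 is used as scratch; "memory" because the asm writes through dst */
        : "t0", "memory");
#else
    /* Portable fallback for host builds. */
    while (n--) {
        *dst++ = *src++;
    }
#endif
}
```

Because t0 is declared as clobbered, neither GCC nor LLVM will keep a live value in it across the block, regardless of which end of the register file they allocate from.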
A small side track: does the stack grow toward .code instead of the global vars in the original linker script?
It always grows from the top down, toward .heap, then .bss and .data.