klipper
2.9x speedup for stm32h7 by enabling cpu cache
As @KevinOConnor mentioned, someone ran the step rate benchmark on a stm32h743 mcu and only achieved a result of 570 ticks in the 3-stepper benchmark. This is much slower than the 204 ticks measured on the stm32f4 mcu.
I could roughly reproduce this result, and it does appear to be caused by the CPU cache not being enabled. With this change the 3-stepper result went down to 194 ticks!
The PR also changes the stm32h750 to use a larger RAM area (as the stm32h743 already did).
Thanks! Any chance you could add a section to docs/Benchmarks.md with the benchmark results so that we track this going forward?
@adelyser - fyi.
-Kevin
P.S. The below is what I was going to add to Benchmarks.md given the feedback on the Klipper Discourse:
+### STM32H7 step rate benchmark
+
+The following configuration sequence is used on a STM32H743VIT6:
+```
+allocate_oids count=3
+config_stepper oid=0 step_pin=PD4 dir_pin=PD3 invert_step=-1 step_pulse_ticks=0
+config_stepper oid=1 step_pin=PA15 dir_pin=PA8 invert_step=-1 step_pulse_ticks=0
+config_stepper oid=2 step_pin=PE2 dir_pin=PE3 invert_step=-1 step_pulse_ticks=0
+finalize_config crc=0
+```
+
+The test was last run on commit `638303b3` with gcc version
+`arm-none-eabi-gcc (15:8-2019-q3-1+b1) 8.3.1 20190703 (release)
+[gcc-8-branch revision 273027]`.
+
+| stm32h7 | ticks |
+| -------------------- | ----- |
+| 1 stepper | 118 |
+| 3 stepper | 570 |
+
I pulled your PR and compiled it, but it didn't make a difference on my BTT-SKR-SE-BX board. I am still at around 537 ticks.
I tested it with the SKR3 board (lowest ticks observed):

| SKR3 configuration | 1 stepper | 3 stepper |
| ------------------ | --------- | --------- |
| No cache | 118 | 570 |
| Cache enabled, standard Makefile from the GitHub repo (no -O3, no armv7e optimizations) | 44 | 198 |
| Cache enabled, -O3 plus armv7e optimizations (`-mtune=cortex-m7 -march=armv7e-m+fp.dp -mcpu=cortex-m7`) | 42 | 192 |
Very interesting. So I guess it's called O3 because you get another 3% :)
Testing this further, I occasionally get serial connection failures at startup with a "Got b'\xf8' from stk500v2" message, which may be caused by the cache being enabled. I will test some more this week (I am using the native UART pins). Has this been stable for you?
No issues so far. I use USB serial for the communication interface, TMC5160 drivers (SPI) for X, Y and E, and TMC2209 drivers (UART) for Z1 and Z2. I did a 5h print yesterday, and another 6h print is running right now. Looking good so far.
I did multiple boots/startups and two 5h PETG-CF prints with no issues: no disconnects, no visible missed steps, and no layer shifts (which would also indicate missed steps). So far it looks perfect.
Printer settings:
- 256 microsteps for X, Y, E, Z1 and Z2
- TMC `chm=1` for X, Y and E (no SpreadCycle, i.e. constant off time with fast decay time)
- Travel acceleration 10000, travel speed 600
- Print acceleration 1400, print speed 80

Klipper version: v0.10.0-608-g638303b3-dirty with the cache enabled and the armv7e optimizations
FYI, a "Got b'\xf8' from stk500v2" type of message is not an error; it is just informational. The host code is reporting some bytes that were discarded before it initiated the communication channel with the mcu.
-Kevin
So far, I haven't seen any improvement on the BTT-SKR-SE-BX board, and I'm not sure why. I plan to test it on the SKR3 EZ tonight. They are similar processors, so it's tough to understand why the caching helps one but not the other.
The startup issue seems to be fixed by enabling the cache earlier instead of at the end of `clock_setup`. I also added the benchmark results.
Before it would sometimes get stuck at "Sending MCU 'mcu' printer configuration..." on my (somewhat nonstandard) setup.
@adelyser That is weird. Does this board have a custom bootloader? Are you sure the flashing was successful? I am using a stm32h750vb chip.
Yes, it definitely flashed, and I didn't see any change in the ticks, I was still at about 537. The BTT-SKR-SE-BX uses the STM32H743IIT6 processor. I'll pull your most recent commits when I get home and try it again though.
If you change src/stm32/Kconfig so that the stm32h743 uses a RAM_START of 0x20000000, does that change anything?
Separately, there is no reason for the h750 and h743 to be using different ram addresses. Klipper doesn't use lots of ram (max under 20KiB) so there is no advantage to introducing artificial differences to these chips.
-Kevin
True. The different RAM addresses were introduced with the h743 port. I changed it so they both use the same address now.
OK, I sorted out what I was doing wrong: I had the invert_step value set incorrectly. I now get 197 ticks on the BTT-SKR-SE-BX. Overall it looks good so far. I still have to test the SKR3, but I suspect the tests that have already been run are accurate.
Thanks. I rebased and committed this change. I'm not sure about the change to the memory layout to h750, so I didn't merge that part of this PR.
-Kevin