ShiftPWM Working SPI ISR optimizations with ~30% performance improvement

Open bigjosh opened this issue 9 years ago • 0 comments

No functional changes, only local optimizations in the ShiftPWM.h file.

Performance changes...

Saves 2 cycles per bit by replacing the existing loop, which enclosed a sequence of 8 calls to add_one_pin_to_byte(), with the single function send_spi_bytes().

Writing the whole send sequence in a single inline ASM allowed for explicit pre-decrement indexing. This optimization did not happen naturally inside the loop because of compiler limitations.

Old emitted ASM per bit:

        add_one_pin_to_byte(sendbyte, counter, --ledPtr);
     3e2:   20 81           ld  r18, Z
     3e4:   52 17           cp  r21, r18
     3e6:   47 95           ror r20
     3e8:   31 97           sbiw    r30, 0x01   ; 1

New emitted ASM per bit...

     3d6:   02 90           ld  r0, -Z
     3d8:   20 15           cp  r18, r0
     3da:   37 95           ror r19
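
For readers who do not read AVR assembly, here is a rough portable C++ model of what the three-instruction sequence computes. It is only a sketch and not the code in ShiftPWM.h: the names add_one_pin_model and build_byte_model are made up for illustration, and the model says nothing about timing.

```cpp
#include <assert.h>
#include <stdint.h>

// Model of one "cp counter, value ; ror sendbyte" pair.
// cp sets the carry flag when counter < value (unsigned compare);
// ror shifts sendbyte right by one and moves that carry into bit 7.
static inline uint8_t add_one_pin_model(uint8_t sendbyte, uint8_t counter, uint8_t value) {
    const uint8_t carry = (counter < value) ? 1u : 0u;
    return static_cast<uint8_t>((sendbyte >> 1) | (carry << 7));
}

// Model of one output byte: eight pre-decrement loads (like "ld r0, -Z"),
// each followed by the compare-and-rotate step above.
// `end` is a hypothetical pointer one past the last of the eight PWM values.
static inline uint8_t build_byte_model(const uint8_t* end, uint8_t counter) {
    assert(end != nullptr);
    uint8_t sendbyte = 0;
    for (int bit = 0; bit < 8; ++bit) {
        sendbyte = add_one_pin_model(sendbyte, counter, *--end);
    }
    return sendbyte;
}
```

Because the pointer walks backwards and each new bit enters at bit 7, the last value in memory ends up in bit 0 of the byte and the first value ends up in bit 7.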

There is also a saving of 1 cycle per byte: in the old code the compiler redundantly compared the loop variable to zero after decrementing it, and the new code avoids that compare.
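
The cycle arithmetic behind these numbers can be written out as below. This is a sketch that assumes the instruction timings of a classic 8-bit AVR core such as the ATmega328P (2 cycles for ld and sbiw, 1 cycle for cp and ror). Other AVR cores may differ, and the constant names are invented for illustration.

```cpp
#include <stdint.h>

// Assumed cycle counts (classic AVR core, e.g. ATmega328P).
constexpr int kCyclesLd   = 2;  // both "ld Rd, Z" and "ld Rd, -Z"
constexpr int kCyclesCp   = 1;
constexpr int kCyclesRor  = 1;
constexpr int kCyclesSbiw = 2;

// Old sequence: ld, cp, ror, sbiw. New sequence: ld (pre-decrement), cp, ror.
constexpr int kOldCyclesPerBit = kCyclesLd + kCyclesCp + kCyclesRor + kCyclesSbiw;
constexpr int kNewCyclesPerBit = kCyclesLd + kCyclesCp + kCyclesRor;
constexpr int kSavedPerBit     = kOldCyclesPerBit - kNewCyclesPerBit;

// Eight bits per byte, plus the one redundant loop compare that goes away.
constexpr int kSavedPerByte = 8 * kSavedPerBit + 1;

static_assert(kSavedPerBit == 2, "sbiw is the only instruction removed per bit");
static_assert(kSavedPerByte == 17, "gross saving per byte under these assumptions");
```

Under these assumptions the gross saving is 17 cycles per byte, before the single-cycle costs of the non-performance changes are subtracted.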

Non-performance changes...

Inverting the output bits now uses an EOR against a preloaded register rather than a NEG, with no compare and branch. This is performance neutral if the option is selected, and costs a single cycle per byte if it is not, since the code would have been statically eliminated in the old version.
The load balancing step factor (0 or 8, depending on ShiftPWM_balanceLoad) is preloaded into a register and then always added to the counter on each loop pass. This is performance neutral if the option is selected, and costs a single cycle per byte if it is not, since the code would have been statically eliminated in the old version.
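
In portable C++, the branch-free idea behind both changes looks roughly like this. It is a sketch, not the inline ASM itself: PreloadedOptions, preload_options and finish_byte are hypothetical names, and the mask values 0x00 / 0xFF are my assumption about what the preloaded EOR register would hold.

```cpp
#include <stdint.h>

// Hypothetical holder for the two values loaded once, before the send loop.
struct PreloadedOptions {
    uint8_t invert_mask;   // assumed: 0xFF when outputs are inverted, else 0x00
    uint8_t balance_step;  // 8 when load balancing is on, else 0
};

static inline PreloadedOptions preload_options(bool invert_outputs, bool balance_load) {
    PreloadedOptions o;
    o.invert_mask  = invert_outputs ? 0xFF : 0x00;
    o.balance_step = balance_load ? 8 : 0;
    return o;
}

// Per-byte work with no compare and no branch: one add and one XOR.
// With a zero mask or a zero step the operation still runs but changes nothing,
// which is where the one-cycle cost comes from when an option is off.
static inline uint8_t finish_byte(uint8_t sendbyte, uint8_t& counter, const PreloadedOptions& o) {
    counter = static_cast<uint8_t>(counter + o.balance_step);  // wraps modulo 256
    return static_cast<uint8_t>(sendbyte ^ o.invert_mask);
}
```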

These changes were motivated by keeping the ASM code clean and continuous. In order to allow static elimination based on a const variable, I would have either had to...

Break up the asm() into multiple parts. In my experience, this increases the chances that the compiler will mess up the emitted code, especially on older versions of the Arduino IDE.
Make 4 versions of the send_spi_bytes() function to cover each case of the options being selected. This is optimal, but ugly.
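
For comparison, the "4 versions" alternative could be generated from one body with a C++ template, so each variant has the unused option statically eliminated. This is only a plain C++ sketch of the idea, with the hypothetical names finish_byte_variant and finish_byte_dispatch. It does not show how the inline ASM itself would be split.

```cpp
#include <stdint.h>

// One source body, four instantiations. The template arguments are compile-time
// constants, so the compiler can drop the branches that are never taken.
template <bool InvertOutputs, bool BalanceLoad>
static inline uint8_t finish_byte_variant(uint8_t sendbyte, uint8_t& counter) {
    if (BalanceLoad) {
        counter = static_cast<uint8_t>(counter + 8);  // wraps modulo 256
    }
    if (InvertOutputs) {
        sendbyte = static_cast<uint8_t>(~sendbyte);
    }
    return sendbyte;
}

// Hypothetical dispatcher that picks one of the four variants at run time.
static inline uint8_t finish_byte_dispatch(bool invert, bool balance,
                                           uint8_t sendbyte, uint8_t& counter) {
    if (invert) {
        return balance ? finish_byte_variant<true, true>(sendbyte, counter)
                       : finish_byte_variant<true, false>(sendbyte, counter);
    }
    return balance ? finish_byte_variant<false, true>(sendbyte, counter)
                   : finish_byte_variant<false, false>(sendbyte, counter);
}
```

In the library the options are const, so the right variant could be chosen at compile time and the dispatcher would not be needed.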

Overall, I think the per-bit savings get the code fast enough that it is keeping up with the SPI hardware, so any additional savings would likely be wasted waiting for the SPI transmit to complete.

That said, if there was motivation to support 2X SPI mode, then there are some tricks we could use to keep up with that. Let me know if you think this is a relevant use case.

Oscilloscope traces of the before and after outputs are attached (shiftpwm_rgb_example before, shiftpwm_rgb_example after).

Thanks!

-josh

Dec 20 '15 02:12 bigjosh
