qmk_firmware
qmk_firmware copied to clipboard
Fix buffer size for WS2812 PWM driver
In a recent build, I was trying to use more than 60 ws2812 using the PWM driver (per key RGB). My STM32F072 run out of RAM. (heap linker error). Apparently the buffer for the PWM overflow value is a 32bit register... Did some math to figure out if it need to be this big: High period for a 1 PWM frequency (CPUCLK/2) / (1000000000 / 800). Essentially we want to find how many PWM ticks it will take to get 800ns
The above equation never seems to go above 50 ticks... 48MHz CLK is about 19 ticks.
As the DMA controller allows a transfer of a byte of memory into a word of peripheral we better fo that as well:)
Now our buffer is about 1.5Kb of RAM instead of 6.5 Kb.
Description
Changed DMA for PWM WS2812 driver from word to word TO byte to word. Changed PWM buffer to a byte instead of a word as it was mostly empty:)
Types of Changes
- [x] Core
- [x] Bugfix
- [ ] New feature
- [x] Enhancement/optimization
- [ ] Keyboard (addition or update)
- [ ] Keymap/layout/userspace (addition or update)
- [ ] Documentation
Issues Fixed or Closed by This PR
- STM32F072 runs out of RAM with more than 60ish WS2812 LEDs using the PWM driver.
Checklist
- [x] My code follows the code style of this project: C, Python
- [x] I have read the PR Checklist document and have made the appropriate changes.
- [ ] My change requires a change to the documentation.
- [ ] I have updated the documentation accordingly.
- [x] I have read the CONTRIBUTING document.
- [ ] I have added tests to cover my changes.
- [x] I have tested the changes and verified that they work and don't break anything (as well as I can manage).
As the DMA controller allows a transfer of a byte of memory into a word of peripheral we better fo that as well:)
The problem is that not all STM32 chips have DMA controllers that support that kind of operation. Older chips (DMAv1 in ChibiOS) can zero-extend data exactly like you want if you specify different MSIZE and PSIZE values. However, some newer chips like F401/F411 (DMAv2 in ChibiOS) cannot do that — depending on DMA FIFO settings, they either force MSIZE and PSIZE to be the same, or perform multiple transfers (either collecting the larger destination data unit by performing multiple reads from the source, or splitting the larger source data unit into multiple writes to the destination). Therefore blindly changing MSIZE to BYTE would break the driver at least on F4x1 chips (and probably the whole F4xx series, or possibly even more).
So the suggested optimization would need to be conditional — the type of DMA controller implementation in the chip needs to be checked somehow. At least the most resource-restricted chips tend to have DMAv1, so the optimization would be possible for them.
For DMAv2 controllers the only possible optimization would be using 16-bit data if the chosen timer is 16-bit (in this case the DMA controller would perform 16-bit writes to the timer registers, which is not formally correct according to the datasheet, but seems to work in practice). Doing the same thing for a 32-bit timer, however, would break things (apparently a 16-bit write to a 32-bit register results in the 16-bit value getting duplicated to both halves of the 32-bit register).
Good shout @sigprof. I can condition this no problem. How about doing what was done for F1XX SPI like here: https://github.com/qmk/qmk_firmware/blob/88fe5c16a5cdca5e3cf13ef3cd91f5f1e4898c37/platforms/chibios/chibios_config.h Define DMAv2 in case of F401/F411. Then if DMAv2 is defined, in the WS2812 driver use 32bit transfers.
Any other suggestions are welcome:)
I did a quick skim of the F401 datasheet. As we are using double buffer mode it seems to be OK. Please look at 9.3.10 of https://www.st.com/resource/en/reference_manual/rm0368-stm32f401xbc-and-stm32f401xde-advanced-armbased-32bit-mcus-stmicroelectronics.pdf
I did a quick skim of the F401 datasheet. As we are using double buffer mode it seems to be OK. Please look at 9.3.10 of https://www.st.com/resource/en/reference_manual/rm0368-stm32f401xbc-and-stm32f401xde-advanced-armbased-32bit-mcus-stmicroelectronics.pdf
No, see this table — with memory width 8 and peripheral width 32 the controller will do 4 1-byte memory transfers, assemble a 32-bit word from those 4 bytes and perform a single 32-bit peripheral transfer
For comparison, https://www.st.com/resource/en/reference_manual/rm0091-stm32f0x1stm32f0x2stm32f0x8-advanced-armbased-32bit-mcus-stmicroelectronics.pdf has this:
Here the DMA controller zero-extends the data in case of width mismatch, which would be the desired behavior for the WS2812 driver. Unfortunately, the F401 DMA controller does not have an equivalent mode.
Cheers thanks for the clarification:) I also remembered that this driver supports all timers.. but not all timers are 32 bit:) So plan of action
- Use Chibios define
STM32_TIMx_IS_32BITS
to figure out what the peripheral is like. For MCUs with zero-extend do make the necessary transfer (byte to half word/word) - Use the defines
STM32F2XX STM32F4XX STM32F7XX
to check if the DMA controller cannot do the zero extend ~~and do so in software, either to a half word or a word.~~ In that case use either a half word or a word buffer depending on the timer used above. I realize we cant intercept the DMA transfer and do the conversion. Also F2/F4/F7 come with at least 64Kb of RAM so it should not be an issue. (quick math for a TKL. Around 10Kb when using words, around 5Kb when using halfword and about 2.5Kb when using bytes. Well depending how everything is packed that is:)
How does that sound?
@sigprof let me know what you think of it now:)
Hello! Any other changes you would like?
Sorry, has a merge conflict