RIOT icon indicating copy to clipboard operation
RIOT copied to clipboard

cpu/nrf5x_common: fix uart_poweroff()

Open maribu opened this issue 2 years ago • 25 comments

Contribution description

Previously, uart_poweroff() and uart_poweron() were no-ops. This replaces them with the logic to indeed power on and power off the UART device.

Testing procedure

  • ideally, power consumption should be less after a call to uart_poweroff()
    • beware: this is untested. however, at least a call to uart_write() while powered off would get stuck if not for the special handling, while before the UART remained fully operational when "powered off"
  • regular operation should resume after again calling uart_poweron()
    • this is tested

Issues/PRs references

None

maribu avatar Sep 15 '23 12:09 maribu

Murdock results

:heavy_check_mark: PASSED

de99c64d0b66c676cd15fa787919d93ea01b2bab cpu/nrf5x_common: fix uart_poweroff()

Success Failures Total Runtime
10083 0 10083 14m:30s

Artifacts

riot-ci avatar Sep 15 '23 15:09 riot-ci

bors merge

benpicco avatar Sep 17 '23 18:09 benpicco

Build failed (retrying...):

bors[bot] avatar Sep 17 '23 18:09 bors[bot]

bors cancel

The build failure seemingly was caused by this PR :-/

maribu avatar Sep 17 '23 18:09 maribu

Canceled.

bors[bot] avatar Sep 17 '23 18:09 bors[bot]

Probably not worth backporting but I always like fixes, Is it still WIP?

MrKevinWeiss avatar Oct 23 '23 09:10 MrKevinWeiss

To test this PR, I modified the tests/periph/uart test a little bit and added significantly longer delays so a slow human (me) can read off the multimeter in time. This is from the tests/periph/uart/main.c file, lines 203ff:

static void sleep_test(int num, uart_t uart)
{
    printf("UARD_DEV(%i): test uart_poweron() and uart_poweroff()  ->  ", num);
    uart_poweroff(uart);
    //xtimer_usleep(POWEROFF_DELAY);
    xtimer_msleep(4000);
    uart_poweron(uart);
    xtimer_msleep(4000);
    puts("[OK]");
}

When the UART is powered off, the nRF52832 on my nRF52DK consumes 0.474mA and with the UART powered on it consumes 0.530mA. This is measured against Nordics recommendation when the board is powered via USB. They say that powering it via USB can introduce unwanted noise, but for this static test it should be okay. The measurements were taken on the "nRF current measurement" headers, so it does not measure the consumption of the JLink it's blinking LED. The multimeter is an HP 34401A, so it should be precise enough. In the sleep phases, the measurements are very stable as well.

IMO this PR is well worth backporting, especially since most of the changes regarding the register accesses are already done in the mainline code, so this PR should shrink to only a handful of lines changed.

crasbe avatar May 06 '24 12:05 crasbe

OK, rebased. I don't have the hardware at hands, so this is yet untested. The looks reasonable after resolving the merge conflict, though.

maribu avatar May 06 '24 14:05 maribu

I can't add that as a comment, but you can delete the ENABLE_ON and ENABLE_OFF macros from lines 44, 45, 52 and 53. They are now unused.

This is the expansion of the test code in tests/periph/uart:

static void sleep_test(int num, uart_t uart)
{
    printf("UARD_DEV(%i): test uart_poweron() and uart_poweroff()  ->  ", num);
    xtimer_msleep(5);
    uart_poweroff(uart);

    uint8_t data[8] = "asdf";
    uart_write(uart, data, sizeof(data)); // this should not output any data but not hang either

    //xtimer_usleep(POWEROFF_DELAY);
    xtimer_msleep(4000);
    uart_poweron(uart);
    xtimer_msleep(4000);
    puts("[OK]");
}

The additional 5ms sleep is there to make sure the data is sent before the UART goes to sleep. The uart_write function returns as expected and does not result in a deadlock, the current consumption is reduced as expected as well.

This looks good from my side, thank you for addressing this so quickly :)

crasbe avatar May 06 '24 15:05 crasbe

I can't add that as a comment, but you can delete the ENABLE_ON and ENABLE_OFF macros from lines 44, 45, 52 and 53. They are now unused.

Indeed. But I wonder if those were actually more readable than what I did when mergeing :rofl:

maribu avatar May 06 '24 17:05 maribu

I was so free to squash directly and I changed the code to instead make use of ENABLE_ON and the ENABLE_OFF macro, rather than re-inventing the wheel.

maribu avatar May 06 '24 17:05 maribu

I agree that using ENABLE_ON and ENABLE_OFF is more elegant. I can test the latest version tomorrow, but I wouldn't expect any problems.

One more thing that might be worth looking at are Lines 140-143:

#ifndef UARTE_PRESENT
    /* only the legacy non-EasyDMA UART needs to be powered on explicitly */
    dev->POWER = 1;
#endif

The "dev->POWER = 1;" line is present in set_power() as well, so one of the two places is probably redundant.

crasbe avatar May 06 '24 17:05 crasbe

Was the change of the bmp.py file supposed to smuggle into the PR?

crasbe avatar May 06 '24 19:05 crasbe

Thx, good catch!

maribu avatar May 06 '24 19:05 maribu

It appears that this breaks examples/default on the nrf52840dk when UART is used for stdio. Hard-wiring get_power() to return true unconditionally fixes the regression, but causes uart_write() to deadlock when the UART is powered off :cry:

maribu avatar May 06 '24 20:05 maribu

That is odd, stdio_uart does not care about the return value of uart_write and uart_write is the only function that calls get_power. I don't have a nRF52840DK at home, so I'll have to try it out tomorrow.

Could it be that UARTE_PRESENT is not set anywhere in RIOT currently? You introduced it in #20102, but it's not defined anywhere: https://github.com/search?q=repo%3ARIOT-OS%2FRIOT%20UARTE_PRESENT&type=code (or maybe I'm blind).

But if that's the case, the registers for the nRF52 UARTE are not set corretly.

Edit: Sorry that this became such a can of worms, that was certainly not intended 👀

crasbe avatar May 06 '24 20:05 crasbe

The define is in the vendor header files:

cpu/nrf53/include/vendor/nrf5340_application_peripherals.h
212:#define UARTE_PRESENT

cpu/nrf52/include/vendor/nrf52820_peripherals.h
202:#define UARTE_PRESENT

cpu/nrf52/include/vendor/nrf52832_peripherals.h
227:#define UARTE_PRESENT

cpu/nrf52/include/vendor/nrf52840_peripherals.h
237:#define UARTE_PRESENT

cpu/nrf52/include/vendor/nrf52811_peripherals.h
195:#define UARTE_PRESENT

cpu/nrf53/include/vendor/nrf5340_network_peripherals.h
159:#define UARTE_PRESENT

cpu/nrf52/include/vendor/nrf52805_peripherals.h
194:#define UARTE_PRESENT

cpu/nrf52/include/vendor/nrf52833_peripherals.h
239:#define UARTE_PRESENT

cpu/nrf52/include/vendor/nrf52810_peripherals.h
188:#define UARTE_PRESENT

cpu/nrf9160/include/vendor/nrf9160_peripherals.h
147:#define UARTE_PRESENT

But even without UARTE_PRESENT it should work, as the non Easy-DMA variant of the UART peripheral is a subset of the UARTE peripheral. Hence, old code for nRF51 is supposed to just work.

In fact, for the nRF52832 we previously did so, likely because it was just overlooked that this MCU does have the UARTE.

The crash I have is in the nrf802154 thread early on. Disabling the network device also fixes the crash. It looks pretty much like a memory corruption issue. I have the feeling that the real bug is present in master already and I just happen to be the lucky one to trigger it.

maribu avatar May 06 '24 20:05 maribu

I don't understand why the Github search does not find these files, but hey, it would've been too easy I guess.

Maybe it is just stack related? When get_power is forced to return true, the compiler likely optimizes it away, resulting in one less function on the stack. The Makefile does not increase the stack size, but a lot of stuff is going on, so this is not unlikely. I had to increase the stack size for a some examples (the filesystem example for example) while tinkering with the nRF52 and not in every case the typical memory runout message appeared.

crasbe avatar May 06 '24 20:05 crasbe

A stack overflow should actually be detected by the MPU stack guard. Unless of course that happened too early on.

maribu avatar May 06 '24 20:05 maribu

According to the UFSR it is an INVSTATE fault some time between event_queues_init() and msg_init_queue() in gnrc_netif.c:1946 and the statement below.

Interestingly: If step with stepi, the hard fault is not triggered and the app works fine. Disabling IRQs during that call to event_queues_init() also fixes the issue.

I wonder if there is a race condition with the radio sending events to the event queue, but the event queue not yet fully set up.

I'll take another look tomorrow. I'm too tired now to investigate.

maribu avatar May 06 '24 21:05 maribu

It probably won't help if I say "works for me" :eyes: The result is the same without the USEMODULE and it works with "BOARD=nrf52840dk make flash term" as well.

chris@ThinkPias:~/flashdev-riot/RIOT/examples/default$ USEMODULE+=stdio_uart BOARD=nrf52840dk make hexfile -j12
"make" -C /home/chris/flashdev-riot/RIOT/pkg/cmsis/ 
.......
chris@ThinkPias:~/flashdev-riot/RIOT/examples/default$ nrfjprog --recover && nrfjprog --chiperase --verify --program bin/nrf52840dk/default.hex && nrfjprog --pinresetenable && nrfjprog --pinreset && BOARD=nrf52dk make term
Recovering device. This operation might take 30s.
Erasing user code and UICR flash areas.
Writing image to disable ap protect.
[ #################### ]   0.205s | Erase file - Done erasing                                                          
[ #################### ]   0.303s | Program file - Done programming                                                    
[ #################### ]   0.309s | Verify file - Done verifying                                                       
Enabling pin reset.
Applying pin reset.
/home/chris/flashdev-riot/RIOT/dist/tools/pyterm/pyterm -p "/dev/ttyACM0" -b "115200"  
Twisted not available, please install it if you want to use pyterm's JSON capabilities
2024-05-07 11:06:35,079 # Connect to serial port /dev/ttyACM0
Welcome to pyterm!
Type '/exit' to exit.
2024-05-07 11:06:36,085 # NETOPT_TX_END_IRQ not implemented by driver
2024-05-07 11:06:36,086 # main(): This is RIOT! (Version: 2024.07-devel-141-gde99c6-cpu/nrf5x_common/periph_uart)
2024-05-07 11:06:36,087 # Welcome to RIOT!
> reboot
2024-05-07 11:06:44,520 # reboot
2024-05-07 11:06:44,525 # NETOPT_TX_END_IRQ not implemented by driver
2024-05-07 11:06:44,533 # main(): This is RIOT! (Version: 2024.07-devel-141-gde99c6-cpu/nrf5x_common/periph_uart)
2024-05-07 11:06:44,534 # Welcome to RIOT!
> 

I can't really say if the board is actually sending something, but it looks convincing:

2024-05-07 11:10:14,069 # NETOPT_TX_END_IRQ not implemented by driver
2024-05-07 11:10:14,070 # main(): This is RIOT! (Version: 2024.07-devel-141-gde99c6-cpu/nrf5x_common/periph_uart)
2024-05-07 11:10:14,070 # Welcome to RIOT!
> ps
2024-05-07 11:10:18,325 # ps
2024-05-07 11:10:18,334 # 	pid | name                 | state    Q | pri | stack  ( used) ( free) | base addr  | current     
2024-05-07 11:10:18,343 # 	  - | isr_stack            | -        - |   - |    512 (  164) (  348) | 0x20000000 | 0x200001c8
2024-05-07 11:10:18,352 # 	  1 | main                 | running  Q |   7 |   1536 (  692) (  844) | 0x200002b0 | 0x20000794 
2024-05-07 11:10:18,360 # 	  2 | pktdump              | bl rx    _ |   6 |   1472 (  176) ( 1296) | 0x2000115c | 0x2000166c 
2024-05-07 11:10:18,369 # 	  3 | nrf802154            | bl anyfl _ |   2 |    896 (  308) (  588) | 0x20000a88 | 0x20000d4c 
2024-05-07 11:10:18,375 # 	    | SUM                  |            |     |   4416 ( 1340) ( 3076)
> ifconfig
2024-05-07 11:10:29,875 # ifconfig
2024-05-07 11:10:29,881 # Iface  3  HWaddr: 75:74  Channel: 26  NID: 0x23  PHY: O-QPSK 
2024-05-07 11:10:29,885 #           Long HWaddr: 5E:07:8F:1E:EE:26:F5:74 
2024-05-07 11:10:29,887 #            State: IDLE 
2024-05-07 11:10:29,892 #           ACK_REQ  L2-PDU:102  Source address length: 2
2024-05-07 11:10:29,895 #           Link type: wireless
2024-05-07 11:10:29,896 #           
> txtsnd 3 bcast "asdf"
2024-05-07 11:10:37,823 # txtsnd 3 bcast "asdf"

For some reason though the ble subsystem is not added for the nRF52840DK and I did not figure out why. This is what the log looks like for the nRF52DK:

2024-05-07 11:24:45,133 # main(): This is RIOT! (Version: 2024.07-devel-141-gde99c6-cpu/nrf5x_common/periph_uart)
2024-05-07 11:24:45,134 # Welcome to RIOT!
>help
2024-05-07 11:24:52,558 # help
2024-05-07 11:24:52,561 # Command              Description
2024-05-07 11:24:52,565 # ---------------------------------------
2024-05-07 11:24:52,569 # ble                  Manage BLE connections for NimBLE
2024-05-07 11:24:52,573 # ifconfig             Configure network interfaces
2024-05-07 11:24:52,578 # pm                   interact with layered PM subsystem
2024-05-07 11:24:52,583 # ps                   Prints information about running threads.
2024-05-07 11:24:52,587 # reboot               Reboot the node
2024-05-07 11:24:52,592 # saul                 interact with sensors and actuators using SAUL
2024-05-07 11:24:52,599 # txtsnd               Sends a custom string as is over the link layer
2024-05-07 11:24:52,603 # version              Prints current RIOT_VERSION

But neither of these things have anything to do with what you're seeing I guess? The nRF52DK does not seem to receive anything I send with the nRF52840DK, but that might be because I don't know what I'm doing with the Default example :sweat_smile:

crasbe avatar May 07 '24 09:05 crasbe

But neither of these things have anything to do with what you're seeing I guess?

No. I also fail to reproduce the issue on a different nrf52840-dk (one that has "nRF52840-Preview-DK" on the silkscreen where the affected board only has "nRF52840-DK"). Maybe I should check the errata.

maribu avatar May 08 '24 11:05 maribu

The board I have here is a PCA10056 Revision 3.0.1 from 2023.6. The nRF52840 has the following text written on it:

N52840
QIAAF0
2232AQ

QIAAF0 means that the chip is Revision 3. We have some chips here with Revision CKAAD0, which is Revision 2. But they are not on a Devboard, but integrated in another project.

crasbe avatar May 08 '24 11:05 crasbe

OK, here is what I have found so far:

  1. The issue will only trigger on some hardware, but reliably so there. It may apply to all nRF52840 MCUs of that revision, or be "silicon lottery" - I don't know
  2. It requires GCC 13.2.0 and newlib 4.3.020230120 and will only trigger on examples/default. Specifically, even the same version of newlib rebuild will no longer trigger it, but specifically the revision 3 of that package in Alpine is affected
  3. It is timing sensitive. Adding a few nops will "fix" the issue
  4. It is triggered within the puts() implementation of newlib during LOG_WARNING("NETOPT_TX_END_IRQ not implemented by driver\n")
  5. In the hard fault handler the program counter that triggered the fault points somewhere into __sf, which is __FILE __sf[3]; - so it is data not machine code

I'm almost certain it is an issue with RIOT not implementing the locks newlib requires to correctly implement reentrant functions, but only protects malloc() and friends from data races. I guess that this needs to be addressed now.

In any case, I'm 99.999% certain that the issues are unrelated to this PR.

maribu avatar May 09 '24 10:05 maribu