STM32CubeWB
CPU2 crash report during SHCI_C2_BLE_Init(...)
This is a cross post from the community forum, but I haven't gotten a response there to my bug report yet and since this is crashing within the CPU2 side of things an actual bug report here makes sense. It's incredibly difficult to debug the CPU2 side since the code is delivered as an encrypted blob.
Describe the set-up
- Nucleo-WB55 or our custom board utilizing a STM32WB55
- gcc-arm-none-eabi-10-2020-q4-major
Describe the bug
Upon invoking SHCI_C2_BLE_Init(...), CPU2 enters a hard fault within the BLE HCI stacks.
How To Reproduce
I'm not quite sure what it is from our codebase that causes this yet. I could maybe provide a reduced pre-compiled binary to ST, but I cannot provide the full source code from our proprietary project. It's 100% reproducible with our code running on CPU1, and from what I can tell, the HSEM/IPCC/RCC peripherals are all in the same state as the example projects so I'm currently at a loss as to why this occurs. I've also copied all of the SHCI_C2_BLE_Init(...) parameters from some of the newer examples to no avail and validated with gdb that I had the exact same buffer contents being sent to CPU2 in the shared memory through the mailbox mechanism as the transparent mode example codebase.
Within our codebase, I can run v1.11.x through v1.15.0 of the BLE stack and successfully scan for BLE packets. v1.16.x through v1.19.x report a "security attack" upon SHCI_C2_BLE_Init(...) invocation, and v1.20.x has a hard fault. These variations in BLE stack behavior are all without changing the CPU1 firmware.
Here are the hard fault codes of the various v1.20.x HCI stacks as soon as I invoke SHCI_C2_BLE_Init(...) in our codebase:
v1.20.0 of stm32wb5x_BLE_HCILayer_fw.bin has a hard fault:
0x20030000 <TL_RefTable>: 0x1170fd0f 0x00003284 0x00002a33 0x2003f198
v1.20.0 of stm32wb5x_BLE_HCI_AdvScan_fw.bin has a hard fault:
0x20030000 <TL_RefTable>: 0x1170fd0f 0x00003160 0x00001f6f 0x2003ef50
v1.20.0 of stm32wb5x_BLE_HCILayer_extended_fw.bin has a hard fault:
0x20030000 <TL_RefTable>: 0x1170fd0f 0x00003390 0x00002b3f 0x2003f6f8
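For anyone reading the dumps above, the four words can be unpacked with a small helper. This is a minimal sketch that assumes the word order is keyword, PC, LR, SP. That order is inferred only from the values themselves (the third word has the Thumb bit set, the fourth is an SRAM address) and is not confirmed anywhere in this report. The names `cpu2_fault_regs_t` and `cpu2_decode_hard_fault` are hypothetical and not part of STM32CubeWB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Keyword seen as the first word of every hard fault dump in this report. */
#define CPU2_HARD_FAULT_KEYWORD 0x1170fd0fu

/* Hypothetical container for the three registers CPU2 leaves behind. */
typedef struct {
    uint32_t pc;
    uint32_t lr;
    uint32_t sp;
} cpu2_fault_regs_t;

/*
 * Decodes four words read from the start of SRAM2A (TL_RefTable).
 * ASSUMPTION: the layout is keyword, PC, LR, SP.
 * Returns false when the first word is not the hard fault keyword.
 */
static bool cpu2_decode_hard_fault(const uint32_t words[4],
                                   cpu2_fault_regs_t *out)
{
    if (words[0] != CPU2_HARD_FAULT_KEYWORD) {
        return false;
    }
    out->pc = words[1];
    out->lr = words[2];
    out->sp = words[3];
    return true;
}
```

Applied to the first dump above (stm32wb5x_BLE_HCILayer_fw.bin v1.20.0), this gives PC 0x00003284, LR 0x00002a33 and SP 0x2003f198 under that assumed ordering.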
Please let me know if the PC, SP, and LR from inside CPU2 are sufficient for an initial fault analysis, or if I need to prepare a minimal pre-compiled binary for CPU1 exhibiting this. I'm still attempting to narrow down what's different between the examples and our codebase. We integrated the BLE portions of the v1.11.x STM32CubeWB codebase quite a while ago, but are trying to migrate to the newer version of the BLE stack so we can address the errata that requires calling the relatively new SHCI_C2_SetSystemClock(...) command.
ST Internal Reference: 189569
I spent some time yesterday/today and figured out how to dump out the code for CPU2 for these encrypted firmware binaries so that I could properly debug this.
This is what I found:
- v1.16.0:
  - This firmware started to always validate p_ble_table->phci_acl_data_buffer as pointing to non-secure SRAM. Previously, this validation was dependent on if SHCI_C2_Ble_Init_Cmd_Param_t's Options byte had bit 0 set to 1. I had this bit set to 0 in our code, so the value of p_ble_table->phci_acl_data_buffer was ignored prior to v1.16.0.

    This is sort of mentioned in the release notes of v1.16.0 as:

    ID 136949 : For ACL_DATA activation, the BLE options flag has to be configured with SHCI_C2_BLE_INIT_OPTIONS_LL_ONLY with Full and Full extended stack binaries and no special BLE options flag required in “HCI_ONLY” (ie Light, HCI layer, ext HCI layer binaries)

    but the way it's read makes you think it's required only if you want to use ACL_DATA, and it doesn't mention that this is required with the BLE_HCI_AdvScan_fw now too.
  - (The validation in v1.15.0 of stm32wb5x_BLE_HCI_AdvScan_fw.bin was skipped at instruction offset 0xca8 and jumped to 0xcb8, which skipped the call to the validation function for this parameter.)
- v1.20.0:
  - This firmware invokes a software breakpoint when it has a security fault just after the security attack key is set into SRAM2A_BASE. The software breakpoint in turn caused a hard fault, which ultimately resulted in SRAM2A_BASE being set twice in a row (once with a security fault, and then once with a hard fault).
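To make the change between v1.15.0 and v1.16.0 concrete, here is a rough model of the two behaviors as I observed them. It is only a sketch of the described behavior, not the firmware's actual code: the function names, the `sram_window_t` type, and the idea of a single contiguous non-secure window are all assumptions for illustration, and the real bounds depend on the device's security configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of the Options byte in the BLE init parameters, as described above. */
#define OPTIONS_BIT0 0x01u

/* Hypothetical description of the non-secure SRAM window: [start, end). */
typedef struct {
    uint32_t start;
    uint32_t end;
} sram_window_t;

static bool in_window(const sram_window_t *w, uint32_t addr)
{
    return addr >= w->start && addr < w->end;
}

/* Before v1.16.0 (as observed): the pointer is checked only if bit 0 is set. */
static bool acl_buffer_ok_pre_1_16(const sram_window_t *w,
                                   uint8_t options, uint32_t acl_ptr)
{
    if ((options & OPTIONS_BIT0) == 0u) {
        return true; /* validation skipped, pointer value ignored */
    }
    return in_window(w, acl_ptr);
}

/* From v1.16.0 (as observed): the pointer is always checked. */
static bool acl_buffer_ok_from_1_16(const sram_window_t *w,
                                    uint8_t options, uint32_t acl_ptr)
{
    (void)options; /* the Options byte no longer gates the check */
    return in_window(w, acl_ptr);
}
```

With the Options bit cleared and a garbage pointer, the first function accepts the configuration and the second rejects it, which matches the jump from "works" to "security attack" without any CPU1 firmware change.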
I see 2 bugs as a result:
- Hard fault instead of security error inside the CPU2 firmware in v1.20.0
- The STM32CubeWB's hci_init(...) call does not initialize phci_acl_data_buffer, and instead, whatever value was on the stack at the time of invocation ends up in this pointer. This leads to semi-unpredictable behavior in the initialization routine on the CPU2 side given the change in v1.16.0, since it's validating variables that inherit whatever was on the stack at the time:

      void hci_register_io_bus(tHciIO* fops)
      {
        /* Register IO bus services */
        fops->Init = TL_BLE_Init;
        fops->Send = TL_BLE_SendCmd;

        return;
      }

      void hci_init(void(* UserEvtRx)(void* pData), void* pConf)
      {
        StatusNotCallBackFunction = ((HCI_TL_HciInitConf_t *)pConf)->StatusNotCallBack;
        hciContext.UserEvtRx = UserEvtRx;

        hci_register_io_bus (&hciContext.io);

        TlInit((TL_CmdPacket_t *)(((HCI_TL_HciInitConf_t *)pConf)->p_cmdbuffer));

        return;
      }

      typedef struct
      {
        void (* IoBusEvtCallBack) ( TL_EvtPacket_t *phcievt );
        void (* IoBusAclDataTxAck) ( void );
        uint8_t *p_cmdbuffer;
        uint8_t *p_AclDataBuffer;
      } TL_BLE_InitConf_t;

      static void TlInit( TL_CmdPacket_t * p_cmdbuffer )
      {
        TL_BLE_InitConf_t Conf;

        ...

        /* Initialize low level driver */
        if (hciContext.io.Init)
        {
          Conf.p_cmdbuffer = (uint8_t *)p_cmdbuffer;
          /* Several values in Conf are left uninitialized, including phci_acl_data_buffer */
          Conf.IoBusEvtCallBack = TlEvtReceived;
          hciContext.io.Init(&Conf);
        }

        return;
      }

  I'd suggest at least initializing these fields to 0. This could simply be done by changing TL_BLE_InitConf_t Conf to TL_BLE_InitConf_t Conf = {}.

  The wireless application manual states the following of TL_BLE_Init(...), which indicates that hci_init(...) filling in the remaining fields with leftover stack values does not adhere to the spec:

  When not in HCI only mode, both p_AclDataBuffer and IoBusAclDataTxAck are not used and must be set to 0.
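As a minimal sketch of the suggested fix, the snippet below zero-initializes the configuration before filling in the two fields TlInit() actually sets. It is self-contained for illustration only: `TL_EvtPacket_t` is a stand-in opaque type, and `make_conf` and `example_evt_received` are hypothetical names that do not exist in STM32CubeWB. It uses `= {0}` rather than `= {}` on the assumption that the code may be built as pre-C23 standard C, where the empty initializer is a compiler extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the real TL_EvtPacket_t; only the pointer type matters here. */
typedef struct TL_EvtPacket TL_EvtPacket_t;

/* Same field layout as the TL_BLE_InitConf_t quoted above. */
typedef struct {
    void (*IoBusEvtCallBack)(TL_EvtPacket_t *phcievt);
    void (*IoBusAclDataTxAck)(void);
    uint8_t *p_cmdbuffer;
    uint8_t *p_AclDataBuffer;
} TL_BLE_InitConf_t;

/* Hypothetical event callback standing in for TlEvtReceived. */
static void example_evt_received(TL_EvtPacket_t *phcievt)
{
    (void)phcievt;
}

/* Hypothetical helper: builds the configuration the way TlInit() does,
 * but starts from an all-zero struct so unset fields are null pointers
 * instead of leftover stack contents. */
static TL_BLE_InitConf_t make_conf(uint8_t *p_cmdbuffer)
{
    TL_BLE_InitConf_t conf = {0};

    conf.p_cmdbuffer = p_cmdbuffer;
    conf.IoBusEvtCallBack = example_evt_received;
    return conf;
}
```

With this change the two fields the manual says must be 0 (p_AclDataBuffer and IoBusAclDataTxAck) are deterministic, so whatever CPU2 does with them no longer depends on the call stack at the time of hci_init(...).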
So follow-up questions beyond the 2 bugs noted above:
- Is it expected that the validation changed for phci_acl_data_buffer?
- Is it expected to be able to utilize the hci_* APIs from ble_hci_le.h to interface with the HCI firmware stack variants, especially the beacon/scan only variant? I don't need passthrough, and I don't need BLE ACL support since I'm only doing BLE advertising packet scans, so I'd have to allocate a completely unused buffer for doing this with v1.16.x and up given what I discovered here.
- Does the stm32wb5x_BLE_HCI_AdvScan_fw.bin variant even support HCI ACL packets, especially since HCI_LE_CREATE_CONNECTION isn't supported in this stack? Given this, I wouldn't expect this variant of the stack to require phci_acl_data_buffer to be allocated.
I got some follow-up answers within the community forum that:
- The stm32wb5x_BLE_HCI_AdvScan_fw.bin variant does not support HCI ACL packets, but requires the ACL buffer allocation as the firmware checks "is it a HCI variant?".
- The validation change for phci_acl_data_buffer is expected.
Hello,
Point 1. Issue has been fixed for STM32CubeWB v1.21.0.
Point 2. Behavior expected and explained to the customer. Check this link please.
Regards,