FLASH_IF_INT_CLEAR_ERROR causes a hard fault

Open fozayDost opened this issue 1 year ago • 8 comments

Hello guys,

During a flash update operation, I sometimes get a hard fault. Here is some information:

Board: Custom board with STM32WLE5JCIX-based LoRa-E5 (not module)
IDE: STM32CubeIDE

I am using LoRaWAN, GNSS, low-power modes (Stop mode), and the TIMER_IF functions to wake up periodically and read the accelerometer.

Steps to reproduce:

  1. The board is in a low-power state.
  2. Send a downlink to the board to trigger a flash update.
  3. The board wakes up to send data over LoRa (TX).
  4. After sending, the board receives data over LoRa (RX).
  5. The board gets my message and tries to update the flash with the procedure below:

     a. FLASH_IF_Erase to erase the flash
     b. FLASH_IF_Write to write the flash
     c. Reset the board if necessary

` struct flashVariables flashWrite;

if (FLASH_IF_Erase((void *)&__CONFIG_START, USER_FLASH_PAGE_SIZE) == FLASH_IF_OK)
{
	flashWrite.opModeCurVar = opModeCur;
	flashWrite.tasmaIdVar = tasmaId;
	flashWrite.zThreshVar = zThreshold;
	flashWrite.zUptimeThreshVar = zUptimeThreshold;
	flashWrite.flashCheck = 0x55;
	FLASH_IF_Write((void *)&__CONFIG_START, (const void *)&flashWrite, sizeof(struct flashVariables));
}

if (flashUpdateTriggered >= 2)
{
	HAL_Delay(1000);
	NVIC_SystemReset();
}

flashUpdateTriggered = 0;`

__CONFIG_START is defined in the .ld file as:

MEMORY
{
  RAM   (xrw) : ORIGIN = 0x20000000, LENGTH = 64K
  RAM2  (xrw) : ORIGIN = 0x10000000, LENGTH = 32K
  FLASH (rx)  : ORIGIN = 0x08000000, LENGTH = 254K
  FCONF (r)   : ORIGIN = 0x0803F800, LENGTH = 2K
}

and the user page size is: #define USER_FLASH_PAGE_SIZE 0x800
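As a sanity check on the numbers above, the layout can be verified with a few lines of C. This is only a sketch: the constants are copied from the MEMORY block and the #define, while the field types of `struct flashVariables` and the helper `config_layout_ok` are assumptions made for illustration (the real struct definition is not shown in this thread).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the MEMORY block and the #define quoted above. */
#define FLASH_ORIGIN         0x08000000u
#define FLASH_LENGTH         (254u * 1024u)
#define FCONF_ORIGIN         0x0803F800u
#define FCONF_LENGTH         (2u * 1024u)
#define USER_FLASH_PAGE_SIZE 0x800u

/* Hypothetical stand-in: the real field types are not shown in the thread. */
struct flashVariables {
    uint8_t  opModeCurVar;
    uint8_t  tasmaIdVar;
    uint16_t zThreshVar;
    uint32_t zUptimeThreshVar;
    uint8_t  flashCheck;
};

/* Compile-time check: FCONF starts exactly where FLASH ends. */
static_assert(FLASH_ORIGIN + FLASH_LENGTH == FCONF_ORIGIN,
              "FCONF must start right after FLASH");

/* Hypothetical helper: true if a record of record_size bytes fits in the
 * config region and the region is exactly one page, aligned to a page. */
static bool config_layout_ok(size_t record_size)
{
    return (FCONF_ORIGIN % USER_FLASH_PAGE_SIZE) == 0u
        && FCONF_LENGTH == USER_FLASH_PAGE_SIZE
        && record_size <= USER_FLASH_PAGE_SIZE;
}
```

With these values, 0x08000000 + 254 KiB is 0x0803F800, so the config page is the last 2 KiB page of a 256 KiB flash, and erasing one page of USER_FLASH_PAGE_SIZE bytes covers the whole FCONF region.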

By the way, it works without problems most of the time. I get the error in maybe 1 out of 20 trials. I have also tested it in an erase-write-read loop and could not reproduce the error.

This is the error stack. HAL_FLASH_Lock causes the hard fault, specifically at the READ_BIT line:

/* verify Flash is locked */
if (READ_BIT(FLASH->CR, FLASH_CR_LOCK) == 0U)
{
  status = HAL_ERROR;
}

[Screenshots: FlashError, FlashError2, FlashError3]

I have run out of ideas for handling this problem. Could someone help me with this?

fozayDost avatar Sep 21 '23 09:09 fozayDost

Hello @fozayDost,

Thank you for this report. We will get back to you as soon as we have analyzed it further. This may take some time. Thank you for your understanding.

With regards,

TOUNSTM avatar Oct 03 '23 09:10 TOUNSTM

Hi,

From the flash peripheral registers you posted, I can see that the CFGBSY bit of the SR register is set.

Some time ago I had a similar problem (a bus fault escalated to a hard fault with this bit set).

This may help you somehow, if it is still a problem after all the time this issue has been open.
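To confirm whether a given hard fault really is an escalated bus fault, the fault status registers can be decoded. The sketch below is an illustration only: the bit positions are those of the Armv7-M HFSR and CFSR registers, while `fault_summary_t` and `summarize_fault` are hypothetical names. On the target, the two arguments would be the values of SCB->HFSR and SCB->CFSR captured in the HardFault handler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Armv7-M fault status bits (SCB->HFSR and the BusFault byte of SCB->CFSR). */
#define HFSR_FORCED       (1UL << 30)   /* HardFault escalated from another fault */
#define CFSR_BUSFAULT_MSK (0xFFUL << 8) /* any BusFault status bit */
#define CFSR_PRECISERR    (1UL << 9)    /* precise data bus error */
#define CFSR_IMPRECISERR  (1UL << 10)   /* imprecise data bus error */
#define CFSR_BFARVALID    (1UL << 15)   /* SCB->BFAR holds the faulting address */

/* Hypothetical summary type for illustration. */
typedef struct {
    bool escalated;   /* HardFault was forced by a configurable fault */
    bool bus_fault;   /* at least one BusFault status bit is set */
    bool precise;     /* faulting access is known exactly */
    bool imprecise;   /* fault was reported after the access (e.g. buffered write) */
    bool bfar_valid;  /* the fault address register can be trusted */
} fault_summary_t;

/* Decode raw HFSR/CFSR values into a readable summary. */
static fault_summary_t summarize_fault(uint32_t hfsr, uint32_t cfsr)
{
    fault_summary_t s;
    s.escalated  = (hfsr & HFSR_FORCED) != 0u;
    s.bus_fault  = (cfsr & CFSR_BUSFAULT_MSK) != 0u;
    s.precise    = (cfsr & CFSR_PRECISERR) != 0u;
    s.imprecise  = (cfsr & CFSR_IMPRECISERR) != 0u;
    s.bfar_valid = (cfsr & CFSR_BFARVALID) != 0u;
    return s;
}
```

If `bfar_valid` is set, the value of SCB->BFAR read at the same moment gives the address of the access that faulted, which may narrow down where the offending access comes from.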

ebanorafael avatar Nov 27 '23 19:11 ebanorafael

Hello @fozayDost,

I apologize for the delayed response. Is your problem still persisting?

Kind regards,

TOUNSTM avatar Jul 30 '24 10:07 TOUNSTM

We are facing the same issue: the device goes into a hard fault while trying to store the LoRaWAN context. From the call stack it is clear that FLASH_IF_INT_CLEAR_ERROR is the last function called. The issue is random, does not happen every time, and cannot be caught with debugging enabled.

pd-vt avatar Aug 12 '24 06:08 pd-vt

@TOUNSTM We have identified the root cause: the CFGBSY bit of the FLASH->SR register is set, and it gets set whenever a write is attempted through a pointer that holds an address in the ROM (flash) address range. However, the code base is very large and it is nearly impossible to find what is causing this.

Is there any debugging feature in STM32CubeIDE that would let us identify the scenario causing this error?

FYI, when we add a watchpoint on address 0x08000000 for any write operation and then write through a pointer like below

uint32_t *pNull = (uint32_t *)0x08000000; *pNull = 0xFFFFFFFF;

We see the code break at this point, but the stray write can happen at any address, so we are not able to find the place that has this bug. Please help.
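One way to narrow down such a stray write, sketched here under assumptions, is a range check placed in front of suspect pointer writes or memcpy calls in debug builds. The flash base and size below are derived from the MEMORY block earlier in this thread (254K + 2K starting at 0x08000000). The names `range_touches_flash` and `ASSERT_NOT_FLASH` are hypothetical and not part of any ST library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed from the linker MEMORY block: FLASH (254K) + FCONF (2K). */
#define FLASH_BASE_ADDR  0x08000000u
#define FLASH_SIZE_BYTES (256u * 1024u)

/* True if the byte range [addr, addr + len) overlaps the flash address range.
 * 64-bit arithmetic avoids wrap-around for ranges near the top of memory. */
static bool range_touches_flash(uint32_t addr, uint32_t len)
{
    const uint64_t start = addr;
    const uint64_t end   = start + len;                 /* exclusive */
    const uint64_t fbase = FLASH_BASE_ADDR;
    const uint64_t fend  = fbase + FLASH_SIZE_BYTES;    /* exclusive */
    return len != 0u && start < fend && end > fbase;
}

/* Hypothetical debug guard: stop right where a RAM write is about to hit flash. */
#define ASSERT_NOT_FLASH(addr, len) \
    assert(!range_touches_flash((uint32_t)(addr), (uint32_t)(len)))
```

Placing the guard before writes through pointers that come from untrusted or uninitialized sources turns a random, late fault into an immediate stop at the offending call site. It only catches the writes that are instrumented, so it is a search aid rather than a fix.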

pd-vt avatar Aug 13 '24 12:08 pd-vt

0x08000000 is the program start address, so this operation is illegal, right?

wdfk-prog avatar Aug 14 '24 02:08 wdfk-prog

Hello @fozayDost,

Thank you for this report. To address the intermittent hard fault issue you're experiencing during the flash update process on your STM32WLE5JCIX-based custom board, let's break down the problem and potential solutions step-by-step.

Problem Analysis

  1. Intermittent Hard Fault: The hard fault occurs in the HAL_FLASH_Lock function, specifically at the READ_BIT(FLASH->CR, FLASH_CR_LOCK) line.
  2. CFGBSY Bit Set: The CFGBSY bit in the FLASH->SR register is set. It indicates that the flash controller is busy with a program or erase configuration, so a new operation cannot be launched until it clears.

Potential Causes

  1. Flash Busy State: The flash memory might still be busy with a previous operation when you attempt to lock it.
  2. Interrupts: An interrupt might be occurring during the flash operation, causing the flash to be accessed while it is still busy.
  3. Flash Lock State: The flash might not be properly locked or unlocked, leading to inconsistent states.
  4. Power Modes: The transition between low power modes and active modes might not be handled properly, leading to timing issues.

Solutions

  1. Ensure Flash is Not Busy: Before performing any flash operation, ensure that the flash is not busy. You can add a check to wait until the CFGBSY bit is cleared.
while (READ_BIT(FLASH->SR, FLASH_SR_CFGBSY) != 0U) {
    // Optionally add a timeout mechanism to avoid an infinite loop
}
  2. Disable Interrupts during Flash operations: Disable interrupts before starting the flash operation and re-enable them afterward to prevent any interruptions during the critical section.
__disable_irq();
if (FLASH_IF_Erase((void *)&__CONFIG_START, USER_FLASH_PAGE_SIZE) == FLASH_IF_OK) {
    flashWrite.opModeCurVar = opModeCur;
    flashWrite.tasmaIdVar = tasmaId;
    flashWrite.zThreshVar = zThreshold;
    flashWrite.zUptimeThreshVar = zUptimeThreshold;
    flashWrite.flashCheck = 0x55;
    FLASH_IF_Write((void *)&__CONFIG_START, (const void *)&flashWrite, sizeof(struct flashVariables));
}
__enable_irq();
  3. Verify Flash Lock State: Ensure that the flash is properly locked and unlocked before and after operations. This can be done by explicitly locking and unlocking the flash.
// Unlock the flash
HAL_FLASH_Unlock();

// Perform flash operations

// Lock the flash
HAL_FLASH_Lock();
  4. Add Delay Before Reset: Ensure that there is enough delay before resetting the board to allow the flash operation to complete.
if (flashUpdateTriggered >= 2) {
    HAL_Delay(1000); // Ensure this delay is sufficient
    NVIC_SystemReset();
}
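
The wait loop in the first solution mentions an optional timeout. A minimal sketch of such a bounded wait is shown here, written against a plain register pointer so it can be tried off-target. `wait_flag_clear` is a hypothetical helper, not an ST API. On the device, the assumed call would be `wait_flag_clear(&FLASH->SR, FLASH_SR_CFGBSY, max_polls)` with a poll budget chosen for the application.

```c
#include <stdbool.h>
#include <stdint.h>

/* Poll *reg until all bits in mask read as 0, giving up after max_polls
 * extra reads. Returns true if the flag cleared, false on timeout. */
static bool wait_flag_clear(volatile const uint32_t *reg, uint32_t mask,
                            uint32_t max_polls)
{
    while ((*reg & mask) != 0u) {
        if (max_polls == 0u) {
            return false;   /* still busy: caller should not start a new operation */
        }
        max_polls--;
    }
    return true;
}
```

A poll count is used rather than a tick-based timeout because a tick that is incremented from an interrupt would not advance while interrupts are disabled. If the function returns false, it is probably safer to report the error and skip the erase/write than to continue with the flash still busy.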

By ensuring that the flash is not busy, disabling interrupts during flash operations, verifying the flash lock state and adding sufficient delays, you can mitigate the risk of hard faults during flash updates. If the problem persists, please provide more details.

With regards,

TOUNSTM avatar Aug 14 '24 14:08 TOUNSTM

Hello @pd-vt & @wdfk-prog,

Thank you for your contributions. Indeed, writing to the address 0x08000000 is illegal because it is typically the start address of the flash memory where the program code is stored. Writing to this address can cause undefined behavior and is not allowed.

Best Regards,

TOUNSTM avatar Aug 14 '24 15:08 TOUNSTM