STM32CubeL4
Bug in UART code
Describe the set-up
- Custom board with a STM32L431.
- STM32CubeIDE 1.13.1.
Describe the bug
In the following code in stm32l4xx_hal_uart.c, function HAL_UART_Transmit(), the huart->TxXferCount counter underflows, and since it is an unsigned value, the loop increments pdata8bits way more than it should.
while (huart->TxXferCount > 0U) {
  if (UART_WaitOnFlagUntilTimeout(huart, UART_FLAG_TXE, RESET, tickstart, Timeout) != HAL_OK) {
    huart->gState = HAL_UART_STATE_READY;
    return HAL_TIMEOUT;
  }
  if (pdata8bits == NULL) {
    huart->Instance->TDR = (uint16_t)(*pdata16bits & 0x01FFU);
    pdata16bits++;
  } else {
    huart->Instance->TDR = (uint8_t)(*pdata8bits & 0xFFU);
    pdata8bits++;
  }
  huart->TxXferCount--;
}
I know, from the code this should be impossible to happen, but... interrupts!
How To Reproduce
- In my case, there should be a way to generate HAL_UART_ERROR_NE (Noise Error) while receiving in DMA mode and a blocking transmission is in progress.
- UART is the responsible module.
- A call to HAL_UART_Transmit() generates the problem.
- It is non-deterministic, since it is related to the time that an interruption happens due to an external event.
- The code eventually halts hitting the global Error_Handler() (invalid memory address accessed via pdata8bits).
Additional context
1. Enable UART DMA reception.
2. Call HAL_UART_Transmit() to transmit a buffer. Easier to trigger the problem if the buffer is large.
3. Get some bad wires and use one to generate some noise in the RX line.
4. If the HAL_UART_ERROR_NE interruption is triggered, UART_DMAError() will be called via huart->hdmarx->XferErrorCallback. This will set huart->TxXferCount to zero.
5. At the end of the loop, huart->TxXferCount is decremented, effectively assigning it a large value, since it is unsigned.
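The wrap-around described in the last two steps can be reproduced off-target with a small host-side simulation. This is only a sketch: `mock_uart_t`, `simulate_tx()`, `irq_at` and `max_iters` are hypothetical names invented for illustration, and the "interrupt" is modelled as a plain assignment placed inside the loop body. The counter is assumed to be 16 bits wide, which matches the value 65535 reported later in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the single handle field that matters here. */
typedef struct {
    volatile uint16_t TxXferCount;
} mock_uart_t;

/*
 * Simulates the blocking TX loop on a host machine.
 * irq_at    : iteration on which a simulated error interrupt zeroes the
 *             counter (pass SIZE_MAX for "no interrupt").
 * max_iters : safety cap so the simulation always terminates.
 * Returns the number of bytes the loop would have written to TDR.
 */
size_t simulate_tx(mock_uart_t *h, uint16_t size, size_t irq_at, size_t max_iters)
{
    size_t sent = 0;

    h->TxXferCount = size;
    while (h->TxXferCount > 0U && sent < max_iters) {
        if (sent == irq_at) {
            h->TxXferCount = 0U;  /* effect of the error path on the counter */
        }
        sent++;                   /* byte written, data pointer advanced */
        h->TxXferCount--;         /* 0 - 1 wraps to 65535 for a uint16_t */
    }
    return sent;
}
```

Without the simulated interrupt, a 10-byte transfer ends after 10 iterations. With the counter zeroed during the fourth iteration, the loop runs for 65539 iterations in total, far past the end of a 10-byte buffer.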
Nasty hacky workaround
Since HAL_UART_ErrorCallback() is called in this process, set huart->TxXferCount to one in that callback. That way huart->TxXferCount will become zero when decremented, and the transfer will be aborted properly. Worst case scenario, it will run one extra cycle of transmission if the interrupt occurs after decrementing and before testing the exit of the loop.
Proper fix
That is a good question. Maybe introduce a huart->TxAborted flag and test it in the TX loop, since setting huart->TxXferCount to zero is prone to races.
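A minimal host-side sketch of the flag idea, under stated assumptions: `TxAborted` does not exist in the HAL handle, and `flag_uart_t` and `simulate_tx_with_abort_flag()` are hypothetical names used only for this illustration. It is not the fix that ST eventually shipped. The point is that the interrupt side only ever sets a flag and never writes the counter that the loop owns.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handle: the counter plus the proposed abort flag. */
typedef struct {
    volatile uint16_t TxXferCount;
    volatile uint8_t  TxAborted;   /* hypothetical field, not in the HAL */
} flag_uart_t;

/*
 * Simulated blocking TX loop that checks the abort flag each iteration.
 * irq_at is the iteration on which a simulated error interrupt fires
 * (pass SIZE_MAX for "no interrupt").
 * Returns the number of bytes the loop would have transmitted.
 */
size_t simulate_tx_with_abort_flag(flag_uart_t *h, uint16_t size, size_t irq_at)
{
    size_t sent = 0;

    h->TxXferCount = size;
    h->TxAborted = 0U;
    while (h->TxXferCount > 0U) {
        if (h->TxAborted != 0U) {
            break;                 /* leave the loop without touching the counter */
        }
        if (sent == irq_at) {
            h->TxAborted = 1U;     /* the ISR requests an abort, nothing else */
        }
        sent++;
        h->TxXferCount--;          /* only the loop modifies the counter */
    }
    return sent;
}
```

Because the counter has a single writer, it can no longer wrap. In the worst case one extra byte is sent when the interrupt arrives after the flag check.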
Hello, Would you please share the whole project you have used to reproduce the issue in order to allow a better analysis of the problem.
With regards,
I'm afraid this is not possible. The problem is well explained, but I can provide any extra details you want.
The variable huart->TxXferCount is being changed to zero outside this loop in an interruption. The condition that causes the interruption is HAL_UART_ERROR_NE due to noise in the communication line. This causes the loop to run for a long time transmitting characters that it was not supposed to transmit and eventually halting due to accessing invalid memory.
What else do you need to know?
Regards, Marcelo.
Got the same issues with DMA reception while transferring in blocking mode HAL_UART_Transmit().
I got it with HAL_UART_ERROR_RTO (receiver timeout), which I use to receive variable-length frames.
My workaround is to disable the UART interrupt with HAL_NVIC_DisableIRQ(USART1_IRQn) before the transmission and re-enable it with HAL_NVIC_EnableIRQ(USART1_IRQn) afterwards.
Regards, Alexander.
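Why masking the interrupt helps can be shown with a small host-side model. This is a sketch under assumptions: `masked_uart_t`, `raise_irq()`, `unmask_irq()` and `masked_tx()` are hypothetical names, and the pending flag only imitates how a masked interrupt is held back and delivered once it is re-enabled. On real hardware the masking is done by the two HAL_NVIC calls mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a UART handle plus the interrupt mask state. */
typedef struct {
    volatile uint16_t TxXferCount;
    bool irq_masked;    /* models the IRQ being disabled around the transfer */
    bool irq_pending;   /* models an interrupt held back while masked */
} masked_uart_t;

/* Worst-case error handler: zeroes the TX counter unconditionally. */
static void error_isr(masked_uart_t *h)
{
    h->TxXferCount = 0U;
}

/* An error event arrives: run the handler now, or defer it if masked. */
static void raise_irq(masked_uart_t *h)
{
    if (h->irq_masked) {
        h->irq_pending = true;
    } else {
        error_isr(h);
    }
}

/* Re-enable the interrupt and deliver anything that was held back. */
static void unmask_irq(masked_uart_t *h)
{
    h->irq_masked = false;
    if (h->irq_pending) {
        h->irq_pending = false;
        error_isr(h);
    }
}

/* Blocking TX with the interrupt masked for the whole transfer. */
size_t masked_tx(masked_uart_t *h, uint16_t size, size_t irq_at)
{
    size_t sent = 0;

    h->irq_masked = true;
    h->irq_pending = false;
    h->TxXferCount = size;
    while (h->TxXferCount > 0U) {
        if (sent == irq_at) {
            raise_irq(h);          /* deferred: the counter is left alone */
        }
        sent++;
        h->TxXferCount--;
    }
    unmask_irq(h);                 /* handler runs only after the loop is done */
    return sent;
}
```

The handler still runs, but only after the counter has reached zero on its own, so the decrement can no longer wrap. The cost is that UART error handling is delayed for the duration of the blocking transfer.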
Hello All,
I don't think that the huart->TxXferCount will be reset to 0, according to your scenario. Typically, if an error occurs while receiving with DMA, the huart->RxXferCount would be reset to 0, as shown in the following snippet of code.
/* Stop UART DMA Rx request if ongoing */
if ((HAL_IS_BIT_SET(huart->Instance->CR3, USART_CR3_DMAR)) &&
    (rxstate == HAL_UART_STATE_BUSY_RX))
{
  huart->RxXferCount = 0U;
  UART_EndRxTransfer(huart);
}
@Carm66, could you please provide additional information, such as screenshots when the error occurs.
I didn't get the same issue even with HAL_UART_ERROR_RTO while testing with a Nucleo board.
With regards,
Notice that the issue I have posted has been the result of a step by step debugging.
Repeating step 4 of my post here:
If the `HAL_UART_ERROR_NE` interruption is triggered, `UART_DMAError()` will be called via
`huart->hdmarx->XferErrorCallback`. This will set `huart->TxXferCount` to zero.
The following code is in stm32l4xx_hal_uart.c, line 3964, function UART_DMAError():
/**
 * @brief DMA UART communication error callback.
 * @param hdma DMA handle.
 * @retval None
 */
static void UART_DMAError(DMA_HandleTypeDef *hdma)
{
  UART_HandleTypeDef *huart = (UART_HandleTypeDef *)(hdma->Parent);
  const HAL_UART_StateTypeDef gstate = huart->gState;
  const HAL_UART_StateTypeDef rxstate = huart->RxState;

  /* Stop UART DMA Tx request if ongoing */
  if ((HAL_IS_BIT_SET(huart->Instance->CR3, USART_CR3_DMAT)) &&
      (gstate == HAL_UART_STATE_BUSY_TX))
  {
    huart->TxXferCount = 0U;
    UART_EndTxTransfer(huart);
  }

  /* Stop UART DMA Rx request if ongoing */
  if ((HAL_IS_BIT_SET(huart->Instance->CR3, USART_CR3_DMAR)) &&
      (rxstate == HAL_UART_STATE_BUSY_RX))
  {
    huart->RxXferCount = 0U;
    UART_EndRxTransfer(huart);
  }

  huart->ErrorCode |= HAL_UART_ERROR_DMA;

#if (USE_HAL_UART_REGISTER_CALLBACKS == 1)
  /*Call registered error callback*/
  huart->ErrorCallback(huart);
#else
  /*Call legacy weak error callback*/
  HAL_UART_ErrorCallback(huart);
#endif /* USE_HAL_UART_REGISTER_CALLBACKS */
}
So, this is the result of a step-by-step debugging session, and it is what actually happens: huart->TxXferCount is reset to zero in the middle of the transmission by a noise error interruption caused by noise on the reception line.
Regards, Marcelo.
Let's focus on that part of the code:
/* Stop UART DMA Tx request if ongoing */
if ((HAL_IS_BIT_SET(huart->Instance->CR3, USART_CR3_DMAT)) &&
    (gstate == HAL_UART_STATE_BUSY_TX))
{
  huart->TxXferCount = 0U;
  UART_EndTxTransfer(huart);
}
Notice that there is a transmission going on, but it is not on DMA, it is a normal, non-DMA, non-Interrupt transmission. I agree that the intention of the code was to abort only DMA, but is there a chance that this test is not working as intended?
But really, the first thing is to be able to reproduce the problem, otherwise it will be very difficult for you to deal with it.
So, repeating my previous post:
Additional context
1. Enable UART DMA reception.
2. Call HAL_UART_Transmit() to transmit a buffer. Easier to trigger the problem if the buffer is large.
3. Get some bad wires and use one to generate some noise in the RX line.
4. If the HAL_UART_ERROR_NE interruption is triggered, UART_DMAError() will be called via huart->hdmarx->XferErrorCallback. This will set huart->TxXferCount to zero.
5. At the end of the loop, huart->TxXferCount is decremented, effectively assigning it a large value, since it is unsigned.
A bad wire is just a wire connected to the RX pin. Keep touching it with your fingers while normal TX is going on.
I know, from the code this should be impossible to happen, but... interrupts!
Looks like the author has skipped some concurrency 101 classes. I wonder if there could be more unintended concurrency issues that have not been thought about 😬
Hello @mrjimenez,
I have checked with our Dev Team and, as I said, UART_DMAError() is called on a DMA error. Even if we suppose that the noise error triggers this API, there is no reason for the Tx counter to be reset to zero when the problem was in HAL_UART_Receive_DMA().
Also, I didn't manage to reproduce it with our board, I tried many times, we do not have the same conditions. So, if you may, can you run your project in debugging mode and share more screenshots (that can help us to analyze more the issue and if necessary, share it with our team) such as:
- Where the program is stopped when the problem occurs.
- the "error code".
- the content of the structures UART_HandleTypeDef and DMA_HandleTypeDef.
With regards,
Hi @KRASTM ,
Ok, I understand. I will do another debug round for this as soon as I can.
But I can certainly answer some of your questions:
- Where the program is stopped when the problem occurs: The main program is looping in HAL_UART_Transmit(), in the loop I have posted before, that is controlled by huart->TxXferCount, which has a huge value. The microprocessor is stopped at the error_handler due to an invalid memory access. The invalid memory is my transmit buffer added to the huge index huart->TxXferCount.
- There is no error code, since the routine thinks it is still transmitting something and the exception occurs.
- These I can send you later when I can debug that again. I will also try to do a minimal project sample to reproduce.
Regards, Marcelo.
Hello @KRASTM and @mrjimenez,
I managed to make a tiny project on NUCLEO-L496ZG that reproduces the problem with huart->TxXferCount.
Here is how it works:
- You need to send some data to the MCU (in my example ST on UART2 -> PD6) every 50 ms.
- There is a reception with DMA and RTO enabled (receiver timeout) that will catch the data.
- A frame is sent by the MCU every 100 ms (in my example _HELLO_ on UART2 -> PD5) in blocking mode.
If HAL_UART_ErrorCallback is called while a transmission is ongoing, huart->TxXferCount will be set to 0 and there is a chance that in HAL_UART_Transmit it will be decremented (so be set to 65535).
I have added in HAL_UART_Transmit this code that will turn on the red led in this case:
while (huart->TxXferCount > 0U)
{
  // ADDED CODE
  if(huart->TxXferCount > 9)
  { // Normally impossible case
    HAL_GPIO_WritePin(GPIOB, GPIO_PIN_14, GPIO_PIN_SET); // LD3 (red)
  }
  // ADDED CODE
With this project I can reproduce the problem in ~5 to 10 seconds, with and without debug.
To send the data, I made a tiny PowerShell script called "serial_send.ps1" that you call with the COM port number (e.g. .\serial_send.ps1 9 for COM9), but you can use your own tools.
Regards, Alexander.
ST Internal Reference: 182746
I can confirm that wrapping HAL_UART_Transmit with HAL_NVIC_DisableIRQ(USART1_IRQn); and HAL_NVIC_EnableIRQ(USART1_IRQn); solves the issue for me. Thanks @Carm66 for the solution and @mrjimenez for posting the bug.
Fixed in commit https://github.com/STMicroelectronics/STM32CubeL4/commit/464b08aabe6d5433b6108cc3e77a3a2f84cbba63