stm32h7xx_hal_driver icon indicating copy to clipboard operation
stm32h7xx_hal_driver copied to clipboard

Upgrading the hal library from 1.11.0 to 1.11.1 will cause the SPI master transmission to time out for no reason.

Open huoxingdawang opened this issue 5 months ago • 9 comments

In our project, we need to read a W25Q64 flash using SPI bus. Our code used to work very well, until recently my colleague reported that there were many problems with the flash operation. After investigation, we found that HAL_SPI_Transmit and HAL_SPI_Receive often reported HAL_TIMEOUT for no reason. If we downgrade our hal_spi.c and hal_spi.h to v1.11.0, there will be no problem. If we replace these two files with v1.11.1 and v1.11.2, there will be a timeout problem.

We noticed that the SPI of v1.11.0 is very different from the SPI of v1.11.1. We currently don’t know what specifically caused this phenomenon. Our team has spent two weeks troubleshooting this strange problem, which has had a great impact on our development progress.

huoxingdawang avatar Feb 23 '24 10:02 huoxingdawang

In more detail, our code first need to write num bytes from val to spi, and then read num2 bytes from spi to val2. Our latest debugging results show that there will be problems if we use the following code.

HAL_SPI_Transmit(&hspi1,val,num,2000);
HAL_SPI_Receive(&hspi1,val2,num2,2000);

And the following way it won't.

HAL_SPI_Transmit(&hspi1,val,num,2000);
HAL_SPI_TransmitReceive(&hspi1,val2,val2,num2,2000);

We noticed that in the 1.11.0 version of the hal library, under our configuration (full duplex master) HAL_SPI_Receive was actually calling HAL_SPI_TransmitReceive. https://github.com/STMicroelectronics/stm32h7xx_hal_driver/blob/8e7eb8dba8bee5e4c260c6fc724437502e98fc76/Src/stm32h7xx_hal_spi.c#L1013C1-L1030C1

  if ((hspi->Init.Mode == SPI_MODE_MASTER) && (hspi->Init.Direction == SPI_DIRECTION_2LINES))
  {
    hspi->State = HAL_SPI_STATE_BUSY_RX;
    /* Call transmit-receive function to send Dummy data on Tx line and generate clock on CLK line */
    return HAL_SPI_TransmitReceive(hspi, pData, pData, Size, Timeout);
  }

While in 1.11.1 this implementation was removed. We'd like to know what the motivation was for removing this and how to fix it, like a PR or something.

huoxingdawang avatar Feb 23 '24 13:02 huoxingdawang

To add, we have tried modifying the compiler's optimization level and reducing the SPI clock frequency, and we found that this does not affect this problem.

huoxingdawang avatar Feb 24 '24 04:02 huoxingdawang

Hello @huoxingdawang,

Thank you for your report.

In fact, there was an update on the driver in the version V1.11.1 following a new Hardware implementation. So, under the config (full duplex Master) only the RX Line is active for HAL_SPI_Receiv() function, if you want to use the TX Line during Receiving, you must use HAL_SPI_TransmitReceive() function.

I hope the answer will be helpful.

With regards

KRASTM avatar Feb 28 '24 11:02 KRASTM

Hello @huoxingdawang,

Thank you for your report.

In fact, there was an update on the driver in the version V1.11.1 following a new Hardware implementation. So, under the config (full duplex Master) only the RX Line is active for HAL_SPI_Receiv() function, if you want to use the TX Line during Receiving, you must use HAL_SPI_TransmitReceive() function.

I hope the answer will be helpful.

With regards

nonononononono, I only want to use RX line, I don't need to transmit any any data to the device during receving. But it is the hal lib always TIMEOUT when I am using HAL_SPI_Receive, which forces me to use HAL_SPI_TransmitReceive instead.

huoxingdawang avatar Mar 01 '24 02:03 huoxingdawang

Hello @huoxingdawang,

Can you share more details (for ex: Screenshots) about your Config & project, so we can reproduce it. I tested the HAL_SPI_Receive() in a project (different from yours) and I did not get any kind of error (especially a timeout).

With regards.

KRASTM avatar Mar 06 '24 09:03 KRASTM

image image image No user constants, NVIC and DMA for spi1 is configured. Both ICache and DCache are enabled, compile using AC6 in MDK, -ofast and link time optimization was enabled. The core of the code is something like the following:

#define SPICS(x)                do{SPI1_CS1_GPIO_Port->BSRR=(x)?(SPI1_CS1_Pin):(SPI1_CS1_Pin<<16);}while(0)
void spiRead(uint32 addr,uint8*data,uint32 size)
{
    const uint8 cmd[]={
        0x03,
        (uint8)(addr>>16),
        (uint8)(addr>>8),
        (uint8)(addr),
    };
    SPICS(0);
    HAL_SPI_Transmit(&hspi1,cmd,sizeof(cmd),2000);
    HAL_SPI_Receive(&hspi1,data,size,2000);
    SPICS(1);
}

We also noticed that sometimes a timeout is returned, and sometimes no timeout is returned but the data is wrong. By using HAL_SPI_TransmitReceive instead of HAL_SPI_Receive there will be no problems, nor will rolling back to 1.11.0.

huoxingdawang avatar Mar 06 '24 10:03 huoxingdawang

Hello @huoxingdawang,

Can you try enabling the parameter "Master Keep Io State" in the SPI Configuration?

With regards.

KRASTM avatar Mar 11 '24 10:03 KRASTM

Hello @KRASTM, Thank you for your reply. Unfortunately, this doesn't seem to work. image Besides, we found out earlier today that it seemed to have something to do with the interrupt. If we turn off interrupts in certain steps, the bug seems to change. We are conducting further testing. With regards.

huoxingdawang avatar Mar 11 '24 14:03 huoxingdawang

We guess it is because an interrupt is entered during the receive process, but SPI does not pause in receive-only mode, which results in data loss, making while (hspi->RxXferCount > 0UL) never exit, resulting in a timeout.

We have also tested Master Receive auto susp before. According to our understanding, this will block SPI rx when the rxFIFO is full, but for some reason, turning this option on or off does not seem to have any impact on us.

This seems to explain why it is normal to use HAL_SPI_TransmitReceive, because after entering the interrupt, the CPU will not send data to the SPI. If the SPI does not have new data, it will naturally stop TX and RX, without causing data loss.

huoxingdawang avatar Mar 11 '24 15:03 huoxingdawang

Fixed in db7e4a987cc58023a53a255f8617f6e3fb76731b

TOUNSTM avatar Apr 26 '24 14:04 TOUNSTM