PX4-Autopilot icon indicating copy to clipboard operation
PX4-Autopilot copied to clipboard

Fix race condition in px4io serial driver

Open ThomasDebrunner opened this issue 3 years ago • 1 comments

Describe problem solved by this pull request

There seems to be a potential race condition within the px4io serial driver, that leads to double posting of the _completion_semaphore. This semaphore is posted from the UART or DMA interrupts. If the sem_timedwait times out waiting for the semaphore, it aborts the DMA and UART transfer, which also disables those interrupts. However, there is a race. The interrupt could happen before the interrupts are disabled or the _rx_dma_status gets set to _dma_status_inactive. Then the sempahore gets posted, without anyone waiting for it. In the next transfer, the code can then immediately obtain the semaphore and thus no longer waits for the semaphore to complete. This then usually leads to CRC errors. Now:

  • In the worst case, the CRC happens to match by accident, the data is treated as correct. There was another vulnerability in the px4io driver that does not sanity check the data and it is being used in a array-filling for loop. This has been observerved to cause arbitrary memory corruption in the system.
  • In another case, it the interrupt would again happen before they get disabled, posting the semaphore again, leading to more crc errors.
  • Before the more aggressive DMA abort calls introduced in a recent pr, the chances of this race happening used to be much bigger, basically triggering after each transmission time out.

Describe your solution

This PR introduces

  • Critical section around "waiting for semaphore" -> "setting _rx_dma_status" -> "making sure further interrupts are disabled" . This eliminates the race, prevents interrupt double posting
  • Take CRC check out of the loop, as it is independent of the DMA / RX / TX handling, does not need to be in a loop and does not need to be ina critical section
  • Prevent buffer overrun in the px4io driver on bogus data received

Describe possible alternatives

Mid-term goal would be to get rid of the px4io_serial driver and implement a simpler protocol that can use the nuttx serial driver instead.

Test data / coverage

[x] Tested on F7 on the bench. Does not seem to hang anymore Todo: [ ] Flight test on an F7 [ ] Test on H7 [ ] Test on F4

ThomasDebrunner avatar Aug 04 '22 10:08 ThomasDebrunner

@ThomasDebrunner @dagar out of flash

davids5 avatar Aug 08 '22 17:08 davids5