`cdcd_xfer_cb` callback uses `p_cdc->epout_buf` after endpoint was released
This is a concurrency issue when running in a RTOS environment. When receiving data on a device CDC, in short the issue is that both cdcd_xfer_cb() and tud_cdc_n_read() can call _prep_out_transaction(), but the cdcd_xfer_cb() uses the buffer that _prep_out_transaction() repurposed for the next transfer after the endpoint was released, potentially overriding old data in that buffer with new one before it was consumed and added to the fifo.
The _prep_out_transaction() function starts a new OUT transfer if both there's enough space in the rx_ff fifo to hold a whole OUT xfer and it can successfully claim the ep_out endpoint. Claiming an endpoint is done by checking the value of claimed and busy in usbd_edpt_claim(), however the endpoint is released automatically by the DCD_EVENT_XFER_COMPLETE event in tud_task before calling the driver xfer_cb callback, so at any point while cdcd_xfer_cb is handling the transfer that just finished the endpoint is free (not busy nor claimed) but the callback still needs to copy the data out of the transfer buffer (epout_buf).
Consider the following scenario:
- Two FreeRTOS tasks: one just running
tud_task()in a loop, one reading from the only CDC withtud_cdc_readand a large buffer (512 bytes). This is like the cdc_msc_freertos example but with both tasks at the same priority, this difference is important. - Host sends large amount of data all together (>2KB, assuming high-speed, 512 OUT transfers at a time).
- The first OUT transfer in the cdc endpoint is completed,
cdcd_xfer_cbis called from the tud_task task, but before it callstu_fifo_write_n()to add the bytes to the fifo the task is interrupted and the other task runs. - The "cdc" reader task calls
tud_cdc_n_read()with a large enough buffer which will empty the fifo and start a new transfer in_prep_out_transaction()because both conditions are met (fifo is empty and endp is not claimed). - USB hardware ISR or DMA might at any point interrupt and write the new OUT transfer data to
epout_buf. - the usb task continues running, calls
tu_fifo_write_n()withepout_bufwhich now may contain data from a newer OUT transfer.
In the example mentioned this situation doesn't occur because the task running tud_task has the highest priority of all the tasks calling TU API, I don't know if this condition is required by tinyusb but I didn't see it documented or perhaps I missed that.
This bug was a lot easier to reproduce before commit 7fc99a9e1104703566945ace1b2e1ed59819ba48 since we were calling tu_fifo_write() n times, instead of tu_fifo_write_n one time, so there was a lot more time for things to go wrong, but the race condition still exists. Perhaps relevant for context is to read #507 and #508, however this bug is about the OUT / RX side.
My proposed fix would be to change the xfer_cb() driver api so that the endpoint is not released before calling xfer_cb() and it is responsibility of this function to release the endpoint after it is done using it. This might require changes in all drivers. Another option is to just fix it for cdc (and others that have a similar arrangement like midi) by adding another boolean to the cdcd_interface_t to prevent _prep_out_transaction from starting a new transfer until cdcd_xfer_cb() is done with the buffer. Other ideas?
Thank you for your issue, would you mind making an pr to demonstrate the changes and since you are already at it. I could revise this, but I currently have a huge issue with (lack of) time.