
Feature request: Improve USB speed on OUT transfers (host -> RP2040 device)

Open czietz opened this issue 1 year ago • 5 comments

I'm developing an application that relies on achieving high transfer rates between a host and the RP2040 working as a USB device. I noticed that OUT transfers (host -> RP2040) are surprisingly slow, significantly slower than IN transfers (RP2040 -> host).

Decoding the traffic on the USB bus reveals the reason: [screenshot: decoded USB bus traffic]

The RP2040 initially NAKs all packets except the first in a frame, forcing the host to retransmit them. This can be seen in the screenshot: two DATA1 packets (the second one is the retransmission), two DATA0 packets (again the second one is the retransmission), etc. This effectively halves the achievable bus bandwidth in this direction. I assume it happens because the RP2040 cannot empty the buffer sufficiently fast before the next packet.
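To put rough numbers on the "halves the bandwidth" claim, here is a minimal sketch of a toy model. It assumes that a full-speed frame has room for at most 19 bulk packets of 64 bytes, that a NAKed OUT packet occupies the same bus time as an accepted one, and that only the first packet of each frame is accepted on the first attempt. The function name and constants are hypothetical and not taken from the pico-sdk or TinyUSB.

```python
def delivered_packets_per_frame(slots: int, nak_all_but_first: bool) -> int:
    """Count packets the device accepts in one frame under the toy model."""
    delivered = 0
    used = 0
    while used < slots:
        # A retransmitted packet costs two slots: the NAKed attempt plus the retry.
        cost = 2 if (nak_all_but_first and delivered > 0) else 1
        if used + cost > slots:
            break
        used += cost
        delivered += 1
    return delivered


SLOTS_PER_FRAME = 19  # assumption: 64-byte bulk packet slots per 1 ms frame
PACKET_BYTES = 64     # assumption: full-speed bulk max packet size

ideal = delivered_packets_per_frame(SLOTS_PER_FRAME, nak_all_but_first=False)
observed = delivered_packets_per_frame(SLOTS_PER_FRAME, nak_all_but_first=True)

print(f"no NAKs:   {ideal} packets/frame, {ideal * PACKET_BYTES} bytes/frame")
print(f"with NAKs: {observed} packets/frame, {observed * PACKET_BYTES} bytes/frame")
```

In this model the device accepts 10 packets per frame instead of 19, so the usable bandwidth is roughly half. Real hosts schedule fewer packets per frame than the theoretical maximum, so the absolute figures are only indicative.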

I think - but don't know how to prove it - that this could be solved by using double buffering in the USB controller and by then servicing the buffers in a "ping pong" manner. I can see that double buffering, albeit supported by hardware, is specifically disabled for OUT transfers in the USB stack: https://github.com/hathach/tinyusb/blob/4bfab30c02279a0530e1a56f4a7c539f2d35a293/src/portable/raspberrypi/rp2040/rp2040_usb.c#L149-L152
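The "ping pong" idea can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions, not the TinyUSB or RP2040 hardware behaviour: the host attempts one packet per tick, the firmware needs `service_ticks` ticks to empty a buffer, and buffers are emptied strictly in order. All names here are hypothetical.

```python
from collections import deque


def attempts_to_deliver(n_packets: int, n_buffers: int, service_ticks: int) -> int:
    """Return how many host attempts are needed to deliver n_packets."""
    busy_until = deque()  # tick at which each filled buffer becomes free again
    cpu_free_at = 0       # tick at which the firmware finishes its current buffer
    attempts = 0
    delivered = 0
    tick = 0
    while delivered < n_packets:
        # Release buffers that the firmware has finished emptying.
        while busy_until and busy_until[0] <= tick:
            busy_until.popleft()
        attempts += 1
        if len(busy_until) < n_buffers:
            # Packet accepted (ACK). It is fully received at tick + 1, and
            # the firmware starts on it once it is done with earlier buffers.
            start = max(tick + 1, cpu_free_at)
            cpu_free_at = start + service_ticks
            busy_until.append(cpu_free_at)
            delivered += 1
        # Otherwise every buffer is full: the packet is NAKed and retried.
        tick += 1
    return attempts


single = attempts_to_deliver(n_packets=10, n_buffers=1, service_ticks=1)
double = attempts_to_deliver(n_packets=10, n_buffers=2, service_ticks=1)
print(f"single buffer: {single} attempts, double buffer: {double} attempts")
```

With one buffer, every packet after the first is NAKed once (19 attempts for 10 packets). With two buffers and the same service time, no packet is NAKed (10 attempts). In this model the second buffer only helps as long as the firmware keeps up with the packet rate on average.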

The following comment seems to indicate that this was done because the TinyUSB maintainer encountered some issues: https://github.com/hathach/tinyusb/blob/4bfab30c02279a0530e1a56f4a7c539f2d35a293/src/portable/raspberrypi/rp2040/rp2040_usb.c#L260-L264

Perhaps with some help from the Raspberry Pi developers this could be solved, bringing a significant USB speed increase.

czietz avatar Jul 28 '22 17:07 czietz

FYI: Bulk IN packets are also slow or hang with certain hosts and CDC. More info here: https://github.com/hathach/tinyusb/issues/1572 You might want to re-test with more hosts, if you have not already.

mjan-jansson avatar Jul 28 '22 19:07 mjan-jansson

ping @liamfraser

lurch avatar Jul 28 '22 20:07 lurch

I've created a minimal example to demonstrate this: https://github.com/czietz/pico-usb-speed-check It reads and discards all data it receives from the host and produces data for the host as fast as possible.

The included speedcheck.py script measures the transfer rate. The actual numbers depend on the host, but for example on an RPi 400 I get:

Host -> Device: 575.25 kBytes/s
Device -> Host: 883.36 kBytes/s

The difference is less than a factor of 2 because of overhead that applies to both transfer directions, but it still shows the issue very clearly.
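For reference, the ratio between the two measured rates can be worked out directly. This is just arithmetic on the numbers above. The variable names are made up for illustration.

```python
host_to_device = 575.25  # kBytes/s, OUT direction, measured on an RPi 400
device_to_host = 883.36  # kBytes/s, IN direction, measured on an RPi 400

ratio = device_to_host / host_to_device
shortfall = 1 - host_to_device / device_to_host

print(f"IN is {ratio:.2f}x as fast as OUT; OUT is {shortfall:.0%} slower than IN")
```

So on this host the IN direction is roughly 1.5 times as fast as the OUT direction.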

czietz avatar Jul 29 '22 06:07 czietz

Thanks for this example. I agree the discrepancy is because double buffering is disabled for Host -> Device (OUT from the host). I won't be able to do it immediately, but I can spend some time profiling TinyUSB to work out where the time is lost. If you want to do this yourself, a method we find really helpful is to set a GPIO high on entry to each function you want to profile and set it low again when you exit the function. We then attach a Saleae logic analyser and can see where the time is spent. Setting or clearing a GPIO is a single write, so it doesn't add much overhead.

liamfraser avatar Jul 29 '22 08:07 liamfraser

As mentioned in the code comment, I encountered some issues with double-buffered OUT endpoints. It is only a technical issue and is fixable, but I didn't have enough time and needed to wrap that up in order to move on to other work. I may revisit this later when I have time, but any PRs are welcome.

hathach avatar Jul 30 '22 07:07 hathach

closing as a TinyUSB issue

kilograham avatar May 26 '23 15:05 kilograham