tinyusb icon indicating copy to clipboard operation
tinyusb copied to clipboard

Only first bulk OUT EP write works

Open diggit opened this issue 4 years ago • 9 comments

Set Up

  • Archlinux, distro kernel 5.12.8,
  • custom board with STM32H730 and USB PHY USB3343 (may provide schematics on request)
  • TinyUSB version b34724215bea28ef93fd80ed9c2e4ddc87874093 with synopsys hotfix #496

Firmware

  • based on RTOS (modified Argon)
  • osal implemented and queue passes events to tud_task
EPs
uint8_t const desc_hs_configuration[] = {
	// Config number, interface count, string index, total length, attribute, power in mA
	TUD_CONFIG_DESCRIPTOR(1, ITF_NUM_TOTAL, 0, CONFIG_TOTAL_LEN, TUSB_DESC_CONFIG_ATT_REMOTE_WAKEUP, 500),

	TUD_VENDOR_DESCRIPTOR(ITF_NUM_VENDOR_0, 4, EPNUM_VENDOR_0, 0x80 | EPNUM_VENDOR_0, 512)
};

Describe The Bug

Driver callback is called only after first bulk OUT xfer.

To Reproduce

Steps to reproduce the behavior:

  1. Write more than once from master to bulk OUT EP.
  2. Observe, that vendor callback is called only in response to first xfer.

Logs

I've added logging of int_status in the beginning of dcd_int_handler in dcd_synopsys.c (TU_LOG2("(IS %08X)", int_status);) This log is enclosed in parentheses for easier reading due to its async nature. There is less interrupts during second xfer.

First try

(IS C4808428)USBD Resume 
(IS 44808438)(IS 44808428)
USBD Setup Received 80 00 00 00 00 00 02 00 
  Get Status
  Queue EP 80 with 2 bytes ... (IS 44848428)(IS 44848428)OK
USBD Xfer Complete on EP 80 with 2 bytes
  Queue EP 00 with 0 bytes ... (IS 44808438)(IS 44808428)OK

USBD Setup Received 00 09 01 00 00 00 00 00 
  Set Configuration
  Queue EP 80 with 0 bytes ... O(IS 44848428)K
USBD Xfer Co(IS 44808438)(IS 44808428)mplete on EP 00 with 0 bytes
USBD Xfer Complete on EP 80 with 0 bytes
USBD Xfer Complete on EP 01 with 4 bytes
  VENDOR xfer callback
  Queue EP 01 with 512 bytes ... OK
(IS 44808C28)USBD Suspend 

Second try

(IS C4808428)USBD Resume 
(IS 44808438)(IS 44808428)
USBD Setup Received 80 00 00 00 00 00 02 00 
  Get Status
  Queue EP 80 with 2 bytes ... (IS 44848428)(IS 44848428)OK
USBD Xfer Complete on EP 80 with 2 bytes
  Queue EP 00 with 0 bytes ... (IS 44808438)(IS 44808428)OK

USBD Setup Received 00 09 01 00 00 00 00 00 
  Set Configuration
  Queue EP 80 with 0 bytes ... (IS 44848428)OK
USBD Xfer Complete on EP 00 with 0 bytes
USBD Xfer Complete on EP 80 with 0 bytes
(IS 44808C28)USBD Suspend 

Full log without int flags logging

log_sniff_fail_on_second_xfer_wr.txt

pcap capture from host

usb_sniff_fail_on_second_xfer_wr.zip Everything seems nominal or at least I can't spot difference between two writes.

diggit avatar Jun 18 '21 22:06 diggit

There is no clues if this is an tinyusb issue. There is too many custom changes from your setup. And there is no way for me to understand and track down the issue. If you could reproduce the issue with

  • one of supported board
  • one of example with slight modification

Then I could possible help you with narrow it down.

hathach avatar Jun 20 '21 18:06 hathach

I got rid of RTOS. CDC works normally. (ar at least some data are comming into device, LEDs are counting in binary). See cdc_task below So i'd consider HW being OK. I do not have any of supported board at hand.

static void cdc_task(void) {
	uint8_t itf;

	static unsigned int cnt = 0;

	for (itf = 0; itf < CFG_TUD_CDC itf++) {
		// connected() check for DTR bit
		// Most but not all terminal client set this when making connection
		// if ( tud_cdc_n_connected(itf) )
		{
			if (tud_cdc_n_available(itf)) {
				uint8_t buf[64];

				uint32_t count = tud_cdc_n_read(itf, buf, sizeof(buf));
				cnt++;
				pins::LedCan1Tx.setLevel(cnt & 0x1);
				pins::LedCan1Rx.setLevel(cnt & 0x2);
				pins::LedCan1Err.setLevel(cnt & 0x4);
			}
		}
	}
}

switching into vendor class by modifying sources from working CDC

// config relevant lines
#define CFG_TUD_CDC               0
#define CFG_TUD_VENDOR            1
#define CFG_TUD_VENDOR_RX_BUFSIZE (TUD_OPT_HIGH_SPEED ? 512 : 64)
#define CFG_TUD_VENDOR_TX_BUFSIZE (TUD_OPT_HIGH_SPEED ? 512 : 64)
// descriptors relevant lines

enum {
  ITF_NUM_VENDOR,
  ITF_NUM_TOTAL
};

#define CONFIG_TOTAL_LEN    (TUD_CONFIG_DESC_LEN + CFG_TUD_VENDOR * TUD_VENDOR_DESC_LEN)

#define EPNUM_VENDOR      0x02

uint8_t const desc_fs_configuration[] =
{
  // Config number, interface count, string index, total length, attribute, power in mA
  TUD_CONFIG_DESCRIPTOR(1, ITF_NUM_TOTAL, 0, CONFIG_TOTAL_LEN, TUSB_DESC_CONFIG_ATT_REMOTE_WAKEUP, 100),

  TUD_VENDOR_DESCRIPTOR(ITF_NUM_VENDOR, 4, EPNUM_VENDOR, 0x80 | EPNUM_VENDOR, 64)

};

#if TUD_OPT_HIGH_SPEED
uint8_t const desc_hs_configuration[] =
{
  // Config number, interface count, string index, total length, attribute, power in mA
  TUD_CONFIG_DESCRIPTOR(1, ITF_NUM_TOTAL, 0, CONFIG_TOTAL_LEN, TUSB_DESC_CONFIG_ATT_REMOTE_WAKEUP, 100),

  TUD_VENDOR_DESCRIPTOR(ITF_NUM_VENDOR, 4, EPNUM_VENDOR, 0x80 | EPNUM_VENDOR, 512 )
};
#endif

```c
static void vendor_task(void) {
	uint8_t itf;

	static unsigned int cnt = 0;

	for (itf = 0; itf < CFG_TUD_VENDOR; itf++) {
		{
			if (tud_vendor_n_available(itf)) {
				uint8_t buf[64];

				uint32_t count = tud_vendor_n_read(itf, buf, sizeof(buf));
				cnt++;
				pins::LedCan1Tx.setLevel(cnt & 0x1);
				pins::LedCan1Rx.setLevel(cnt & 0x2);
				pins::LedCan1Err.setLevel(cnt & 0x4);
			}
		}
	}
}

and my test script

#!/usr/bin/env python
import usb.core
import usb.util

from random import randbytes
from time import sleep

# find our device
dev = usb.core.find(idVendor=0xcafe, idProduct=0x4010)

# was it found?
if dev is None:
	raise ValueError('Device not found')

dev.set_configuration()

# get an endpoint instance
cfg = dev.get_active_configuration()
intf = cfg[(0,0)]

epw = usb.util.find_descriptor(
	intf,
	# match the first OUT endpoint
	custom_match = \
	lambda e: \
		usb.util.endpoint_direction(e.bEndpointAddress) == \
		usb.util.ENDPOINT_OUT)

assert epw is not None

# write the data
for _ in range(2):
	tx = randbytes(4)
	print(list(tx))
	epw.write(tx)
	sleep(0.2)

Now what's weird

If I try to write only once per execution of script (for _ in range(1)), only first try works and write in next script execution does not get to buffer. However when I write two or more timer (for _ in range(2)) per script execution, everything seems to work. Interleaving writes and reads (eg just loopback) also works as long as there are at least 2 writes per script execution.

diggit avatar Jun 28 '21 18:06 diggit

With python-libusb1 library, issue seems to be non-existent. I'll try to compare behavior of python-pyusb and python-libusb1.

diggit avatar Jun 28 '21 20:06 diggit

The issue seems to be the stack's vendor driver, currently it doesn't guard against race condition as cdc. I haven't updated the vendor driver once introducing the usbd_edpt_claim() for race protection. I actually have an plan to abstract these to stream/buffered endpoints API for driver and haven't got time to do so.

https://github.com/hathach/tinyusb/blob/master/src/class/vendor/vendor_device.c#L109 https://github.com/hathach/tinyusb/blob/master/src/class/cdc/cdc_device.c#L83

hathach avatar Jun 29 '21 06:06 hathach

Would it be feasible to port cdc's guarding to vendor class driver? IMO It would be better to have vendor class driver working until there is some unified streamed EPs API.

diggit avatar Jun 30 '21 17:06 diggit

yeah, it is better to fix existing problem with vendor right now. If you could submit PR, I am happy to review. Otherwise, give me a bit of time, I will do it when I have time.

hathach avatar Jun 30 '21 19:06 hathach

OT: looking onto claim/release functions, they are using mutex with RTOS... Did you consider using atomic_compare_exchange? It may require chaging that struct to bitfield (I am not very familiar with atomics in C, I work in C++), but locking mutex seems like an overkill for this.

diggit avatar Jul 04 '21 08:07 diggit

I guess the method is still under consideration. #662 mentions atomic operations for ARM architecture. FYI, RTX implements atmic_wr8 by using these instructions.

kkitayam avatar Jul 04 '21 09:07 kkitayam

OT: looking onto claim/release functions, they are using mutex with RTOS... Did you consider using atomic_compare_exchange? It may require chaging that struct to bitfield (I am not very familiar with atomics in C, I work in C++), but locking mutex seems like an overkill for this.

There are 2 issue

  • atomic is gcc extension, also it kind of requires C11 or higher, not sure but it is technically resolvable (just take time to do it more portable)
  • only M3/M4/M7 have mutex instructions that does not requires disable global interrupt. Global interrupts is even worse than using rtos mutex and/or disable USB IRQ.

Anyway, yeah, I am aware of those atomic function, just need time to execute it. For now, we will reply on rtos/ usb irq to provide mutex.

hathach avatar Jul 04 '21 16:07 hathach

Possibly related? https://github.com/espressif/esp-idf/issues/10118

mitchellcairns avatar Nov 07 '22 00:11 mitchellcairns

Closed by https://github.com/hathach/tinyusb/commit/876f49f6ad10ef2bca000602fe862889c3e0268f

HiFiPhile avatar Feb 01 '24 09:02 HiFiPhile