atsamd
atsamd copied to clipboard
USB methods disable interrupts for too long
I'm working on a SAMD21 based USB device, which does the USB work in the USB ISR, along the lines of the usb_echo example. The USB interrupt is set to a lower priority (higher number) than another interrupt which needs to be serviced fairly quickly, but sometimes that doesn't happen quickly enough. The issue seems to be that UsbBus's methods do all their work in interrupt free contexts. For example, these.
Might it make sense for UsbBus to have an "in use" flag in the mutex rather than the RefCell, so that the interrupts only need to be disabled for setting/checking that flag?
Can you elaborate on 'too long'? We tried to do as little as possible in the UsbBus inner implementation.
The USB interrupt is set to a lower priority (higher number) than another interrupt which needs to be serviced fairly quickly
Depending on how many priority bits your chip has, could you boop the USB to have less priority than your other CSR?
Might it make sense for UsbBus to have an "in use" flag in the mutex rather than the RefCell, so that the interrupts only need to be disabled for setting/checking that flag?
I'm not sure i follow, can you give me an example?
Can you elaborate on 'too long'?
Maybe more importantly than the specific timing constraint, I was surprised to find that the entirety of read() and write() is in an interrupt free block since they call copy_to_nonoverlapping(). Isochronous endpoints can have up to 1023B per read/write, so there would be a pretty wide range of time that interrupts will be disabled.
Specifically, I've got about a 16us (ed: at 48MHz) window to respond to a DMA interrupt, and I believe that response usually takes about 3us so I should be able to tolerate about 12us of interrupts being disabled. Admittedly I haven't put much effort in to profiling - an early test showed a case where the DMA interrupt wasn't handled when the processor was in the USB ISR, so I went digging to see how that could happen...
could you boop the USB to have less priority than your other CSR
Yep, I'm pretty sure that is the current arrangement, but perhaps got something wrong. Will investigate further.
I'm not sure i follow, can you give me an example?
Sorry, what I wrote was a bit unclear, and thinking about it more I may be misunderstanding the point of the mutex. Will try to catch you on matrix to discuss, but for completeness I was imagining adding an AtomicBool field to UsbBus to indicate that work is being done on the RefCell contents. It might look something like below, where swap is along these lines:
fn write(&self, ep: EndpointAddress, buf: &[u8]) -> UsbResult<usize> {
if swap(self.busy_flag, true) {
// self.inner was already in use
Err(ComputerSaysNo)
} else {
// We have atomically set busy_flag, so are the only user of self.inner
// write() proceeds with interrupts enabled
let result = self.inner.borrow().write(ep, buf);
// release the flag
swap(self.busy_flag, false);
result
}
}
Ooooh I have an idea. Instead of turning off interrupts globally, what if we use mask and unmask to just turn off the IRQ lines for USB (USB_OTHER, USB_TRCPT0, and USB_TRCPT1on samd5x, just one on d21 iirc).
That way we can be happily preempted by more urgent interrupts while servicing, as long as they dont take too long that we miss a frame.
Something like this maybe:
struct UsbCS {
other_active: bool, // USB_OTHER irq
trcpt0_active: bool, // USB_TRCPT0 irq
trcpt1_active: bool, // USB_TRCPT1 irq
}
impl core::ops::Drop for UsbCS {
fn drop(&mut self) {
use cortex_m::peripheral::NVIC;
use crate::target_device::interrupt;
unsafe {
if self.other_active {
NVIC::unmask(interrupt::USB_OTHER);
}
if self.trcpt0_active {
NVIC::unmask(interrupt::USB_TRCPT0);
}
if self.trcpt1_active {
NVIC::unmask(interrupt::USB_TRCPT1);
}
}
}
}
impl UsbCS {
fn mutex_cs(&self) -> cortex_m::interrupt::CriticalSection {
unsafe { cortex_m::interrupt::CriticalSection::new() }
}
}
/// Disables IRQ lines related to USB, returning a token that resets the
/// masking state when dropped.
fn usb_critical_section() -> UsbCS {
use cortex_m::peripheral::NVIC;
use crate::target_device::interrupt;
let cs = UsbCS {
other_active: NVIC::is_enabled(interrupt::USB_OTHER),
trcpt0_active: NVIC::is_enabled(interrupt::USB_TRCPT0),
trcpt1_active: NVIC::is_enabled(interrupt::USB_TRCPT1),
};
NVIC::mask(interrupt::USB_OTHER);
NVIC::mask(interrupt::USB_TRCPT0);
NVIC::mask(interrupt::USB_TRCPT1);
cs
}
That way we can be happily preempted by more urgent interrupts while servicing, as long as they dont take too long that we miss a frame.
Is this the reason we're currently disabling interrupts? If so, it probably hasn't been working well in uses like this
#[interrupt]
fn USB() {
usb_bus.poll(&mut [a_usb_device]);
// this is a window for other interrupts
a_usb_device.do_some_reads_and_writes(); // likely contains windows for other interrupts
}
I guess we'll want a method along the lines of interrupt_free() that uses a usb_critical_section, maybe that should be in UsbBus?
#[interrupt]
fn USB() {
usb_bus.usb_interrupt_free(|cs| {
// Would poll(), read(), write(), etc need a reference to cs, the UsbCS?
usb_bus.poll(&mut [a_usb_device]);
a_usb_device.do_some_reads_and_writes();
});
}
Is this the reason we're currently disabling interrupts?
I think it was just to prevent us being interrupted by a different USB interrupt while servicing a USB interrupt. I don't think there is anything wrong with being preempted by a higher priority interrupt as long as its not USB-related, but theres only one way to find out xD
That way we can be happily preempted by more urgent interrupts while servicing, as long as they dont take too long that we miss a frame.
Thinking about this more - I don't know if it's really what is needed. Imagine there's a GPIO interrupt that triggers a USB write - we wouldn't want it to stomp on an in-progress USB write...
I've just yanked out the interrupt::free(), Mutex, and RefCell - it seems to work this state on SAMD21 (which only has one USB interrupt) without doing any masking, so gets around my immediate problem. This is obviously sketchy though, there still needs to be a mechanism to only allow one "thread" to access the UsbBus at a time.
Since the methods where we had interrupt::free() are implementing usb-device, I guess we can't make them require a CriticalSection. In case of contention, should the second call to a UsbBus method block, or return an error - maybe InvalidState?
I've just yanked out the interrupt::free(), Mutex, and RefCell - it seems to work this state on SAMD21 (which only has one USB interrupt) without doing any masking, so gets around my immediate problem
Can you share the diff somewhere? Im curious how you did this
Sure, https://github.com/ianrrees/atsamd/commit/b1298760e4d2d4b17fe44ba9ea7372c941ff440f and its parent. It's not pretty :).
(edit: updated link)
seems to work ... without doing any masking
This was premature, it definitely needs a mechanism to prevent USB interrupts while already processing one.
I think there's a fundamental issue here, but feel a bit out of my depth.
Imagine that we have an implementation of mutex that allows work on the inner struct without disabling interrupts for the duration of that work. When a method tries to access the inner struct through that mutex, the mutex will either have to block or return an error. Since we don't have an OS, the blocking option doesn't make sense (the interrupted code that holds the lock won't make progress until the interrupting code is done), so the locking attempt must return an error. But, we don't have a way to get that error out of several methods required by usb_device that need exclusive access to the inner struct, for instance reset().
A secondary question is how to handle an interrupt that fails to get the lock.
Sorry for my lack of follow-through on this, the above hack is working well enough in the meantime ;).
This was premature, it definitely needs a mechanism to prevent USB interrupts while already processing one.
Can't remember what prompted me to write this, but think it might be incorrect; AFAICT any particular ISR on SAMD21 won't ever interrupt itself.
I'm becoming convinced that although the mutex in the UsbBus implementation may be necessary in the current implementation of usb-device, with some hopefully-straightforward changes to usb-device (see https://github.com/mvirkkunen/usb-device/issues/9) it won't be. Currently, a typical use-case involves multiple structs that refer to the same UsbBus, but if we instead had one struct that owned all the USB stuff, then Rust's ownership guarantees should be sufficient to ensure that only one execution context can access it at a time.
There might still be a mutex required to mediate access to the USB stuff, but it would be easier to implement one that only blocks interrupts for a few instructions, for example.
dirbaio on Matrix suggested that the stm32-usbd doesn't have this particular issue, perhaps it's possible to make UsbBus and/or Inner not by Sync, so that the user either picks their own mutex or only accesses the USB from one particular context.
@ianrrees, did you settle on a solution here? Does this need a change to the HAL or a change upstream?
Currently I'm using a branch that removes the mutex/critical section (CS), it works but isn't ideal. Some possible ways we could resolve this, all involve changes to the HAL:
- usb-device changes along the lines of point 2 at https://github.com/mvirkkunen/usb-device/issues/9 , which I think would then mean the HAL implementation wouldn't need the CS to begin with.
- It might be possible to push the CS down closer to the hardware, so that the external API stays the same but there are only a couple instructions at a time in a CS. From memory, this would be a fairly involved refactoring.
- Make the mutex implementation configurable via feature flag, and provide one that either takes the mutex or panics. I've found it's really not difficult to only access the USB methods from a USB interrupt context, and would expect that a bad access pattern with a lock/panic mutex would result in a panic pretty quickly. So, this isn't really the cleanest solution, but it might be a good enough "pressure relief valve" for situations where people need to use USB along with lower-latency interrupts.
for something along the lines of 3, you can check if you're in interrupt context with this cortex-m function and panic if you're in thread context
I think option 3 is the best at the moment. Should be fairly simple and that way you either access USB methods from different context safely or get maximum performance at the cost of panics if it's used from multiple contexts (which I think wouldn't be too difficult to avoid). Maybe mutexes should be opt-out to make the safer option the default.
Another option would be to let the user provide a mutex with the current one as the default.
@ianrrees is there any update needed to this issue as a result of the recent USB PRs? Or is it the same story
Recent changes are unrelated, I was thinking about picking this back up though...
Any opposition to approach 3 from a few comments back?
Nope, I think it makes good sense