atsamd USB methods disable interrupts for too long

I'm working on a SAMD21 based USB device, which does the USB work in the USB ISR, along the lines of the usb_echo example. The USB interrupt is set to a lower priority (higher number) than another interrupt which needs to be serviced fairly quickly, but sometimes that doesn't happen quickly enough. The issue seems to be that UsbBus's methods do all their work in interrupt free contexts. For example, these.

Might it make sense for UsbBus to have an "in use" flag in the mutex rather than the RefCell, so that the interrupts only need to be disabled for setting/checking that flag?

Feb 01 '21 23:02 ianrrees

Can you elaborate on 'too long'? We tried to do as little as possible in the UsbBus inner implementation.

The USB interrupt is set to a lower priority (higher number) than another interrupt which needs to be serviced fairly quickly

Depending on how many priority bits your chip has, could you boop the USB to have less priority than your other CSR?

Might it make sense for UsbBus to have an "in use" flag in the mutex rather than the RefCell, so that the interrupts only need to be disabled for setting/checking that flag?

I'm not sure i follow, can you give me an example?

Feb 01 '21 23:02 twitchyliquid64

Can you elaborate on 'too long'?

Maybe more importantly than the specific timing constraint, I was surprised to find that the entirety of read() and write() is in an interrupt free block since they call copy_to_nonoverlapping(). Isochronous endpoints can have up to 1023B per read/write, so there would be a pretty wide range of time that interrupts will be disabled.

Specifically, I've got about a 16us (ed: at 48MHz) window to respond to a DMA interrupt, and I believe that response usually takes about 3us so I should be able to tolerate about 12us of interrupts being disabled. Admittedly I haven't put much effort in to profiling - an early test showed a case where the DMA interrupt wasn't handled when the processor was in the USB ISR, so I went digging to see how that could happen...

could you boop the USB to have less priority than your other CSR

Yep, I'm pretty sure that is the current arrangement, but perhaps got something wrong. Will investigate further.

I'm not sure i follow, can you give me an example?

Sorry, what I wrote was a bit unclear, and thinking about it more I may be misunderstanding the point of the mutex. Will try to catch you on matrix to discuss, but for completeness I was imagining adding an AtomicBool field to UsbBus to indicate that work is being done on the RefCell contents. It might look something like below, where swap is along these lines:

fn write(&self, ep: EndpointAddress, buf: &[u8]) -> UsbResult<usize> {
    if swap(self.busy_flag, true) {
        // self.inner was already in use
        Err(ComputerSaysNo)

    } else {
        // We have atomically set busy_flag, so are the only user of self.inner

        // write() proceeds with interrupts enabled
        let result = self.inner.borrow().write(ep, buf);

        // release the flag
        swap(self.busy_flag, false);

        result
    }
}

Feb 02 '21 02:02 ianrrees

Ooooh I have an idea. Instead of turning off interrupts globally, what if we use mask and unmask to just turn off the IRQ lines for USB (USB_OTHER, USB_TRCPT0, and USB_TRCPT1on samd5x, just one on d21 iirc).

That way we can be happily preempted by more urgent interrupts while servicing, as long as they dont take too long that we miss a frame.

Feb 02 '21 04:02 twitchyliquid64

Something like this maybe:

struct UsbCS {
    other_active: bool, // USB_OTHER irq
    trcpt0_active: bool, // USB_TRCPT0 irq
    trcpt1_active: bool, // USB_TRCPT1 irq
}

impl core::ops::Drop for UsbCS {
    fn drop(&mut self) {
        use cortex_m::peripheral::NVIC;
        use crate::target_device::interrupt;
        unsafe {
            if self.other_active {
                NVIC::unmask(interrupt::USB_OTHER);
            }
            if self.trcpt0_active {
                NVIC::unmask(interrupt::USB_TRCPT0);
            }
            if self.trcpt1_active {
                NVIC::unmask(interrupt::USB_TRCPT1);
            }
        }
    }
}

impl UsbCS {
    fn mutex_cs(&self) -> cortex_m::interrupt::CriticalSection {
        unsafe { cortex_m::interrupt::CriticalSection::new() }
    }
}

/// Disables IRQ lines related to USB, returning a token that resets the
/// masking state when dropped.
fn usb_critical_section() -> UsbCS {
    use cortex_m::peripheral::NVIC;
    use crate::target_device::interrupt;

    let cs = UsbCS {
        other_active: NVIC::is_enabled(interrupt::USB_OTHER),
        trcpt0_active: NVIC::is_enabled(interrupt::USB_TRCPT0),
        trcpt1_active: NVIC::is_enabled(interrupt::USB_TRCPT1),
    };

    NVIC::mask(interrupt::USB_OTHER);
    NVIC::mask(interrupt::USB_TRCPT0);
    NVIC::mask(interrupt::USB_TRCPT1);
    cs
}

Feb 02 '21 05:02 twitchyliquid64

That way we can be happily preempted by more urgent interrupts while servicing, as long as they dont take too long that we miss a frame.

Is this the reason we're currently disabling interrupts? If so, it probably hasn't been working well in uses like this

#[interrupt]
fn USB() {
    usb_bus.poll(&mut [a_usb_device]);

    // this is a window for other interrupts

    a_usb_device.do_some_reads_and_writes();  // likely contains windows for other interrupts
}

I guess we'll want a method along the lines of interrupt_free() that uses a usb_critical_section, maybe that should be in UsbBus?

#[interrupt]
fn USB() {
    usb_bus.usb_interrupt_free(|cs| {
        // Would poll(), read(), write(), etc need a reference to cs, the UsbCS?
        usb_bus.poll(&mut [a_usb_device]);
        a_usb_device.do_some_reads_and_writes();
    });
}

Feb 02 '21 22:02 ianrrees

Is this the reason we're currently disabling interrupts?

I think it was just to prevent us being interrupted by a different USB interrupt while servicing a USB interrupt. I don't think there is anything wrong with being preempted by a higher priority interrupt as long as its not USB-related, but theres only one way to find out xD

Feb 02 '21 22:02 twitchyliquid64

That way we can be happily preempted by more urgent interrupts while servicing, as long as they dont take too long that we miss a frame.

Thinking about this more - I don't know if it's really what is needed. Imagine there's a GPIO interrupt that triggers a USB write - we wouldn't want it to stomp on an in-progress USB write...

I've just yanked out the interrupt::free(), Mutex, and RefCell - it seems to work this state on SAMD21 (which only has one USB interrupt) without doing any masking, so gets around my immediate problem. This is obviously sketchy though, there still needs to be a mechanism to only allow one "thread" to access the UsbBus at a time.

Since the methods where we had interrupt::free() are implementing usb-device, I guess we can't make them require a CriticalSection. In case of contention, should the second call to a UsbBus method block, or return an error - maybe InvalidState?

Feb 04 '21 20:02 ianrrees

I've just yanked out the interrupt::free(), Mutex, and RefCell - it seems to work this state on SAMD21 (which only has one USB interrupt) without doing any masking, so gets around my immediate problem

Can you share the diff somewhere? Im curious how you did this

Feb 04 '21 22:02 twitchyliquid64

Sure, https://github.com/ianrrees/atsamd/commit/b1298760e4d2d4b17fe44ba9ea7372c941ff440f and its parent. It's not pretty :).

(edit: updated link)

Feb 04 '21 22:02 ianrrees

seems to work ... without doing any masking

This was premature, it definitely needs a mechanism to prevent USB interrupts while already processing one.

Feb 05 '21 02:02 ianrrees

I think there's a fundamental issue here, but feel a bit out of my depth.

Imagine that we have an implementation of mutex that allows work on the inner struct without disabling interrupts for the duration of that work. When a method tries to access the inner struct through that mutex, the mutex will either have to block or return an error. Since we don't have an OS, the blocking option doesn't make sense (the interrupted code that holds the lock won't make progress until the interrupting code is done), so the locking attempt must return an error. But, we don't have a way to get that error out of several methods required by usb_device that need exclusive access to the inner struct, for instance reset().

A secondary question is how to handle an interrupt that fails to get the lock.

Feb 08 '21 21:02 ianrrees

Sorry for my lack of follow-through on this, the above hack is working well enough in the meantime ;).

This was premature, it definitely needs a mechanism to prevent USB interrupts while already processing one.

Can't remember what prompted me to write this, but think it might be incorrect; AFAICT any particular ISR on SAMD21 won't ever interrupt itself.

I'm becoming convinced that although the mutex in the UsbBus implementation may be necessary in the current implementation of usb-device, with some hopefully-straightforward changes to usb-device (see https://github.com/mvirkkunen/usb-device/issues/9) it won't be. Currently, a typical use-case involves multiple structs that refer to the same UsbBus, but if we instead had one struct that owned all the USB stuff, then Rust's ownership guarantees should be sufficient to ensure that only one execution context can access it at a time.

There might still be a mutex required to mediate access to the USB stuff, but it would be easier to implement one that only blocks interrupts for a few instructions, for example.

Mar 11 '21 03:03 ianrrees

dirbaio on Matrix suggested that the stm32-usbd doesn't have this particular issue, perhaps it's possible to make UsbBus and/or Inner not by Sync, so that the user either picks their own mutex or only accesses the USB from one particular context.

Mar 23 '21 21:03 ianrrees

@ianrrees, did you settle on a solution here? Does this need a change to the HAL or a change upstream?

Jun 14 '21 23:06 bradleyharden

Currently I'm using a branch that removes the mutex/critical section (CS), it works but isn't ideal. Some possible ways we could resolve this, all involve changes to the HAL:

usb-device changes along the lines of point 2 at https://github.com/mvirkkunen/usb-device/issues/9 , which I think would then mean the HAL implementation wouldn't need the CS to begin with.
It might be possible to push the CS down closer to the hardware, so that the external API stays the same but there are only a couple instructions at a time in a CS. From memory, this would be a fairly involved refactoring.
Make the mutex implementation configurable via feature flag, and provide one that either takes the mutex or panics. I've found it's really not difficult to only access the USB methods from a USB interrupt context, and would expect that a bad access pattern with a lock/panic mutex would result in a panic pretty quickly. So, this isn't really the cleanest solution, but it might be a good enough "pressure relief valve" for situations where people need to use USB along with lower-latency interrupts.

Jun 15 '21 01:06 ianrrees

for something along the lines of 3, you can check if you're in interrupt context with this cortex-m function and panic if you're in thread context

Jun 15 '21 02:06 TDHolmes

I think option 3 is the best at the moment. Should be fairly simple and that way you either access USB methods from different context safely or get maximum performance at the cost of panics if it's used from multiple contexts (which I think wouldn't be too difficult to avoid). Maybe mutexes should be opt-out to make the safer option the default.

Another option would be to let the user provide a mutex with the current one as the default.

Jun 15 '21 14:06 Sympatron

@ianrrees is there any update needed to this issue as a result of the recent USB PRs? Or is it the same story

Apr 01 '22 03:04 TDHolmes

Recent changes are unrelated, I was thinking about picking this back up though...

Any opposition to approach 3 from a few comments back?

Apr 01 '22 03:04 ianrrees

Nope, I think it makes good sense

Apr 01 '22 03:04 TDHolmes

atsamd atsamd copied to clipboard

USB methods disable interrupts for too long

atsamd
atsamd copied to clipboard