
Panic at runtime/rtio_mgt.rs after repeated DMA usage

Status: open. lriesebos opened this issue 1 year ago (18 comments).

Bug Report

One-Line Summary

The master Kasli sometimes panics during repeated DMA calls to a channel on a satellite.

Issue Details

Steps to Reproduce

My system has a single satellite. A TTLOut on this satellite is used to run this function: https://gitlab.com/duke-artiq/dax/-/blob/master/dax/modules/rtio_benchmark.py#L479. This is a benchmark we use to measure Kasli performance. The master Kasli sometimes panics when we run that code.

I did not observe the panic when running the same test on a TTLOut on the master, though I could not test that exhaustively.

The panic appears in the master's UART log. When the master panics, the satellite prints nothing on its UART except that the link was lost.

Expected Behavior

No panic.

Actual (undesired) Behavior

Panic.

[2023-06-29 16:40:31] [   771.652731s]  INFO(runtime::session): no connection, starting idle kernel
[2023-06-29 16:40:31] [   771.736134s]  INFO(runtime::kern_hwreq): resetting RTIO
[2023-06-29 16:40:31] [   771.790687s]  INFO(runtime::session): new connection from 192.168.1.100:33904
[2023-06-29 16:40:31] panic at runtime/rtio_mgt.rs:51:29: called `Result::unwrap()` on an `Err` value: Interrupted
[2023-06-29 16:40:31] backtrace for software version 7.8173.ff97675;[removed gateware id]:
[2023-06-29 16:40:31] 0x4003d29c
[2023-06-29 16:40:31] 0x4000af10
[2023-06-29 16:40:31] 0x4000a510
[2023-06-29 16:40:31] 0x40028088
[2023-06-29 16:40:31] 0x40027eb0
[2023-06-29 16:40:31] 0x400211d8
[2023-06-29 16:40:31] 0x40024e80
[2023-06-29 16:40:31] 0x4002cc4c
[2023-06-29 16:40:31] 0x4000e7c4
[2023-06-29 16:40:31] 0x4000e7b4
[2023-06-29 16:40:31] 0x40038c94
[2023-06-29 16:40:31] 0x4000770c
[2023-06-29 16:40:31] 0x40028f40
[2023-06-29 16:40:31] 0x40028ee8
[2023-06-29 16:40:31] 0x4003c414
[2023-06-29 16:40:31] halting.
[2023-06-29 16:40:31] use `artiq_coremgmt config write -s panic_reset 1` to restart instead

Your System

  • Operating System: Linux (Ubuntu 22.04)
  • ARTIQ version: 7.8173.ff97675
  • Version of the gateware and runtime loaded in the core device: same as ARTIQ version
  • Hardware involved: 2x Kasli 2.0 in a master-satellite configuration, with a DIO card on the satellite

lriesebos, Jun 29 '23 21:06