artiq

Interrupted experiments can leave Phaser in a "lock-up" state that requires a restart

Status: Open · cjbe opened this issue 3 years ago · 5 comments

Bug Report

One-Line Summary

Phaser can enter a "lock-up" state after an interrupted experiment, and recovering from it requires a power cycle.

Steps to Reproduce

  1. Run an experiment that uses Phaser
  2. Interrupt the experiment, either by force-killing it or through an RTIO underflow

Occasionally (every few minutes), the next experiment that uses Phaser fails in init() with the error "cannot read board ID".

After reloading the FPGA (artiq_flash start), the error changes to "DUC+Oscillator phase/amplitude test failed". This persists until the system is power-cycled (at the moment I am doing this by pulling the power to the whole rack).

Expected Behavior

Phaser init() always completes successfully.

Your System

Using ARTIQ master and Phaser master gateware/firmware.

cjbe, Jan 28 '21 23:01

Bump. RTIO underflows happen regularly in our workflow, and we can't afford to power-cycle Kasli frequently: we often need to operate remotely, and a power cycle causes bad thermal transients in our RF chain.

pathfinder49, Mar 04 '21 20:03

Ping @jordens: what's the plan for resolving this issue?

hartytp, Mar 29 '21 18:03

No specific plan (as with many issues people have posted for which long-term funding and momentum for continued debugging, development, and maintenance don't exist). We're happy to offer paid support. In any case, more context and an effort to provide a minimal working example (MWE) would be good.

jordens, Mar 29 '21 20:03

Just to confirm, this is definitely Phaser master from January?

jordens, Mar 31 '21 16:03


Yes, commit https://github.com/quartiq/phaser/commit/b36e506b08382969e785597de0cc0e6c222b0445

cjbe, Mar 31 '21 16:03