
Startup kernels should not be interruptible

Open dnadlinger opened this issue 5 years ago • 5 comments

One-Line Summary

The startup kernel should not be interruptible by other kernels.

Issue Details

Steps to Reproduce

  1. Flash the core device with a startup kernel that takes an appreciable amount of time (e.g. waiting for all RTIO destinations to come up and initialising DDSes, …).
  2. Restart the core device.
  3. While the core device is booting (e.g. waiting for recovered clock), submit an experiment that launches a kernel.

Expected behavior

The experiment submitted on the master runs after the startup kernel.

Actual (undesired) behavior

The startup kernel gets interrupted by the experiment, leaving the hardware in a partially-initialized state.
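The race in these steps can be reduced to a small model of the ordering. This is a hypothetical Python sketch, not the ARTIQ runtime (which is firmware): `RuntimeModel`, `boot` and the step names are made up for illustration, and a submitted kernel is assumed to be noticed only between startup steps.

```python
from dataclasses import dataclass, field

# Illustrative startup sequence (hypothetical step names, not ARTIQ calls).
STEPS = ["wait for DRTIO destinations", "init Urukul CPLD", "init DDS channels"]


@dataclass
class RuntimeModel:
    evict_startup: bool  # True models the reported behavior, False the expected one
    done: list = field(default_factory=list)  # startup steps that actually ran
    log: list = field(default_factory=list)   # what the runtime did, in order

    def boot(self, startup, submitted, arrives_after):
        """Run `startup` step by step; `submitted` arrives after `arrives_after` steps."""
        for i, step in enumerate(startup):
            if i == arrives_after and self.evict_startup:
                # The new kernel replaces the startup kernel mid-sequence.
                self.log.append("startup evicted")
                break
            self.done.append(step)
        else:
            # Loop finished without eviction: startup ran to completion.
            self.log.append("startup complete")
        # In both cases the submitted experiment runs next.
        self.log.append(f"run {submitted}")
        return self.done


if __name__ == "__main__":
    buggy = RuntimeModel(evict_startup=True)
    print(buggy.boot(STEPS, "experiment", arrives_after=1))
    fixed = RuntimeModel(evict_startup=False)
    print(fixed.boot(STEPS, "experiment", arrives_after=1))
```

With `evict_startup=True` only the first step runs before the experiment takes over, which is the partially initialized state described above; with `False` the experiment simply waits for the startup kernel to finish.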

dnadlinger, Mar 31 '19

@cjbe: I think most of the weird startup behaviour we have been seeing should be explained by this issue together with the Urukul sync phase tuning issue on Alice.

dnadlinger, Mar 31 '19

I fundamentally agree that there is a need for something like "critical sections". Two things:

  • Hung startup kernels: Is there a use case that needs to evict a startup kernel that is hung for some reason (waiting for input or a DRTIO link, or a programming error) by means other than restarting the core device?
  • Critical sections in other kernels, e.g. the idle kernel. Composite SPI transfers that would otherwise be aborted by eviction are one use case.
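A critical section in the sense of the second bullet can be modelled as a nesting counter that defers eviction until the outermost section is left. This is a minimal Python sketch, assuming eviction is delivered as an exception at a point the runtime chooses; `KernelModel`, `Evicted` and `composite_spi_transfer` are hypothetical names, not ARTIQ APIs.

```python
from contextlib import contextmanager, nullcontext


class Evicted(Exception):
    """Raised where the (modelled) runtime would terminate the running kernel."""


class KernelModel:
    """Hypothetical kernel state: eviction is deferred inside critical sections."""

    def __init__(self):
        self.depth = 0              # nesting depth of critical sections
        self.evict_pending = False  # set when another kernel is submitted
        self.trace = []             # SPI words actually written

    @contextmanager
    def critical(self):
        self.depth += 1
        try:
            yield
        finally:
            self.depth -= 1
        # Only reached on normal exit: honour any eviction that was deferred
        # (a no-op unless this was the outermost section).
        self._maybe_evict()

    def request_eviction(self):
        self.evict_pending = True
        self._maybe_evict()

    def _maybe_evict(self):
        if self.evict_pending and self.depth == 0:
            raise Evicted()


def composite_spi_transfer(k, words, submit_at, protect=True):
    """Write `words` as one logical transfer; another kernel arrives at `submit_at`."""
    with (k.critical() if protect else nullcontext()):
        for i, word in enumerate(words):
            if i == submit_at:
                k.request_eviction()  # another kernel is submitted mid-transfer
            k.trace.append(word)


if __name__ == "__main__":
    for protect in (False, True):
        k = KernelModel()
        try:
            composite_spi_transfer(k, [0x01, 0x02, 0x03], submit_at=1, protect=protect)
        except Evicted:
            print(protect, k.trace)
```

Unprotected, the transfer stops after the first word; protected, all three words go out and the eviction takes effect when the section closes. The trade-off from the first bullet shows up directly: a kernel that hangs inside `critical()` can no longer be evicted, only removed by restarting the device.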

jordens, Mar 31 '19

That was implemented in earlier ARTIQ versions by not opening the server socket in the runtime until the startup kernel had run, so the computer would get a "connection refused" error instead of interrupting the startup kernel. I guess that behavior was inadvertently changed later on.
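The earlier mechanism can be illustrated with ordinary TCP sockets. This Python sketch is only an analogy for the runtime's network stack: `boot_runtime` and `try_connect` are hypothetical, and the port is bound before the startup kernel runs purely so the example knows which port to probe. Because `listen()` is not called until the startup kernel returns, the OS refuses connections in the meantime (as Linux does for a bound but non-listening TCP socket).

```python
import socket


def boot_runtime(startup_kernel, host="127.0.0.1"):
    """Hypothetical boot sequence: accept no connections until startup is done."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # bind() only reserves a port (0 = any free one); without listen()
    # incoming connection attempts are rejected by the OS.
    srv.bind((host, 0))
    port = srv.getsockname()[1]
    startup_kernel(port)  # runs to completion; nothing can connect meanwhile
    srv.listen()          # only now can the master reach the runtime
    return srv, port


def try_connect(port, host="127.0.0.1"):
    """Return True if the connection was accepted, False if it was refused."""
    try:
        with socket.create_connection((host, port), timeout=1.0):
            return True
    except ConnectionRefusedError:
        return False


if __name__ == "__main__":
    early = []
    srv, port = boot_runtime(lambda p: early.append(try_connect(p)))
    print("during startup:", early)
    print("after startup:", try_connect(port))
    srv.close()
```

Pushing the refusal to the client means the runtime never has to decide what to do with a kernel that arrives during startup; the master just sees a connection error.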

sbourdeauducq, Apr 01 '19

Quoting jordens: "…by other means than restarting the core device?"

In my experience, having a USB (JTAG/serial) connection to each core device is indispensable anyway, so I'm not too worried about this.

By definition, hung startup kernels only occur when (re)starting the device, so simply power-cycling it again should never be a problem for users.

dnadlinger, May 14 '19

This is fixed in the new firmware on Zynq.

sbourdeauducq, Dec 14 '20