Urukul sync issues with DRTIO
Bug Report
One-Line Summary
When using DRTIO, inter-channel synchronization with Urukul doesn't work, although the same configuration produces the expected results on a standalone crate.
Issue Details
Steps to Reproduce
- A single crate with the following hardware configuration:
{
"_description": "The description",
"target": "kasli",
"variant": "my-variant",
"hw_rev": "v2.0",
"base": "master",
"peripherals": [
{
"type": "urukul",
"ports": [3, 4],
"hw_rev": "v1.4",
"dds": "ad9910",
"refclk": 125e6,
"clk_sel": 2,
"clk_div": 0,
"pll_n": 32,
"synchronization": true
}
]
}
(+ more, hopefully irrelevant peripherals)
The Urukul v1.4 card has IFC mode 1010 (en_ad9910 and en_eem1 activated).
- device_db generated by artiq_ddb_template
- Run artiq_sinara_tester to calibrate the io_update and sync delays and write the results to EEPROM
- The following experiment:
from artiq.experiment import *


class DDSPhase(EnvExperiment):
    def build(self):
        self.setattr_device("core")
        self.rf0 = self.get_device("urukul0_ch1")
        self.rf1 = self.get_device("urukul0_ch2")

    @kernel
    def run(self):
        self.core.reset()
        self.rf0.cpld.init()

        self.rf0.init()
        self.rf0.set_att_mu(160)
        self.rf0.set_phase_mode(2)  # 2 = PHASE_MODE_TRACKING

        self.rf1.init()
        self.rf1.set_att_mu(160)
        self.rf1.set_phase_mode(2)  # 2 = PHASE_MODE_TRACKING

        # Same frequency on both channels; phase is in turns, so 0.5 is 180°.
        self.rf0.set(100*MHz)
        self.rf1.set(100*MHz, phase=0.5)

        self.rf0.sw.on()
        self.rf1.sw.on()
Expected Behavior
100 MHz sines on CH1 and CH2 have 180° phase difference
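As a sanity check on the expectation above, the following sketch works out the tuning words that the two set() calls should correspond to. It assumes that clk_div = 0 selects a divide-by-4 of refclk before the PLL, that the frequency tuning word is 32 bits wide, and that the phase offset word is 16 bits wide. The function names are illustrative and are not the ARTIQ API.

```python
# Illustrative helpers, not the ARTIQ API.
REFCLK = 125e6  # "refclk" from the hardware description
PLL_N = 32      # "pll_n" from the hardware description
CLK_DIV = 4     # assumption: clk_div = 0 selects the default divide-by-4


def sysclk_hz(refclk=REFCLK, div=CLK_DIV, pll_n=PLL_N):
    """DDS system clock under the assumptions stated above."""
    return refclk / div * pll_n


def frequency_to_ftw(frequency_hz, sysclk):
    """32-bit frequency tuning word for the requested output frequency."""
    return round(frequency_hz * 2**32 / sysclk) & 0xFFFFFFFF


def turns_to_pow(turns):
    """16-bit phase offset word; one turn is a full 360 degree cycle."""
    return round(turns * 2**16) & 0xFFFF


def turns_to_degrees(turns):
    """Phase in degrees, wrapped into [0, 360)."""
    return (turns % 1.0) * 360.0


if __name__ == "__main__":
    clk = sysclk_hz()
    print(f"sysclk = {clk / 1e6:.0f} MHz")
    print(f"FTW for 100 MHz = {frequency_to_ftw(100e6, clk):#010x}")
    print(f"POW for phase=0.5 = {turns_to_pow(0.5):#06x}")
    print(f"phase=0.5 turns = {turns_to_degrees(0.5):.0f} degrees")
```

If both channels receive the same frequency tuning word and their phase accumulators are referenced to the same time origin, the only intended difference between them is the phase offset word, which is why a stable 180° offset is expected.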
Actual (undesired) Behavior
- The phase difference is not always 180°. Occurrence of the correct phase difference between re-executions of the experiment seems random. The behavior is the same when a satellite is present (and also on a Urukul on the satellite)
- When substituting "base": "standalone", rebuilding and reflashing, the system behaves as expected all the time.
Questions / comments
- Is this a known, reproducible issue with DRTIO master/satellite configurations?
- We seem to have an issue with the SYNC tree: with Urukul v1.4 and v1.5, channel 0 is always faulty because the PLL doesn't lock and the SMP_ERR flag is up (according to the CPLD status register). The other channels are fine though. And in standalone mode, the phase relationships between channels 1,2,3 are correct. This seems unrelated to this particular issue though. Is there something I missed on referencing CH0 to the FPGA sync and not using it as master for the SYNC tree?
Your System (omit irrelevant parts)
- Operating System: Linux / nix
- ARTIQ version: v7.7636.ea1dd2da.beta
- Version of the gateware and runtime loaded in the core device: same as software, built using Vivado 2020.1
- Hardware involved: Kasli 2.0.1, Urukul 1.4
- Urukul CPLD image: v1.4.0 (build 169624)
@dnadlinger @jordens Is this a known issue, or a problem with our particular setup?
I think:
- The phase difference of 180 deg isn't worrying per se, as long as it's deterministic. The latter is the problem.
- en_eem1 isn't useful here. The signals are not used on Kasli and might cause crosstalk.
- There is no actual satellite here. But if there were, to use the eeprom seeds, the eeprom read would have to work over drtio aux. Maybe that's a bit unexplored. But shouldn't be relevant here.
- Could you try using the seed values the calibration routine spits out directly? I.e. pass them in device_db.
- I have seen that SMP_ERR failure on ch0 as well with factory-flashed devices. I have no idea what was flashed exactly and haven't looked how that's compiled. But the PLL always locks. But this may or may not be a different problem. Could you try the released binaries? https://github.com/quartiq/urukul/releases/tag/v1.4.0
- I don't see why ch0 would be different in the code. Maybe try a few ms or 100 ms delay between cpld.init() and rf0.init() or swap rf0 and rf1.
- If this holds up against the tests above, we should look at whether the clocking of the sync pulse generator on kasli matches between standalone and master.
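The suggestion above to pass the calibration seeds directly in device_db could look roughly like the fragment below. This is a sketch only: the channel name, chip_select, and switch device are guesses at what artiq_ddb_template would generate for this card, and the two seed values are placeholders that must be replaced with the numbers printed by the calibration routine.

```python
# Hypothetical device_db entry: every value below is a placeholder.
device_db = {}

device_db["urukul0_ch1"] = {
    "type": "local",
    "module": "artiq.coredevice.ad9910",
    "class": "AD9910",
    "arguments": {
        "pll_n": 32,
        "chip_select": 5,                 # assumed value for channel 1
        "cpld_device": "urukul0_cpld",    # assumed device name
        "sw_device": "ttl_urukul0_sw1",   # assumed device name
        # Plain integers instead of an EEPROM reference, so the values are
        # known without any EEPROM read taking place.
        "sync_delay_seed": 15,            # placeholder, use calibrated value
        "io_update_delay": 0,             # placeholder, use calibrated value
    },
}
```

Hardcoding the values takes the EEPROM read out of the picture, which helps separate a calibration-storage problem from a problem in the sync path itself.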
@airwoodix: We've definitely tested this on DRTIO master builds in the past, but it's been a few years since the initial bring-up.
Hi @airwoodix and all, have you been able to replicate this issue again with ARTIQ-7 (e.g. 43eab14f566d7205ea1261151c3078a17df0970a)?
With ARTIQ-7 and ARTIQ-6 I have tried your code on a DRTIO master setup with a single Urukul card, but I have never seen absurd variance in terms of phase difference across power cycles and reboots. The worst STD I can get on the DRTIO master is ~0.01 rad. I also compared the phase with a standalone setup on the same hardware, and I can't see significant difference in the results.
On the other hand, I noted that one of @airwoodix's observations was that the red LED for CH0 turned on when sync'ed. I confirm that this can be reproduced, and I get the following observations (NB the criterion for the channel fault indicator is SYNC_SMP_ERR | ~PLL_LOCK):
- Given calibrated sync, as long as set_att() is not called on any channel, DDS0 SYNC_SMP_ERR stays 0, and DDS0 PLL_LOCK stays 1.
- Given calibrated sync, if set_att() is called on any channel, the first read of the CPLD status register (using urukul.CPLD.sta_read()) right after urukul.CPLD.init() always returns a value where DDS0 PLL_LOCK is 0, but every subsequent read returns DDS0 PLL_LOCK = 1. PLL_LOCK = 1 for DDS1 to DDS3.
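The fault criterion SYNC_SMP_ERR | ~PLL_LOCK can be checked offline on a raw status word returned by sta_read(). The sketch below assumes the bit layout given in the comments, which is my recollection of the constants in artiq.coredevice.urukul and should be verified against the ARTIQ version in use. The helper names are hypothetical.

```python
# Assumed bit layout of the Urukul CPLD status register; verify against the
# artiq.coredevice.urukul source for your ARTIQ version.
STA_RF_SW = 0       # 4 bits, one per channel
STA_SMP_ERR = 4     # 4 bits, one per channel
STA_PLL_LOCK = 8    # 4 bits, one per channel
STA_IFC_MODE = 12   # 4 bits
STA_PROTO_REV = 16  # 7 bits


def decode_status(sta):
    """Split a raw status word into its fields."""
    return {
        "rf_sw": (sta >> STA_RF_SW) & 0xF,
        "smp_err": (sta >> STA_SMP_ERR) & 0xF,
        "pll_lock": (sta >> STA_PLL_LOCK) & 0xF,
        "ifc_mode": (sta >> STA_IFC_MODE) & 0xF,
        "proto_rev": (sta >> STA_PROTO_REV) & 0x7F,
    }


def channel_faults(sta):
    """Per-channel fault flags, using SYNC_SMP_ERR | ~PLL_LOCK."""
    fields = decode_status(sta)
    faults = []
    for ch in range(4):
        smp_err = (fields["smp_err"] >> ch) & 1
        pll_lock = (fields["pll_lock"] >> ch) & 1
        faults.append(bool(smp_err or not pll_lock))
    return faults


if __name__ == "__main__":
    # Made-up status word: DDS0 PLL not locked, DDS1 to DDS3 locked,
    # no sample errors, proto_rev 8.
    sta = (8 << STA_PROTO_REV) | (0b1110 << STA_PLL_LOCK)
    print(decode_status(sta))
    print(channel_faults(sta))
```

Reading the status register twice in a row and decoding both words this way would distinguish a transient PLL_LOCK = 0 on the first read from a persistent fault.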
I suspect there might be some issue doing SPI transactions within a Urukul card. I could look into that, but that will not be related to this very issue about Urukul AD9910 sync phase errors.
Hello @HarryMakes, thanks for your feedback. This issue is still open in our group. Due to being a bit of a low-priority in the past months, I did not work on this topic that much but it sounds like a good Debugging-Friday job ;)
Thanks. I just found that in ARTIQ-7 the sync_delay_seed and io_update_delay values read from the EEPROM are incorrect. I did not pay attention to it since I switched to hardcoding the values in device_db.py when testing.
I am going to investigate and propose a bug fix for this behaviour.
Correction: I have not found issues with writing and reading the sync_delay_seed and io_update_delay values. There was a mistake in my test code as I forgot to call init() on the channels to read from EEPROM before printing the values.
Therefore, my current findings show that this Urukul sync issue cannot be replicated on DRTIO master on ARTIQ-7 or ARTIQ-6.
With ARTIQ-7 and ARTIQ-6 I have tried your code on a DRTIO master setup with a single Urukul card, but I have never seen absurd variance in terms of phase difference across power cycles and reboots.
Which channels did you test? It seems there is an issue with ch0 which does not have deterministic phase wrt the others (also on non-DRTIO systems). @Spaqin
I did test the phase stability involving Ch0 of a single Urukul AD9910 card (e.g. between Ch0 and Ch1, between Ch0 and Ch2), which is configured on a Kasli 2.0 DRTIO master.
In my previous comment, I mentioned the worst standard deviation (std) I have seen was ~0.01 rad, but this applied only to @airwoodix's original code where the two Urukul channels (which involve Ch0) are offset by 180°. I also separately tested with 0° offset involving Ch0, but the std is significantly lower at ~0.001 rad.
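For anyone repeating this kind of measurement: with a 180° offset the measured phases sit near the ±π wrap-around, where a plain standard deviation can be badly inflated. A minimal sketch of a wrap-safe spread estimate is shown below, using the usual circular-statistics definition sqrt(-2 ln R). This is an assumption about a reasonable analysis, not necessarily how the numbers above were obtained.

```python
import cmath
import math


def _mean_resultant(phases_rad):
    """Mean of the unit phasors exp(i * phase)."""
    return sum(cmath.exp(1j * p) for p in phases_rad) / len(phases_rad)


def circular_mean(phases_rad):
    """Mean phase in radians, in (-pi, pi]."""
    return cmath.phase(_mean_resultant(phases_rad))


def circular_std(phases_rad):
    """Circular standard deviation in radians: sqrt(-2 ln R)."""
    r = min(abs(_mean_resultant(phases_rad)), 1.0)  # guard against rounding
    if r == 0.0:
        return math.inf  # phases spread uniformly around the circle
    return math.sqrt(-2.0 * math.log(r))


if __name__ == "__main__":
    # Made-up measurements straddling the +/- pi boundary.
    phases = [math.pi - 0.01, -math.pi + 0.01]
    print(circular_mean(phases), circular_std(phases))
```

For small spreads the circular standard deviation agrees closely with the ordinary one, so values such as 0.01 rad can be compared directly.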
I have redone the tests today, and I could not reproduce any issues with phase. I tested with a system that has 3 AD9910s and a Kasli 2.0, on both ARTIQ 6 and ARTIQ 7, with the system configured both as DRTIO master and as standalone.
Once calibrated, using the simple experiment code from OP (and even modifying it to include more channels across all three cards), I got very consistent results, even after power cycling. To be exact: two cards had very close phases, third one was using longer cables and thus had a significant phase shift, but still consistent across the reboots.
Which is weird: I could've sworn we couldn't get consistent results last week with ch0... All I did was tighten the screws?
Also, the red light on ch0 is more like a red herring. It pops up after the set_mu function is called. It causes a SYNC_SMP_ERROR; PLL_LOCK is at 1 at all times. Modifying the AD9910 code to clear the smp flag after that causes the light to go green again and no other issues pop up. However, the code is the same for all channels. I could not figure out why only the first channel was affected. It may be a bug with communication with the CPLD as @HarryMakes mentioned, but other than being slightly misleading, the red LED does not change anything in the behavior.
Previously I was worried that the calibration data was lost or incorrect and on multiple power cycles it would cause a shift. The phase shift between power cycles, on both cold and hot devices, is constant. But that's the key word here - between power cycles. So, with the same system as previously (3 Urukul AD9910 cards, Kasli 2.0), on ARTIQ 6, when I do the following, I get consistent results:
1. Power on the system
2. Run the test code (no calibration)
3. Verify phase shift
4. Power off the system
5. Repeat 1-4
Again, phase differences seem to be identical every time. That is true for both DRTIO Master and Standalone configurations.
However, I managed to reproduce the issue - it pops up within a power cycle, e.g.
1. Power on the system
2. Run the test code (no calibration)
3. Verify phase shift
4. Repeat 2-3 ...
Phase shift may differ between each time, but only on DRTIO Masters - Standalone systems seem not to be affected. Running either cpld init or channel init more than once within one powerup may change the phase shift. Still not sure why exactly, but I can reproduce the issue more reliably.
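To make the repeated-run check above less dependent on reading a scope by eye, the phase difference measured after each run can be logged and compared against the first run. The sketch below is one way to flag the runs where the offset moved. The tolerance and the sample data are made up, and how the phase is measured is left open.

```python
import math


def wrapped_difference(a_rad, b_rad):
    """Signed difference a - b, wrapped into [-pi, pi)."""
    return (a_rad - b_rad + math.pi) % (2.0 * math.pi) - math.pi


def find_phase_jumps(phases_rad, tolerance_rad=0.05):
    """Indices of runs whose phase differs from the first run by more
    than tolerance_rad, taking the 2*pi wrap-around into account."""
    reference = phases_rad[0]
    return [
        i
        for i, p in enumerate(phases_rad)
        if abs(wrapped_difference(p, reference)) > tolerance_rad
    ]


if __name__ == "__main__":
    # Made-up data: one phase difference per run within a single power-up.
    runs = [3.14, 3.141, -3.141, 1.57, 3.139]
    print(find_phase_jumps(runs))
```

Note that the third made-up value, -3.141 rad, is not flagged: it lies just across the wrap-around from the reference and is physically almost the same phase.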
Thanks @Spaqin and @HarryMakes for the investigation! Sorry for the very, very delayed feedback.
I got back into this because of issues observed on standalone systems with Urukul v1.5 and v1.5.1 (ARTIQ v7.0.b02abc2.beta, Kasli v1.1). The observation is the same as that of @Spaqin (changes in the intra-card phase offset without a power cycle) except that it systematically fails to lock the urukul2_ch0 (v1.5.1) PLL:
urukul1: lock=11, smp_err=00, sync_sel=0, proto_rev=8
urukul2: lock=10, smp_err=01, sync_sel=0, proto_rev=8
Experiment code (urukul1 is v1.3, urukul2 is v1.5.1):
from artiq.experiment import *
from artiq.coredevice import ad9910, urukul


class TwoPulses(EnvExperiment):
    def build(self):
        self.setattr_device("core")
        self.ddses = [
            # urukul1 is v1.3
            self.get_device("urukul1_ch0"),
            self.get_device("urukul1_ch1"),
            # urukul 2 is v1.5
            self.get_device("urukul2_ch0"),
            self.get_device("urukul2_ch1"),
        ]
        self.trigger = self.get_device("ttl4")
        self.urukul_status = [0, 0]

    @kernel
    def run(self):
        self.core.reset()
        for dds in self.ddses:
            dds.cpld.init()
            dds.init()
            dds.set_att(10.0 * dB)
            dds.set(180.0 * MHz, phase_mode=ad9910.PHASE_MODE_TRACKING)
        self.trigger.pulse(1.0 * us)
        t_pulse = now_mu()
        for dds in self.ddses:
            at_mu(t_pulse)
            dds.sw.pulse(10.0 * us)
        self.store_urukul_status()

    @kernel
    def store_urukul_status(self):
        for i in range(2):
            cpld = self.ddses[2 * i].cpld
            self.urukul_status[i] = cpld.sta_read()
            delay(10 * us)

    def analyze(self):
        for i, sta in enumerate(self.urukul_status):
            lock = urukul.urukul_sta_pll_lock(sta) & 3
            smp_err = urukul.urukul_sta_smp_err(sta) & 3
            proto_rev = urukul.urukul_sta_proto_rev(sta)
            sync_sel = (self.ddses[i * 2].cpld.cfg_reg >> urukul.CFG_SYNC_SEL) & 1
            print(f"urukul{i+1}: {lock=:02b}, {smp_err=:02b}, {sync_sel=}, {proto_rev=}")
Relevant part of device_db (same for urukul2_cpld):
device_db["urukul1_cpld"] = {
"type": "local",
"module": "artiq.coredevice.urukul",
"class": "CPLD",
"arguments": {
"spi_device": "spi_urukul1",
"sync_device": "ttl_urukul1_sync",
"io_update_device": "ttl_urukul1_io_update",
"refclk": 125000000.0,
"clk_sel": 2
}
}
With same-length cables, the scope picture looks like:
The phase offset between urukul1_ch0 (CH1) / urukul1_ch1 (CH2) and urukul2_ch0 (CH3) is always the same over repeated calls to artiq_run (no power cycle, no reboot) but not that between urukul2_ch0 (CH3) and urukul2_ch1 (CH4) which oscillates with unstable period between the previous configuration and this one:
The non-deterministic phase between channels (intra- and inter-Urukul) is very problematic.
- Could it be a hardware issue @gkasprow ? I have observed this on at least three different Urukul v1.4 and v1.5 boards.
- Could it be an issue with the CPLD gateware @jordens ? The v1.4 boards in the first post were flashed manually, those in this message by Creotech.
- Has this been reported by other users?
Thanks!
Could it be an issue with the CPLD gateware @jordens ? The v1.4 boards in the first post were flashed manually, those in this message by Creotech.
Seems unlikely but it's certainly not impossible.