Long range DRTIO: timeout attempting to get remote buffer space

Open marmeladapk opened this issue 2 years ago • 6 comments

Bug Report

One-Line Summary

I connected two Kaslis with long fibers (~20 km master->satellite, ~25 km satellite->master). The link mostly sets up correctly, but I get this error: timeout attempting to get remote buffer space.

Issue Details

AFAICT the link works fine, and I can run an experiment that changes the DIO state on both Kaslis. The latency between the edges on the two DIOs is around 98.3 µs. Only when I use loops to generate more events do I get a write overflow on the satellite and timeout attempting to get remote buffer space on the master.

Master logs (setup):

[    13.936935s]  INFO(runtime::rtio_mgt::drtio): [DEST#1] destination is up
[    13.943128s]  INFO(runtime::rtio_mgt::drtio): [DEST#1] buffer space is 0
[    14.150352s] ERROR(runtime::rtio_mgt::drtio): [LINK#0] error(s) found (0x04):
[    14.156203s] ERROR(runtime::rtio_mgt::drtio): [LINK#0] timeout attempting to get remote buffer space

On the satellite nothing seems out of the ordinary until I run an experiment with more events.

Both fibers have been tested with IBERT to check if digital communication works.

Can the timeout be increased? Could increasing it have negative effects in other areas?

Paging @dhslichter, as he was also interested in long range DRTIO.

Your System (omit irrelevant parts)

  • Operating System: Linux
  • ARTIQ version: 5985595
  • Version of the gateware and runtime loaded in the core device: 5985595
  • Hardware involved:
    • 2x Kasli v2.0.1
    • 2x DIO-BNC
    • 20 and 25 km fiber
    • 2x 10G 1550 nm fiber transceiver

marmeladapk avatar Dec 08 '22 11:12 marmeladapk

Why are the fibers asymmetric? Is that the BiDi refractive index difference or are they physically different?

jordens avatar Dec 08 '22 13:12 jordens

@jordens They are physically different. That's what was available to me on short notice. (Also, 1270 nm/1330 nm BiDi transceivers don't work on these particular fibers, and 1550 nm/1490 nm 10G transceivers are stupidly expensive.)

marmeladapk avatar Dec 08 '22 17:12 marmeladapk

We are interested in long distance DRTIO but we would use matched-length fibers and matched transceivers for this. With 5 km difference in fiber lengths, I could imagine that would make all kinds of things not work properly...

dhslichter avatar Dec 08 '22 18:12 dhslichter

@dhslichter AFAICT link asymmetry shouldn't play any part here. I didn't see any latency measurements in the code (but I also wasn't looking very hard), and I think that both master and satellite care only about the round-trip time. For this error the timeout seems to be hardcoded: rt_controller_master.py#L106.

marmeladapk avatar Dec 08 '22 21:12 marmeladapk

That timeout is 65 µs, so yes, it looks like it would be exceeded with 20 km of fiber. But this is the only major problem I can think of, other than fiber delay noise at those lengths and the corresponding fluctuations of the satellite clock phase. Try increasing the timeout and recompiling the gateware.
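For readers who want to check the numbers, here is a rough back-of-the-envelope sketch in Python. It is not ARTIQ code. The group index of 1.468 is an assumed typical value for single-mode fiber near 1550 nm, the 8 ns clock period (125 MHz) and the 1.5x safety margin are assumptions for illustration, and all function names are hypothetical. It ignores transceiver and gateware latency, so the real round trip will be somewhat longer.

```python
import math

C_VACUUM_M_PER_S = 299_792_458.0
GROUP_INDEX = 1.468  # assumption: typical single-mode fiber near 1550 nm


def fiber_delay_us(length_km, group_index=GROUP_INDEX):
    """One-way propagation delay in microseconds for a fiber of the given length."""
    return length_km * 1e3 * group_index / C_VACUUM_M_PER_S * 1e6


def round_trip_us(forward_km, return_km, group_index=GROUP_INDEX):
    """Round-trip delay in microseconds; only the sum of both lengths matters."""
    return fiber_delay_us(forward_km, group_index) + fiber_delay_us(return_km, group_index)


def min_timeout_cycles(round_trip, clock_period_ns=8.0, margin=1.5):
    """Clock cycles needed to cover a round trip (in us) with a safety margin.

    clock_period_ns and margin are illustrative assumptions, not ARTIQ defaults.
    """
    return math.ceil(round_trip * 1e3 * margin / clock_period_ns)


if __name__ == "__main__":
    one_way = fiber_delay_us(20)   # master -> satellite
    rtt = round_trip_us(20, 25)    # 20 km out, 25 km back
    print(f"one-way delay over 20 km: {one_way:.1f} us")
    print(f"round trip over 20 km + 25 km: {rtt:.1f} us")
    print(f"exceeds 65 us timeout: {rtt > 65.0}")
    print(f"suggested timeout: {min_timeout_cycles(rtt)} cycles")
```

Under these assumptions the one-way delay over 20 km comes out close to the 98.3 µs edge-to-edge latency reported above, and the round trip is more than three times the 65 µs timeout, which is consistent with the buffer space request timing out.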

"1550 nm/1490 nm 10G transceivers are stupidly expensive"

Maybe you can ask TOPTICA to make cheaper ones.

If they won't, take a look at DWDM filters e.g. https://www.fs.com/products/70090.html and circulators e.g. https://www.box-laser.com/fiber-optic-circulators ($68).

sbourdeauducq avatar Dec 09 '22 02:12 sbourdeauducq

Increasing the timeout fixed the issue and after a while I didn't see anything wrong in the logs.

marmeladapk avatar Dec 20 '22 16:12 marmeladapk