
Roadmap for supporting higher-order `HARDWARE-OBJECT`s

Open · ecpeterson opened this issue · 2 comments

This meta-issue contains a roadmap for implementing various features in the compiler related to HARDWARE-OBJECTs of order ≥ 2, which includes such existing GH Issues as #23 and #118. This list might update and change (1) as we implement some of these features and (2) as discussion proceeds in this thread.

Pre-work

  • Decide on the relative importance of higher order objects in terms of nativeness. Are we always willing to assume that the 1Q + 2Q sub-gatesets are sufficient to be universal? If so, then mapping onto a hardware object that supports a higher-arity gate natively is a niceness / efficiency concern, but not a hard requirement. I'm going to write the rest of this under the assumption that this is so, but I think it would be productive to explore whether there are other sorts of examples.
  • Do some "research" (i.e., talk to Colm) about what configuration of qubits will be used to enact CCNOT in our hardware. My understanding is that we need any 3 qubits connected in a line (and it's not necessary that the endpoint qubits be coupled to one another to form a triangle). It could also be that the outer qubits are required to be both tunable or both fixed. Make sure we're aiming at the right thing.
  • Write a constructor in chip-specification that manufactures a small example chip with higher-order objects.
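
As a starting point for that constructor, here is a minimal, self-contained sketch of the data it might produce: three qubits on a line, two links, and one order-2 object that is native for CCNOT. The struct and slot names below are illustrative only, not quilc's actual CHIP-SPECIFICATION representation.

    ;; Illustrative sketch only; struct and slot names are hypothetical,
    ;; not quilc's real CHIP-SPECIFICATION API.
    (defstruct toy-hardware-object
      order          ; 0 = qubit, 1 = link, 2 = three-qubit object, ...
      qubits         ; physical qubit indices this object touches
      native-gates)  ; gate names treated as native on this object

    (defstruct toy-chip
      objects)       ; vector indexed by order; each entry lists that order's objects

    (defun build-3q-ccnot-chip ()
      "Qubits 0-1-2 in a line, plus one order-2 object that is native for CCNOT."
      (make-toy-chip
       :objects
       (vector
        ;; order 0: the qubits themselves
        (loop :for q :below 3
              :collect (make-toy-hardware-object
                        :order 0 :qubits (list q) :native-gates '("RZ" "RX")))
        ;; order 1: two links forming a line (no 0-2 link required)
        (list (make-toy-hardware-object :order 1 :qubits '(0 1) :native-gates '("CZ"))
              (make-toy-hardware-object :order 1 :qubits '(1 2) :native-gates '("CZ")))
        ;; order 2: one higher-order object spanning all three qubits
        (list (make-toy-hardware-object
               :order 2 :qubits '(0 1 2) :native-gates '("CCNOT"))))))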

Work

These threads can be completed separately / in tandem. To ease testing, it would be helpful for the ADDRESSER work to precede the COMPRESSOR work, but this isn't necessary.

  • COMPILER CRAWLER
    • The crawler works under the assumption that 1Q gates are faster and higher-fidelity than 2Q gates, so when crawling the graph of 2Q gatesets it doesn't bother tracking any 1Q gates that it might accrue. This is a bad assumption in the presence of higher-order objects: our proposed implementation of CCNOT is roughly equivalent, in both duration and fidelity, to a pair of CZs. (While you're in the neighborhood, changing what's discarded and what's kept will touch a lot of the same code as issue #373 about separately counting non-native gates.)
    • WARM-HARDWARE-OBJECTS only warms objects of orders 0 and 1. Make it warm all orders.
  • ADDRESSER: The addresser simultaneously solves the layout, scheduling, and nativization problems. It makes critical assumptions about gate arity in the following cases.
    • The cost function ignores upcoming 1Q gates and weights upcoming 2Q gates by a mixture of their depth in the yet-to-be-scheduled program (i.e., their "immediacy") and how far apart their qubit arguments sit in the present assignment to physical hardware (viz., how long a SWAP chain from one qubit to the other would take). It entirely ignores nQ gates. The cost function will have to be updated to take nQ gates into account, and it isn't yet clear to me what value to assign to such a higher-arity gate. (It's good to have all n qubits together on a line; better to have them even more connected to one another; best to have them actually sitting on an order-(n-1) hardware object for which the gate is native.) A toy weighting sketch appears after this list.
    • Teach the addresser about the existence of higher-order objects. Currently, when it encounters a higher-arity gate, it immediately tries to decompose it into lower-arity gates. It should instead first check: are these qubits mapped (or perhaps freshly assignable?) onto physical qubits belonging to a higher-order hardware object for which this is a native instruction?
    • The addresser then needs to be able to decide: is it more advantageous to decompose an nQ gate as-is into lower-arity gates, or to permute qubits so that (some subset of?) them lie on a higher-order hardware object? Previously there was only one thing to do with an nQ gate: decompose it until it consists of 1Q+2Q gates, so that we can start to consider the resulting layout problem. Now we have to decide when to pursue the layout problem directly versus when to decompose.
    • There is some logic around resource exclusion to update. The correct general statement is that a hardware object is free whenever every hardware object it intersects is also free. The special cases of this condition for qubits and links are currently implemented by hand. (See the predicate sketch after this list.)
  • COMPRESSOR: The compressor associates a GOVERNED-QUEUE to each qubit and link, and additionally has a 'global queue' for overflow.
    • Actually associate a GOVERNED-QUEUE to each HARDWARE-OBJECT, including those of higher order.
    • Go through the state-change logic for GOVERNED-QUEUEs. It definitely makes assumptions about links being the largest objects: for instance, when a link starts QUEUEING, it sets both of its child qubits to FORWARDING. These rules will have to be generalized to speak of child objects, intersecting objects, and so on. (A sketch of one possible generalization appears after this list.)
    • The logic for when the global queue gets flushed might have to change. The current rule is that a flush happens whenever the contents of the global queue touch a larger collection of resources than some fixed threshold (I think 2 * (qubits on an edge) + 1). This value could perhaps be made more flexible; otherwise a higher-order object's contribution to the global queue could immediately trip a flush, and it's not clear to me that that's the best behavior.
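
Regarding the cost-function item above: here is a toy weighting sketch, under my own assumption (not stated in the issue) that an nQ gate's contribution could combine a decaying immediacy weight in the spirit of the 2Q treatment, an estimated SWAP distance for gathering its qubits, and a discount when a native higher-order object is reachable. Function names and weights are illustrative only.

    ;; Hypothetical weighting for an nQ gate in the addresser's cost heuristic.
    ;; The 0.5 discount factor is an arbitrary illustrative choice.
    (defun nq-gate-cost (immediacy swap-distance native-object-reachable-p)
      "IMMEDIACY: depth of the gate in the yet-to-be-scheduled program.
    SWAP-DISTANCE: estimated number of SWAPs needed to gather the gate's qubits.
    NATIVE-OBJECT-REACHABLE-P: true when those qubits can land on a hardware
    object for which the gate is native."
      (let ((base (* (exp (- immediacy)) (+ 1 swap-distance))))
        (if native-object-reachable-p
            (* 0.5 base)   ; prefer assignments with a reachable native object
            base)))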
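
Regarding the resource-exclusion item above: the general condition can be written as a single predicate over qubit sets instead of hand-coded qubit and link cases. A self-contained sketch with hypothetical names:

    ;; An object is represented here simply by the list of physical qubits it
    ;; touches. BUSY-P is a caller-supplied predicate saying whether a given
    ;; object currently has work scheduled directly on it.
    (defun objects-intersect-p (qubits-a qubits-b)
      "Two hardware objects intersect when they share a physical qubit."
      (and (intersection qubits-a qubits-b) t))

    (defun object-free-p (object all-objects busy-p)
      "OBJECT is free exactly when no object intersecting it (itself included,
    since every object intersects itself) is busy."
      (loop :for other :in all-objects
            :never (and (objects-intersect-p object other)
                        (funcall busy-p other))))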
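
Regarding the GOVERNED-QUEUE state-change item above: the link-starts-QUEUEING rule looks like a special case of "when an object starts QUEUEING, every other object sharing a qubit with it goes to FORWARDING." A hypothetical sketch of that generalization (state names from the issue; everything else illustrative):

    (defun start-queueing (object all-objects set-state)
      "OBJECT and the members of ALL-OBJECTS are qubit-index lists; SET-STATE
    is called with an object and a new state keyword to update bookkeeping."
      (funcall set-state object :queueing)
      (dolist (other all-objects)
        (when (and (not (equal other object))
                   (intersection object other))
          (funcall set-state other :forwarding))))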

Post-work

  • Augment chip-reader's ISA format to permit higher-order objects.
  • A 3Q permutation gate can be written as a pair of SWAPs (= roughly 6 CZs). Is there a more efficient decomposition of a 3Q permutation gate using CCNOT? If so, we may want to add PERMUTATION-RECORDs to a higher-order hardware object and teach the addresser to actually use them instead of assuming that only qubit-pair SWAPs are present. (This will be a serious task on its own, and it can be left for after the work described above.)
  • Implement some rewriting rules for higher-order hardware objects, just to check that it's actually working as one might expect. For instance, RZs commute past a CCNOT on its control qubits.
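
For the RZ-past-CCNOT rule, here is a small, self-contained check (illustrative only, not a quilc rewriting rule) of why it holds: CCNOT permutes computational basis states without changing the control bits, and RZ on a control is diagonal in that basis, so the two orderings pick up the same phase on every basis state.

    (defun ccnot-image (bits)
      "BITS is a list (C1 C2 T); return the basis state CCNOT maps it to."
      (destructuring-bind (c1 c2 tgt) bits
        (list c1 c2 (if (and (= c1 1) (= c2 1)) (- 1 tgt) tgt))))

    (defun rz-phase (theta bit)
      "Phase RZ(THETA) = diag(exp(-i*theta/2), exp(+i*theta/2)) applies to BIT."
      (exp (complex 0 (* theta (if (= bit 1) 0.5 -0.5)))))

    (defun rz-commutes-with-ccnot-p (theta)
      "Check on every basis state that applying RZ(THETA) to the first control
    before or after CCNOT yields the same phase."
      (loop :for index :below 8 :always
            (let* ((bits (list (ldb (byte 1 2) index)
                               (ldb (byte 1 1) index)
                               (ldb (byte 1 0) index)))
                   (image (ccnot-image bits)))
              ;; the control bit is unchanged by CCNOT, so the phases agree
              (= (rz-phase theta (first bits))
                 (rz-phase theta (first image))))))

Evaluating (rz-commutes-with-ccnot-p 0.37) returns T for any angle, which is the content of the commutation rule.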

ecpeterson · Sep 27 '19

Background reading: https://www.youtube.com/watch?v=H4v7wddN-Wg

notmgsk · Oct 03 '19

I wanted to add a few distinct facets and considerations to this problem of implementing higher-order hardware objects (HOHOs), each of which motivates a separate line of work with its own goals.

HOHOs as Manual Optimizations

Sometimes a HOHO is really just a vehicle for manual optimization. Probably no architecture is going to expose CCNOT without also exposing a high-quality, universal(-contributing) 2Q gate like CNOT. Similar to a native SWAP, CCNOT just comes along for the ride and gives the user more options for writing efficient programs. The compiler doesn't actually need to do anything with such gates except schedule them in the output program.

HOHOs as Native Targets

Some quantum computing architectures, such as neutral atoms or trapped ions, natively use and require parametric HOHO operations. A parametric Molmer-Sorensen gate is an example. Typically these architectures have patterns like:

SOME-TWO-Q-GATE = P MS(a) Q MS(b) R

where P, Q, R are products of 1Q interactions and the MS factors are parametric Molmer-Sorensen gates.

One way to "get around this" without implementing nQ gates in QUILC is to implement a universal 2Q gate set that expands into fixed templates of MS gates. But this potentially loses opportunities for optimization along various axes. (It is the same issue as building a compiler that only knows CNOT and then, afterward, blindly expanding each CNOT into native instructions via a library of templates.)

HOHOs as a Front for Ancilla Operations

In the last example, HOHOs of dimension 2^n aren't all that useful for implementing 2^n-dimensional unitaries directly, except perhaps for a specific set of state preparations. Instead, they're useful as a generic way to achieve 2Q or 3Q operations using a relatively flexible template that spans more qubits.

In essence, this process is really about making use of ancilla qubits: qubits that participate in the computation but whose state either doesn't change by the end (in the case of Molmer-Sorensen echoing) or is disentangled and thrown out altogether.

The challenge of implementing HOHOs thus "reduces" to the challenge of supporting ancilla qubits in QUILC, which mostly becomes an addressing and fidelity-calculation problem.

Synthesis into HOHOs and Compression Know-How

Another facet of the HOHO problem is building the know-how that allows QUILC to optimize HOHO operations, nativize smaller-arity operations into HOHO operations (if fidelity calls for it), and so on. I call this out separately because I think there is different heavy lifting to do on the "addressing" side (getting operations placed, even when they're native) and on the "synthesis/optimization" side (profitably using HOHOs for operations that aren't expressed in terms of them).

stylewarning · Dec 02 '21