Context-aware dynamical decoupling
Summary
Implement context-aware dynamical decoupling (DD) from arXiv:2403.06852.
Details and comments
When selecting decoupling sequences for an idle qubit, context-aware DD takes into account what actions are occuring on neighboring qubits. It ensures that neighboring sequences are mutually orthogonal (to avoid collisions, like in staggered DD) but also whether the qubit is a CX or ECR target/control spectator. This is nicely summarized in a figure of the paper, where the different wire colors denote different decoupling sequences:
This particularly leads to improvement if a lot of qubits are idle at the same time or if they are next to ECR/CX gates. One important category of circuits in this class are the layers in twirled circuits. This pass also works best if the ECR/CX gate times are the same for all connections, which is almost always the case on IBM devices.
Some simple experiments show that this pass performs better than the current standard in Qiskit, PadDynamicaDecoupling with an X-X sequence. Here's some basic benchmarks, see the paper for more:
This code is based off @kdk and @ajavadia 🙂
One or more of the following people are relevant to this code:
@Qiskit/terra-core
Pull Request Test Coverage Report for Build 15410851778
Warning: This coverage report may be inaccurate.
This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
- For more information on this, see Tracking coverage changes with pull request builds.
- To avoid this issue with future PRs, see these Recommended CI Configurations.
- For a quick fix, rebase this PR at GitHub. Your next report should be accurate.
Details
- 356 of 381 (93.44%) changed or added relevant lines in 4 files are covered.
- 1208 unchanged lines in 26 files lost coverage.
- Overall coverage increased (+0.04%) to 87.874%
| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| qiskit/transpiler/passes/scheduling/padding/context_aware_dynamical_decoupling.py | 353 | 378 | 93.39% |
| <!-- | Total: | 356 | 381 |
| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| qiskit/circuit/library/pauli_evolution.py | 2 | 96.49% |
| qiskit/result/sampled_expval.py | 2 | 93.55% |
| crates/qasm2/src/lex.rs | 3 | 92.48% |
| qiskit/transpiler/passes/routing/sabre_swap.py | 3 | 96.97% |
| crates/cext/src/exit_codes.rs | 4 | 0.0% |
| qiskit/synthesis/evolution/suzuki_trotter.py | 4 | 90.38% |
| qiskit/synthesis/multi_controlled/mcx_synthesis.py | 4 | 98.55% |
| crates/transpiler/src/passes/split_2q_unitaries.rs | 5 | 96.62% |
| qiskit/synthesis/evolution/qdrift.py | 5 | 86.49% |
| crates/accelerate/src/twirling.rs | 6 | 97.33% |
| <!-- | Total: | 1208 |
| Totals | |
|---|---|
| Change from base Build 15300566108: | 0.04% |
| Covered Lines: | 80816 |
| Relevant Lines: | 91968 |
💛 - Coveralls
Anything holding this up from being merged? Happy to lend a hand if needed to speed up the process @Cryoris @eliarbel
https://github.com/Qiskit/qiskit/pull/12825#pullrequestreview-2404327735 is a very good catch -- the newly added gates that implement the DD sequences didn't have a duration attached, since they weren't present when the scheduling pass was run. I'll add a test to check for this 🙂
After discussion with @ElePT I also updated the default behavior of min_joinable_duration: On realistic devices, the 2q gate lengths vary a little -- yet might not want to insert DD sequences for two concurrent 2q gates, just because they naturally differ slightly. The new default is therefore that we only add DD sequences for idle times beyond 2 * (max_2q_duration - min_2q_duration). This needs some benchmarking on real devices before merging.
#12825 (review) is a very good catch -- the newly added gates that implement the DD sequences didn't have a
durationattached, since they weren't present when the scheduling pass was run. I'll add a test to check for this 🙂
Just so you know I'm working on removing per instruction durations in #13506 so we probably don't need to change this now. It would have been important for 1.3, but in case 2.0 we just making extra work for both of us.
I removed any setting and usage of Instruction.duration in 21c4ea0, so hopefully you don't have to update the context-aware DD in #13506 🙂 That means the timeline drawer will fail now, but I assume this will be fixed in your PR 🙂
After discussion with @ElePT I also updated the default behavior of
min_joinable_duration: On realistic devices, the 2q gate lengths vary a little -- yet might not want to insert DD sequences for two concurrent 2q gates, just because they naturally differ slightly. The new default is therefore that we only add DD sequences for idle times beyond2 * (max_2q_duration - min_2q_duration). This needs some benchmarking on real devices before merging.
Hi @Cryoris are you planning to benchmark the new min_joinable_duration default on real devices before the 2.1 release?
Yep, it's good from my side now 🙂 The number of X gates is rather large though, per design of the algorithm, I think that was just surprising to see.
E.g. for an N-local circuit with linear entanglement on n qubits (not that someone would run this), the number of X gates goes with O(n^2) instead of O(n) for "normal" DD. But the results of CA-DD look promising on the eagle devices and the overhead can be controlled with the min_joinable_duration parameter.
Overall this looks in good shape to me both from code and testing perspectives. I haven't reviewed the algorithm in depth against the paper, but I ran several examples and the results look OK. Still, I'd like to look into some of the algorithmic aspects a bit more. In the meantime, I have a couple questions and possible suggestions, both not major:
The rationale behind the default value for min_joinable_duration is not so clear to me. Also, it doesn't feel very intuitive as a knob for controlling the amount of DD inserted into the circuit. Is there some underlying intuition behind the choice, something that could be added to the docstring to help clarify this a bit?
'cz' is included in the calculation of the 2q gate length variance in _gate_length_variance. However, the logic for handling spectator qubits does not account for CZGates. Was this intentional, perhaps due to the differences in crosstalk levels between Heron and Eagle devices? If so, this introduces a technology assumption in the implementation, isn't it? I'm wondering if it would make sense to provide a way for users to opt in or out of spectator qubit analysis for specific gate types?
Re: min_joinable_duration -- this shouldn't set the amount of DD inserted, but rather control how adjacent idle blocks are joined. It's a knob that lets you trade-off between potentially many short sequences or longer blocks. The default value is picked to ensure that a layer of CX/ECR will not have a lot of short DD sequences just due to the difference in duration. But this is not discussed in the paper afaik and I'm happy to change this if you think there's a more reasonable choice 🙂
Re: CZ -- The paper only investigates CX/ECR. CZ would probably require a different spectator qubit handling, but this is not explained in the paper and hence not supported in the code. We can add an argument that allows to turn off the spectator-specific logic for CX/ECR.
Re: min_joinable_duration -- this shouldn't set the amount of DD inserted, but rather control how adjacent idle blocks are joined. It's a knob that lets you trade-off between potentially many short sequences or longer blocks. The default value is picked to ensure that a layer of CX/ECR will not have a lot of short DD sequences just due to the difference in duration. But this is not discussed in the paper afaik and I'm happy to change this if you think there's a more reasonable choice 🙂
The reason I treated this is as a knob to control the amount of DD is because of this check:
if (
node.op.name == "delay"
and not is_after_reset(node)
and self._duration(node, qubit_map) > self._min_joinable_duration
)
in _get_sorted_delays. It skips durations shorter than min_joinable_duration when collecting eligible delays, effectively becoming a filter of idle blocks based on their lengths. This is because shorter delays are not just omitted from adjacency blocks and have their own DD sequences, they are skipped altogether. If we want a truly min_joinable_duration behavior in the "joinable" sense, we'll probably want to handle the parameter in _collect_adjacent_delay_blocks. As for the value default value itself, I see the reasoning for layers so I think it's OK. But now I think that a name that better reflects how the logic is implemented currently is min_duration 😄
Fair point, let's rename it 👍🏻