Recreate full dag instead of inplace substitution in BasisTranslator
Summary
This commit tweaks the internal logic of the basis translator transpiler pass to do a full dag recreation instead of inplace modification. If only a few operations were to be substituted it would probably be more efficient to do an inplace modification, but in general the basis translator ends up replacing far more operations than not. In such cases just iterating over the dag and rebuilding it is more efficient because the overhead of apply_operation_back() is minimal compared to substitute_node_with_dag() (although it's higher than subtitute_node(.., inplace=True)).
Details and comments
One or more of the the following people are requested to review this:
@Qiskit/terra-core
I ran some asv benchmarks for this PR here:
Benchmarks that have improved:
before after ratio
[166dcccd] [595f81bf]
<dag-recreate-bt~1> <dag-recreate-bt>
- 2.07±0.01s 1.88±0.01s 0.91 utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr')
- 1.32±0.01s 1.15±0.01s 0.87 utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')
- 603±0.6ms 504±2ms 0.84 passes.MultipleBasisPassBenchmarks.time_basis_translator(20, 1024, ['rz', 'x', 'sx', 'cx', 'id'])
- 435±2ms 360±0.7ms 0.83 passes.MultipleBasisPassBenchmarks.time_basis_translator(14, 1024, ['rz', 'x', 'sx', 'cx', 'id'])
- 510±2ms 418±2ms 0.82 passes.MultipleBasisPassBenchmarks.time_basis_translator(20, 1024, ['u', 'cx', 'id'])
- 366±1ms 299±2ms 0.82 passes.MultipleBasisPassBenchmarks.time_basis_translator(14, 1024, ['u', 'cx', 'id'])
- 167±2ms 135±1ms 0.81 passes.MultipleBasisPassBenchmarks.time_basis_translator(5, 1024, ['rz', 'x', 'sx', 'cx', 'id'])
- 142±0.5ms 113±0.8ms 0.80 passes.MultipleBasisPassBenchmarks.time_basis_translator(5, 1024, ['u', 'cx', 'id'])
Benchmarks that have stayed the same:
before after ratio
[166dcccd] [595f81bf]
<dag-recreate-bt~1> <dag-recreate-bt>
n/a n/a n/a quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(1, 'synthesis')
n/a n/a n/a quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(14, 'synthesis')
n/a n/a n/a quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(2, 'synthesis')
n/a n/a n/a quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(20, 'synthesis')
n/a n/a n/a quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(27, 'synthesis')
n/a n/a n/a quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(3, 'synthesis')
n/a n/a n/a quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(5, 'synthesis')
n/a n/a n/a quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(8, 'synthesis')
6.77±0.5ms 7.42±0.6ms 1.10 quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(3, 'translator')
128±0.2ms 139±0.5ms 1.08 transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(3)
130±0.2ms 139±0.2ms 1.08 transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(3)
114±0.3ms 123±0.4ms 1.08 transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(1)
117±0.2ms 125±0.3ms 1.07 transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(1)
15.9±0.1ms 17.0±0.3ms 1.07 quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(5, 'translator')
41.4±0.3ms 43.7±0.2ms 1.06 quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(8, 'translator')
243±0.7ms 256±2ms 1.05 quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(20, 'translator')
5.19±0.04ms 5.45±0.1ms 1.05 quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(2, 'translator')
2.45±0.03ms 2.57±0.03ms 1.05 quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(1, 'translator')
432±2ms 450±6ms 1.04 quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(27, 'translator')
229±2ms 238±4ms 1.04 transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(2)
121±0.7ms 126±1ms 1.04 quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(14, 'translator')
227±0.9ms 235±0.7ms 1.04 transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(2)
85.2±0.3ms 87.6±0.5ms 1.03 transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(1)
105±1ms 107±0.4ms 1.03 transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(1)
97.2±0.7ms 99.7±0.5ms 1.02 transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(0)
262±2ms 267±2ms 1.02 transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(2)
402±0.7ms 406±1ms 1.01 transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(1)
92.6±0.3ms 93.7±0.3ms 1.01 utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cz')
30.0±0.4ms 30.3±0.2ms 1.01 utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx')
122±0.5ms 122±0.4ms 1.00 transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(0)
92.7±0.3ms 92.8±0.3ms 1.00 utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('ecr')
30.1±0.3ms 30.2±0.3ms 1.00 utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('ecr')
20.7±0.03s 20.7±0s 1.00 utility_scale.UtilityScaleBenchmarks.time_qft('ecr')
2565 2565 1.00 transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(0)
1403 1403 1.00 transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(1)
1403 1403 1.00 transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(2)
1296 1296 1.00 transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(3)
2705 2705 1.00 transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(0)
2005 2005 1.00 transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(1)
2005 2005 1.00 transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(2)
7 7 1.00 transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(3)
2705 2705 1.00 transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(0)
2005 2005 1.00 transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(1)
2005 2005 1.00 transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(2)
7 7 1.00 transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(3)
323 323 1.00 transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(0)
336 336 1.00 transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(1)
336 336 1.00 transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(2)
272 272 1.00 transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(3)
1908 1908 1.00 utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cx')
1896 1896 1.00 utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cz')
1936 1936 1.00 utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('ecr')
2598 2598 1.00 utility_scale.UtilityScaleBenchmarks.track_qft_depth('cx')
2598 2598 1.00 utility_scale.UtilityScaleBenchmarks.track_qft_depth('cz')
2598 2598 1.00 utility_scale.UtilityScaleBenchmarks.track_qft_depth('ecr')
972 972 1.00 utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cx')
972 972 1.00 utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cz')
972 972 1.00 utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('ecr')
296±3ms 295±1ms 1.00 transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(3)
1.86±0.01s 1.86±0.01s 1.00 transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(3)
4.93±0s 4.93±0.01s 1.00 utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')
30.3±0.5ms 30.3±0.07ms 1.00 utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz')
1.53±0.01s 1.52±0s 1.00 transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(2)
18.5±0.02s 18.4±0.05s 0.99 utility_scale.UtilityScaleBenchmarks.time_qft('cx')
8.56±0.04ms 8.50±0.05ms 0.99 utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cx')
93.0±0.1ms 92.2±0.2ms 0.99 utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cx')
2.62±0s 2.60±0.03s 0.99 utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')
8.64±0.06ms 8.55±0.04ms 0.99 utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz')
8.64±0.07ms 8.54±0.05ms 0.99 utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')
23.8±0.04s 23.5±0.09s 0.99 utility_scale.UtilityScaleBenchmarks.time_qft('cz')
896±6ms 882±7ms 0.98 transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(0)
817±3ms 802±2ms 0.98 passes.MultipleBasisPassBenchmarks.time_basis_translator(20, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id'])
582±2ms 563±5ms 0.97 passes.MultipleBasisPassBenchmarks.time_basis_translator(14, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id'])
810±3ms 777±4ms 0.96 utility_scale.UtilityScaleBenchmarks.time_qaoa('cx')
208±0.8ms 196±0.6ms 0.94 passes.MultipleBasisPassBenchmarks.time_basis_translator(5, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id'])
2.96±0.02s 2.76±0.03s 0.93 utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')
Benchmarks that have got worse:
before after ratio
[166dcccd] [595f81bf]
<dag-recreate-bt~1> <dag-recreate-bt>
+ 54.3±0.2ms 63.6±0.4ms 1.17 transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(0)
+ 57.0±0.2ms 65.7±0.2ms 1.15 transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(0)
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.
The large_qasm benchmarks are the worst case because the circuit is relatively deep and in the target basis already. If this is a concern we can add a logic switch based on the number of substations needed. From https://github.com/Qiskit/qiskit/pull/12179#issuecomment-2054119816 the crossover point is ~2% of the gates in a circuit will be replaced. Not sure if the performance will be exactly the same for this pass though, it might be worth benchmarking for the crossover point here too.
Another small comment is that the circuit drawer unit test is acting out.
Weird the output circuit from the transpiler of an if statement's body is shifting from rz(pi) to rz(-pi) with a global phase of pi. So both outputs are equivalent but the visualization changes and because we don't display global phase inside control flow bodies it looks a bit weird now. I'm not sure what is causing this though because I didn't change the basis translators logic, just how we construct the output I wouldn't have expected any changes.
I pushed this out to 1.2.0 since I haven't had a chance to fix the root cause of the test failure. That being said this might be superseded by #12246
I tried fixing the latest conflicts and proposing a solution for the failed test. To make the code work after merging the changes from #12705 I ended up applying a few hacky-looking fixes (such as re-writing qargs and cargs in new_node or this change in L381-383), there might be better workarounds. The test fix is nothing fancy, I just return the old boolean flag together with the dag in apply_translation.
Pull Request Test Coverage Report for Build 10148007745
Details
- 32 of 32 (100.0%) changed or added relevant lines in 1 file are covered.
- 5 unchanged lines in 1 file lost coverage.
- Overall coverage increased (+0.02%) to 89.773%
| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| crates/qasm2/src/lex.rs | 5 | 92.11% |
| <!-- | Total: | 5 |
| Totals | |
|---|---|
| Change from base Build 10147820667: | 0.02% |
| Covered Lines: | 66970 |
| Relevant Lines: | 74599 |
💛 - Coveralls
I ran the benchmarks after @ElePT's fix and rebase:
Benchmarks that have improved:
| Change | Before [b23c5452] | After [94d6b9c7] <dag-recreate-bt> | Ratio | Benchmark (Parameter) |
|----------|----------------------|--------------------------------------|---------|--------------------------------------------------------------------|
| - | 1.91±0.02s | 1.55±0.01s | 0.81 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz') |
| - | 1.77±0.01s | 1.38±0.01s | 0.78 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr') |
| - | 1.32±0.01s | 995±6ms | 0.76 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx') |
Benchmarks that have stayed the same:
| Change | Before [b23c5452] | After [94d6b9c7] <dag-recreate-bt> | Ratio | Benchmark (Parameter) |
|----------|----------------------|--------------------------------------|---------|-------------------------------------------------------------------------------|
| | 0 | 0 | n/a | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('cx') |
| | 0 | 0 | n/a | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('cz') |
| | 0 | 0 | n/a | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('ecr') |
| | 107±1ms | 110±0.3ms | 1.03 | utility_scale.UtilityScaleBenchmarks.time_bvlike('cx') |
| | 3.67±0.01s | 3.73±0.1s | 1.02 | utility_scale.UtilityScaleBenchmarks.time_circSU2('cx') |
| | 11.3±0.02s | 11.5±0.01s | 1.02 | utility_scale.UtilityScaleBenchmarks.time_qv('cz') |
| | 108±1ms | 110±0.6ms | 1.01 | utility_scale.UtilityScaleBenchmarks.time_bvlike('ecr') |
| | 3.79±0.01s | 3.83±0.06s | 1.01 | utility_scale.UtilityScaleBenchmarks.time_circSU2('cz') |
| | 3.78±0.01s | 3.83±0.08s | 1.01 | utility_scale.UtilityScaleBenchmarks.time_circSU2('ecr') |
| | 6.72±0.05s | 6.81±0s | 1.01 | utility_scale.UtilityScaleBenchmarks.time_qv('cx') |
| | 8.54±0.07s | 8.65±0.01s | 1.01 | utility_scale.UtilityScaleBenchmarks.time_qv('ecr') |
| | 332±1ms | 331±2ms | 1.00 | utility_scale.UtilityScaleBenchmarks.time_bv_100('cz') |
| | 109±1ms | 109±0.8ms | 1.00 | utility_scale.UtilityScaleBenchmarks.time_bvlike('cz') |
| | 111±1ms | 111±0.4ms | 1.00 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('ecr') |
| | 19.8±0.01s | 19.8±0.04s | 1.00 | utility_scale.UtilityScaleBenchmarks.time_qft('cx') |
| | 395 | 395 | 1.00 | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('cx') |
| | 397 | 397 | 1.00 | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('cz') |
| | 397 | 397 | 1.00 | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('ecr') |
| | 300 | 300 | 1.00 | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('cx') |
| | 300 | 300 | 1.00 | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('cz') |
| | 300 | 300 | 1.00 | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('ecr') |
| | 1483 | 1483 | 1.00 | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cx') |
| | 1488 | 1488 | 1.00 | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cz') |
| | 1488 | 1488 | 1.00 | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('ecr') |
| | 1954 | 1954 | 1.00 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cx') |
| | 1954 | 1954 | 1.00 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cz') |
| | 1954 | 1954 | 1.00 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('ecr') |
| | 2538 | 2538 | 1.00 | utility_scale.UtilityScaleBenchmarks.track_qv_depth('cx') |
| | 2538 | 2538 | 1.00 | utility_scale.UtilityScaleBenchmarks.track_qv_depth('cz') |
| | 2538 | 2538 | 1.00 | utility_scale.UtilityScaleBenchmarks.track_qv_depth('ecr') |
| | 435 | 435 | 1.00 | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cx') |
| | 435 | 435 | 1.00 | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cz') |
| | 435 | 435 | 1.00 | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('ecr') |
| | 306±2ms | 304±0.3ms | 0.99 | utility_scale.UtilityScaleBenchmarks.time_bv_100('cx') |
| | 333±2ms | 331±2ms | 0.99 | utility_scale.UtilityScaleBenchmarks.time_bv_100('ecr') |
| | 22.4±0.1s | 22.3±0.05s | 0.99 | utility_scale.UtilityScaleBenchmarks.time_qft('cz') |
| | 22.4±0.1s | 22.2±0s | 0.99 | utility_scale.UtilityScaleBenchmarks.time_qft('ecr') |
| | 10.4±0.09ms | 10.2±0.1ms | 0.98 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cx') |
| | 112±0.8ms | 110±0.6ms | 0.98 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cz') |
| | 36.3±0.4ms | 35.5±0.2ms | 0.98 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx') |
| | 36.2±0.3ms | 35.5±0.5ms | 0.98 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz') |
| | 36.5±0.2ms | 35.6±0.5ms | 0.98 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('ecr') |
| | 2.84±0.01s | 2.77±0.02s | 0.98 | utility_scale.UtilityScaleBenchmarks.time_qaoa('cz') |
| | 10.5±0.04ms | 10.2±0.05ms | 0.97 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz') |
| | 10.5±0.05ms | 10.3±0.05ms | 0.97 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr') |
| | 112±0.4ms | 109±1ms | 0.97 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cx') |
| | 2.33±0.02s | 2.24±0.01s | 0.96 | utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr') |
| | 1.14±0.01s | 1.07±0.01s | 0.94 | utility_scale.UtilityScaleBenchmarks.time_qaoa('cx') |
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.