qiskit icon indicating copy to clipboard operation
qiskit copied to clipboard

Recreate full dag instead of inplace substitution in BasisTranslator

Open mtreinish opened this issue 1 year ago • 4 comments

Summary

This commit tweaks the internal logic of the basis translator transpiler pass to do a full dag recreation instead of inplace modification. If only a few operations were to be substituted it would probably be more efficient to do an inplace modification, but in general the basis translator ends up replacing far more operations than not. In such cases just iterating over the dag and rebuilding it is more efficient because the overhead of apply_operation_back() is minimal compared to substitute_node_with_dag() (although it's higher than subtitute_node(.., inplace=True)).

Details and comments

mtreinish avatar Apr 17 '24 01:04 mtreinish

One or more of the the following people are requested to review this:

  • @Qiskit/terra-core

qiskit-bot avatar Apr 17 '24 01:04 qiskit-bot

I ran some asv benchmarks for this PR here:

Benchmarks that have improved:

       before           after         ratio
     [166dcccd]       [595f81bf]
     <dag-recreate-bt~1>       <dag-recreate-bt>
-      2.07±0.01s       1.88±0.01s     0.91  utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr')
-      1.32±0.01s       1.15±0.01s     0.87  utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')
-       603±0.6ms          504±2ms     0.84  passes.MultipleBasisPassBenchmarks.time_basis_translator(20, 1024, ['rz', 'x', 'sx', 'cx', 'id'])
-         435±2ms        360±0.7ms     0.83  passes.MultipleBasisPassBenchmarks.time_basis_translator(14, 1024, ['rz', 'x', 'sx', 'cx', 'id'])
-         510±2ms          418±2ms     0.82  passes.MultipleBasisPassBenchmarks.time_basis_translator(20, 1024, ['u', 'cx', 'id'])
-         366±1ms          299±2ms     0.82  passes.MultipleBasisPassBenchmarks.time_basis_translator(14, 1024, ['u', 'cx', 'id'])
-         167±2ms          135±1ms     0.81  passes.MultipleBasisPassBenchmarks.time_basis_translator(5, 1024, ['rz', 'x', 'sx', 'cx', 'id'])
-       142±0.5ms        113±0.8ms     0.80  passes.MultipleBasisPassBenchmarks.time_basis_translator(5, 1024, ['u', 'cx', 'id'])

Benchmarks that have stayed the same:

       before           after         ratio
     [166dcccd]       [595f81bf]
     <dag-recreate-bt~1>       <dag-recreate-bt>
              n/a              n/a      n/a  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(1, 'synthesis')
              n/a              n/a      n/a  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(14, 'synthesis')
              n/a              n/a      n/a  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(2, 'synthesis')
              n/a              n/a      n/a  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(20, 'synthesis')
              n/a              n/a      n/a  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(27, 'synthesis')
              n/a              n/a      n/a  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(3, 'synthesis')
              n/a              n/a      n/a  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(5, 'synthesis')
              n/a              n/a      n/a  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(8, 'synthesis')
       6.77±0.5ms       7.42±0.6ms     1.10  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(3, 'translator')
        128±0.2ms        139±0.5ms     1.08  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(3)
        130±0.2ms        139±0.2ms     1.08  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(3)
        114±0.3ms        123±0.4ms     1.08  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(1)
        117±0.2ms        125±0.3ms     1.07  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(1)
       15.9±0.1ms       17.0±0.3ms     1.07  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(5, 'translator')
       41.4±0.3ms       43.7±0.2ms     1.06  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(8, 'translator')
        243±0.7ms          256±2ms     1.05  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(20, 'translator')
      5.19±0.04ms       5.45±0.1ms     1.05  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(2, 'translator')
      2.45±0.03ms      2.57±0.03ms     1.05  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(1, 'translator')
          432±2ms          450±6ms     1.04  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(27, 'translator')
          229±2ms          238±4ms     1.04  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(2)
        121±0.7ms          126±1ms     1.04  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(14, 'translator')
        227±0.9ms        235±0.7ms     1.04  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(2)
       85.2±0.3ms       87.6±0.5ms     1.03  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(1)
          105±1ms        107±0.4ms     1.03  transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(1)
       97.2±0.7ms       99.7±0.5ms     1.02  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(0)
          262±2ms          267±2ms     1.02  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(2)
        402±0.7ms          406±1ms     1.01  transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(1)
       92.6±0.3ms       93.7±0.3ms     1.01  utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cz')
       30.0±0.4ms       30.3±0.2ms     1.01  utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx')
        122±0.5ms        122±0.4ms     1.00  transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(0)
       92.7±0.3ms       92.8±0.3ms     1.00  utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('ecr')
       30.1±0.3ms       30.2±0.3ms     1.00  utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('ecr')
       20.7±0.03s          20.7±0s     1.00  utility_scale.UtilityScaleBenchmarks.time_qft('ecr')
             2565             2565     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(0)
             1403             1403     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(1)
             1403             1403     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(2)
             1296             1296     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(3)
             2705             2705     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(0)
             2005             2005     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(1)
             2005             2005     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(2)
                7                7     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(3)
             2705             2705     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(0)
             2005             2005     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(1)
             2005             2005     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(2)
                7                7     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(3)
              323              323     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(0)
              336              336     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(1)
              336              336     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(2)
              272              272     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(3)
             1908             1908     1.00  utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cx')
             1896             1896     1.00  utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cz')
             1936             1936     1.00  utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('ecr')
             2598             2598     1.00  utility_scale.UtilityScaleBenchmarks.track_qft_depth('cx')
             2598             2598     1.00  utility_scale.UtilityScaleBenchmarks.track_qft_depth('cz')
             2598             2598     1.00  utility_scale.UtilityScaleBenchmarks.track_qft_depth('ecr')
              972              972     1.00  utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cx')
              972              972     1.00  utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cz')
              972              972     1.00  utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('ecr')
          296±3ms          295±1ms     1.00  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(3)
       1.86±0.01s       1.86±0.01s     1.00  transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(3)
          4.93±0s       4.93±0.01s     1.00  utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')
       30.3±0.5ms      30.3±0.07ms     1.00  utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz')
       1.53±0.01s          1.52±0s     1.00  transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(2)
       18.5±0.02s       18.4±0.05s     0.99  utility_scale.UtilityScaleBenchmarks.time_qft('cx')
      8.56±0.04ms      8.50±0.05ms     0.99  utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cx')
       93.0±0.1ms       92.2±0.2ms     0.99  utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cx')
          2.62±0s       2.60±0.03s     0.99  utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')
      8.64±0.06ms      8.55±0.04ms     0.99  utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz')
      8.64±0.07ms      8.54±0.05ms     0.99  utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')
       23.8±0.04s       23.5±0.09s     0.99  utility_scale.UtilityScaleBenchmarks.time_qft('cz')
          896±6ms          882±7ms     0.98  transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(0)
          817±3ms          802±2ms     0.98  passes.MultipleBasisPassBenchmarks.time_basis_translator(20, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id'])
          582±2ms          563±5ms     0.97  passes.MultipleBasisPassBenchmarks.time_basis_translator(14, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id'])
          810±3ms          777±4ms     0.96  utility_scale.UtilityScaleBenchmarks.time_qaoa('cx')
        208±0.8ms        196±0.6ms     0.94  passes.MultipleBasisPassBenchmarks.time_basis_translator(5, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id'])
       2.96±0.02s       2.76±0.03s     0.93  utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')

Benchmarks that have got worse:

       before           after         ratio
     [166dcccd]       [595f81bf]
     <dag-recreate-bt~1>       <dag-recreate-bt>
+      54.3±0.2ms       63.6±0.4ms     1.17  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(0)
+      57.0±0.2ms       65.7±0.2ms     1.15  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(0)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

The large_qasm benchmarks are the worst case because the circuit is relatively deep and in the target basis already. If this is a concern we can add a logic switch based on the number of substations needed. From https://github.com/Qiskit/qiskit/pull/12179#issuecomment-2054119816 the crossover point is ~2% of the gates in a circuit will be replaced. Not sure if the performance will be exactly the same for this pass though, it might be worth benchmarking for the crossover point here too.

mtreinish avatar Apr 17 '24 01:04 mtreinish

Another small comment is that the circuit drawer unit test is acting out.

Weird the output circuit from the transpiler of an if statement's body is shifting from rz(pi) to rz(-pi) with a global phase of pi. So both outputs are equivalent but the visualization changes and because we don't display global phase inside control flow bodies it looks a bit weird now. I'm not sure what is causing this though because I didn't change the basis translators logic, just how we construct the output I wouldn't have expected any changes.

mtreinish avatar Apr 25 '24 18:04 mtreinish

I pushed this out to 1.2.0 since I haven't had a chance to fix the root cause of the test failure. That being said this might be superseded by #12246

mtreinish avatar May 02 '24 11:05 mtreinish

I tried fixing the latest conflicts and proposing a solution for the failed test. To make the code work after merging the changes from #12705 I ended up applying a few hacky-looking fixes (such as re-writing qargs and cargs in new_node or this change in L381-383), there might be better workarounds. The test fix is nothing fancy, I just return the old boolean flag together with the dag in apply_translation.

ElePT avatar Jul 29 '24 16:07 ElePT

Pull Request Test Coverage Report for Build 10148007745

Details

  • 32 of 32 (100.0%) changed or added relevant lines in 1 file are covered.
  • 5 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.02%) to 89.773%

Files with Coverage Reduction New Missed Lines %
crates/qasm2/src/lex.rs 5 92.11%
<!-- Total: 5
Totals Coverage Status
Change from base Build 10147820667: 0.02%
Covered Lines: 66970
Relevant Lines: 74599

💛 - Coveralls

coveralls avatar Jul 29 '24 16:07 coveralls

I ran the benchmarks after @ElePT's fix and rebase:

Benchmarks that have improved:

| Change   | Before [b23c5452]    | After [94d6b9c7] <dag-recreate-bt>   |   Ratio | Benchmark (Parameter)                                              |
|----------|----------------------|--------------------------------------|---------|--------------------------------------------------------------------|
| -        | 1.91±0.02s           | 1.55±0.01s                           |    0.81 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')  |
| -        | 1.77±0.01s           | 1.38±0.01s                           |    0.78 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr') |
| -        | 1.32±0.01s           | 995±6ms                              |    0.76 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')  |

Benchmarks that have stayed the same:

| Change   | Before [b23c5452]    | After [94d6b9c7] <dag-recreate-bt>   | Ratio   | Benchmark (Parameter)                                                         |
|----------|----------------------|--------------------------------------|---------|-------------------------------------------------------------------------------|
|          | 0                    | 0                                    | n/a     | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('cx')                 |
|          | 0                    | 0                                    | n/a     | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('cz')                 |
|          | 0                    | 0                                    | n/a     | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('ecr')                |
|          | 107±1ms              | 110±0.3ms                            | 1.03    | utility_scale.UtilityScaleBenchmarks.time_bvlike('cx')                        |
|          | 3.67±0.01s           | 3.73±0.1s                            | 1.02    | utility_scale.UtilityScaleBenchmarks.time_circSU2('cx')                       |
|          | 11.3±0.02s           | 11.5±0.01s                           | 1.02    | utility_scale.UtilityScaleBenchmarks.time_qv('cz')                            |
|          | 108±1ms              | 110±0.6ms                            | 1.01    | utility_scale.UtilityScaleBenchmarks.time_bvlike('ecr')                       |
|          | 3.79±0.01s           | 3.83±0.06s                           | 1.01    | utility_scale.UtilityScaleBenchmarks.time_circSU2('cz')                       |
|          | 3.78±0.01s           | 3.83±0.08s                           | 1.01    | utility_scale.UtilityScaleBenchmarks.time_circSU2('ecr')                      |
|          | 6.72±0.05s           | 6.81±0s                              | 1.01    | utility_scale.UtilityScaleBenchmarks.time_qv('cx')                            |
|          | 8.54±0.07s           | 8.65±0.01s                           | 1.01    | utility_scale.UtilityScaleBenchmarks.time_qv('ecr')                           |
|          | 332±1ms              | 331±2ms                              | 1.00    | utility_scale.UtilityScaleBenchmarks.time_bv_100('cz')                        |
|          | 109±1ms              | 109±0.8ms                            | 1.00    | utility_scale.UtilityScaleBenchmarks.time_bvlike('cz')                        |
|          | 111±1ms              | 111±0.4ms                            | 1.00    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('ecr')               |
|          | 19.8±0.01s           | 19.8±0.04s                           | 1.00    | utility_scale.UtilityScaleBenchmarks.time_qft('cx')                           |
|          | 395                  | 395                                  | 1.00    | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('cx')                 |
|          | 397                  | 397                                  | 1.00    | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('cz')                 |
|          | 397                  | 397                                  | 1.00    | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('ecr')                |
|          | 300                  | 300                                  | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('cx')                |
|          | 300                  | 300                                  | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('cz')                |
|          | 300                  | 300                                  | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('ecr')               |
|          | 1483                 | 1483                                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cx')                   |
|          | 1488                 | 1488                                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cz')                   |
|          | 1488                 | 1488                                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('ecr')                  |
|          | 1954                 | 1954                                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cx')                    |
|          | 1954                 | 1954                                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cz')                    |
|          | 1954                 | 1954                                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qft_depth('ecr')                   |
|          | 2538                 | 2538                                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('cx')                     |
|          | 2538                 | 2538                                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('cz')                     |
|          | 2538                 | 2538                                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('ecr')                    |
|          | 435                  | 435                                  | 1.00    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cx')      |
|          | 435                  | 435                                  | 1.00    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cz')      |
|          | 435                  | 435                                  | 1.00    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('ecr')     |
|          | 306±2ms              | 304±0.3ms                            | 0.99    | utility_scale.UtilityScaleBenchmarks.time_bv_100('cx')                        |
|          | 333±2ms              | 331±2ms                              | 0.99    | utility_scale.UtilityScaleBenchmarks.time_bv_100('ecr')                       |
|          | 22.4±0.1s            | 22.3±0.05s                           | 0.99    | utility_scale.UtilityScaleBenchmarks.time_qft('cz')                           |
|          | 22.4±0.1s            | 22.2±0s                              | 0.99    | utility_scale.UtilityScaleBenchmarks.time_qft('ecr')                          |
|          | 10.4±0.09ms          | 10.2±0.1ms                           | 0.98    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cx')               |
|          | 112±0.8ms            | 110±0.6ms                            | 0.98    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cz')                |
|          | 36.3±0.4ms           | 35.5±0.2ms                           | 0.98    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx')  |
|          | 36.2±0.3ms           | 35.5±0.5ms                           | 0.98    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz')  |
|          | 36.5±0.2ms           | 35.6±0.5ms                           | 0.98    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('ecr') |
|          | 2.84±0.01s           | 2.77±0.02s                           | 0.98    | utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')                          |
|          | 10.5±0.04ms          | 10.2±0.05ms                          | 0.97    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz')               |
|          | 10.5±0.05ms          | 10.3±0.05ms                          | 0.97    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')              |
|          | 112±0.4ms            | 109±1ms                              | 0.97    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cx')                |
|          | 2.33±0.02s           | 2.24±0.01s                           | 0.96    | utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')                         |
|          | 1.14±0.01s           | 1.07±0.01s                           | 0.94    | utility_scale.UtilityScaleBenchmarks.time_qaoa('cx')                          |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

mtreinish avatar Jul 29 '24 18:07 mtreinish