pennylane MCM - tree-traversal implementation of native MCM execution

Before submitting

Please complete the following checklist when submitting a PR:

[x] All new features must include a unit test. If you've fixed a bug or added code that should be tested, add a test to the test directory!
[x] All new functions and code must be clearly commented and documented. If you do make documentation changes, make sure that the docs build and render correctly by running make docs.
[x] Ensure that the test suite passes, by running make test.
[x] Add a new entry to the doc/releases/changelog-dev.md file, summarizing the change, and including a link back to the PR.
[x] The PennyLane source code conforms to PEP8 standards. We check all of our code against Pylint. To lint modified files, simply pip install pylint, and then run pylint pennylane/path/to/file.py.

When all the above are checked, delete everything above the dashed line and fill in the pull request template.

Context: Native MCM execution is slow because executing n_shots tapes is generally redundant and has a lot of overheads.

Description of the Change: Introduce simulate_tree_mcm and replace simulate_native_mcm by it. simulate_tree_mcm implements a high-memory depth-first tree-traversal algorithm. It is deemed high-memory because a copy of the statevector is made at every node in the tree. Since this is a depth first traversal, it incurs a memory cost about (n_mcm + 1) 2 ** n_qubit to store the statevectors at any moment.

Benefits: Much faster execution in almost any case. Opens avenues for improvement and other features, for example low-memory depth-first tree-traversal, high-prob-first traversal, quantum noise simulations.

Here are some benchmarks to illustrate the gains. The following synthetic workloads shows that for small circuits with not too many MCM, deferred measurements is best. The tree-traversal approach is slower than deferred measurements, but much faster than the one-shot implementation. synthetic_time_vs_shots

A more meaningful example is to run iterative QPE for 10 iterations with a varying number of shots. The one-shot implementation is again sluggish. The tree-traversal implementation does better, but appears to scale worst then deferred measurements, again the fastest.
iterqpe_time_vs_shots

The picture changes when running iterative QPE with 1e6 samples for varying iterations. We do not perform one-shot benchmarks since it is too slow. The tree-traversal implementation is indeed much slower in the 10-20 iteration range, but starts winning over deferred measurements beyond that. It thus appears to have a larger prefactor which is eventually compensated by slightly better scaling.
iterqpe_time_vs_iters

Finally, we perform few-shots calculations to illustrate regimes where one-shot could be useful. There is indeed an observable cross-over between one-shot and deferred-measurements. The tree traversal implementation however is usually faster even with few shots because it then has a limited number of branches to explore before running out of shots. iterqpe_time_vs_iters_all

Possible Drawbacks: More complex implementation is harder to debug and maintain.

Related GitHub Issues: Mid circuit Measurements tree traversal implementation [sc-56035]

Feb 08 '24 14:02 vincentmr

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 99.67%. Comparing base (d616c3e) to head (65e49e6).

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5180      +/-   ##
==========================================
- Coverage   99.67%   99.67%   -0.01%     
==========================================
  Files         421      421              
  Lines       40243    40115     -128     
==========================================
- Hits        40114    39985     -129     
- Misses        129      130       +1

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Feb 08 '24 14:02 codecov-commenter

Thanks @vincentmr . Looks good so far. There are a few edge cases that need to be addressed. I'm mostly happy with the docstrings for the helper functions, but some dev comments throughout the code could be helpful too as the code can get a bit hard to follow.

Cheers! I simplified the logic a bit further and tried to improve the docstrings.

We need to add this to the measurements.rst document undering the "Configuring mid-circuit measurments" section.

I was going to do this in a follow-up PR. Is this blocking?

Also, I assume this is not expected to be jit compatible in its current state?

Correct. The difficulty is sub-tree circuits have a dynamic shots argument and sample measurements which is incompatible with jitting at the moment.

Jun 12 '24 14:06 vincentmr

Thanks @mudit2812 , @dime10 and @albi3ro for your careful read and all your suggestions which brought the PR in its current state.

Jun 17 '24 14:06 vincentmr

pennylane pennylane copied to clipboard

MCM - tree-traversal implementation of native MCM execution

Before submitting

Codecov Report

pennylane
pennylane copied to clipboard