Rust implementation of the QPY module
Summary
Adds a rust-based implementation of qpy serialization.
Addresses #13131
Details and comments
We use rust structs along with the binrw library to specify and write the byte representation of data (this turns out to be somewhat simpler than how it's currently done in python). There are still many python-dependent elements we need to take into consideration:
- Backwards compatibility is currently not supported; whenever an older version is passed, the Python version is used instead. The goal of this PR is to establish a baseline from which it would be relatively easy to add support for previous versions.
- File header is still created in Python space; the entrypoint for the rust code is
write_circuitandread_circuit(this is being kept until a later PR adds backward compatability) - For every instruction, QPY stores the name of the corresponding python class, e.g. for X gate it stores the string
XGate. Currently all the class names are obtained by accessing the__name__field of the python class; for standard gates and instructions we'll need to switch to a python-independent solution (an ad-hoc dictionary is currently used for the reader component; there may be a better solution). - Since
Exprwas only recently moved to rust, after most of this PR was complete, we do not have a native Rust serialization yet; rather, we rely on the python serialization code (the_write_exprand_read_exprmethods). There are many more places where Rust might be used but Python is used instead. - numpy object serialization is done via python call to numpy's
save.
In this PR we make the distinction between packing an object and serializing it:
-
serializingan object means generating a bytes representation for it (aVec<u8>object, aliased toBytesin the module. -
packingan object means extracting its data fields and generating a corresponding rust struct, taken from theformats.rsfiles. This might involve serializing some of the object's fields, and packing other fields.
Our goal is to minimize serialization as much as possible, allowing us to generate a large packed struct reflecting the structure of the qpy file as much as possible before serializing; the code can be further improved to reflect this, although I think this can wait to a future PR.
One or more of the following people are relevant to this code:
-
@Qiskit/terra-core -
@mtreinish -
@nkanazawa1989
Hi @gadial, this is still in draft mode so I just gave it a brief look. I have a few general questions and comments at this stage:
- With the way the code is written now it's geared towards being called from Python, e.g. with the file header written in
interface.py, the Python calls inpy_write_circuit,pack_circuit_headerand more. Maybe it's just an artifact of this PR still being WIP, but with the C-API for circuit construction (this https://github.com/Qiskit/qiskit/pull/14006 and follow-ons) we should consider a C-api equivalent for thedumpworkflow or some aspects of it. - For the circuit header handling in
pack_circuit_headermaybeQuantumCircuitDatainconvertersor a similar approach might be useful here instead of all thegetattrcalls. - Generally speaking about 1 & 2: assuming we'll still need some Python calls, can we make the Rust-logic as Python-free as possible, say by having a layer to extract as much info as possible from the Python objects before passing it to the actual pack functions? I realize this might not be that straightforward, but still worth trying. Eventually we should have enough of the data model Python-free so it makes sense to design for that already.
- Is
use_rustindumpjust a WIP thing for debug or do you plan to keep Python'swrite_circuitas an option? If so why? Otherwise it would be good to have the formats defined only in one place. Currentlyformats.rsmirrorsformats.pywhich makes sense for the WIP status but the duplication is not good for the longer term. Just mentioning.
Thanks @eliarbel
- That's mostly at artifact of originally planning a smaller scope for this PR (in order to proceed incrementally instead of one huge PR) - I'm still trying to aim for the MVP, although this can be changed if needed since the header is a relatively easy part.
- Thanks, I'll try looking into it once the PR is passing.
- It's a little tricky since we deal with many different data types, each with its own Pythonic quirks, and some (e.g. parameters) are already transitioning to Rust. My priorities in this PR is to flush out the QPY structure as much as possible (i.e. by using detailed structs) and keeping the actual data extraction in specific
serialize_Xfunctions which can be amended on demand as we move more logic from Python to Rust. But I thing adding new extractors is out of scope for this PR which aims to establish the baseline. - That's most definitely a WIP thing - I use it to test both Rust and Python generate the same output. At the end I believe we should remove all the Python code related to QPY dumping, but we'll have to keep the formats until we handle reading as well.
Hi @gadial, this is still in draft mode so I just gave it a brief look. I have a few general questions and comments at this stage:
- With the way the code is written now it's geared towards being called from Python, e.g. with the file header written in
interface.py, the Python calls inpy_write_circuit,pack_circuit_headerand more. Maybe it's just an artifact of this PR still being WIP, but with the C-API for circuit construction (this Add initial C API for circuit construction #14006 and follow-ons) we should consider a C-api equivalent for thedumpworkflow or some aspects of it.- For the circuit header handling in
pack_circuit_headermaybeQuantumCircuitDatainconvertersor a similar approach might be useful here instead of all thegetattrcalls.- Generally speaking about 1 & 2: assuming we'll still need some Python calls, can we make the Rust-logic as Python-free as possible, say by having a layer to extract as much info as possible from the Python objects before passing it to the actual pack functions? I realize this might not be that straightforward, but still worth trying. Eventually we should have enough of the data model Python-free so it makes sense to design for that already.
- Is
use_rustindumpjust a WIP thing for debug or do you plan to keep Python'swrite_circuitas an option? If so why? Otherwise it would be good to have the formats defined only in one place. Currentlyformats.rsmirrorsformats.pywhich makes sense for the WIP status but the duplication is not good for the longer term. Just mentioning.
Pull Request Test Coverage Report for Build 20428701070
Warning: This coverage report may be inaccurate.
This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
- For more information on this, see Tracking coverage changes with pull request builds.
- To avoid this issue with future PRs, see these Recommended CI Configurations.
- For a quick fix, rebase this PR at GitHub. Your next report should be accurate.
Details
- 1282 of 3886 (32.99%) changed or added relevant lines in 26 files are covered.
- 1009 unchanged lines in 26 files lost coverage.
- Overall coverage decreased (-2.0%) to 86.349%
| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| crates/circuit/src/circuit_data.rs | 11 | 14 | 78.57% |
| crates/circuit/src/classical/expr/binary.rs | 0 | 3 | 0.0% |
| crates/circuit/src/classical/expr/unary.rs | 0 | 3 | 0.0% |
| crates/circuit/src/bit.rs | 9 | 16 | 56.25% |
| crates/circuit/src/operations.rs | 76 | 93 | 81.72% |
| crates/qpy/src/formats.rs | 68 | 85 | 80.0% |
| crates/circuit/src/parameter/parameter_expression.rs | 5 | 27 | 18.52% |
| crates/qpy/src/consts.rs | 25 | 55 | 45.45% |
| crates/qpy/src/annotations.rs | 29 | 106 | 27.36% |
| crates/qpy/src/bytes.rs | 54 | 183 | 29.51% |
| <!-- | Total: | 1282 | 3886 |
| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| crates/cext/src/transpiler/passes/unitary_synthesis.rs | 1 | 89.08% |
| qiskit/qasm2/export.py | 1 | 98.65% |
| crates/qasm2/src/lex.rs | 2 | 92.54% |
| crates/circuit/src/converters.rs | 3 | 96.1% |
| crates/circuit/src/parameter/parameter_expression.rs | 3 | 80.98% |
| crates/transpiler/src/transpiler.rs | 3 | 93.39% |
| crates/qasm2/src/parse.rs | 6 | 97.56% |
| crates/circuit/src/blocks.rs | 8 | 68.62% |
| qiskit/providers/basic_provider/basic_simulator.py | 10 | 96.77% |
| qiskit/qpy/binary_io/value.py | 10 | 72.82% |
| <!-- | Total: | 1009 |
| Totals | |
|---|---|
| Change from base Build 20130816757: | -2.0% |
| Covered Lines: | 97908 |
| Relevant Lines: | 113386 |
💛 - Coveralls
One or more of the following people are relevant to this code:
-
@Qiskit/terra-core -
@mtreinish -
@nkanazawa1989
Hey @gadial, thanks for doing all of this.
This isn't a complete review by any means, but I just wanted to give you some comments to work through ahead of your holiday. I'll continue reviewing this coming week.
The general questions I have so far are:
- Are there any changes to the current QPY format to support any of this, or is it compatible without introducing a new version?
- This PR is quite a bit larger than I was expecting. Are there any more meaningful chunks that you might be able to split this into to make reviewing it easier to understand architecturally? I think when we spoke about it before, you mentioned that testing was easier if both reader and writer were in place. But that leads me to the third point:
- Should we have Rust-level unit testing for any of this? Or, is there some amount of stress testing we could do, i.e. generating complex circuits with both the old and new serializers and making sure the outputs are equivalent? I have some doubt that our existing QPY test coverage is sufficient to detect potential differences introduced here (but perhaps I am wrong). How confident are you that the new code is sufficiently exercised by what we've got?
- No changes.
- There are three conceptual parts for the PR: First is the
formatsfile which defines the structure of the qpy file as much as possible (there are still someBytesfields e.g. for storing multiple possible values types; this may be further refined in a future PR). Next are thewriterand thereaderparts. Thewritertakes theQuantumCircuitobject and generates a struct; thereadertakes a struct and generates aQuantumCircuit. The reader and the writer obviously depend on the formats, but not on each other. However, not adding all three together would result in some dead code in the PR, which is not preferred. - In the long run we definitely should have Rust-level testing (which if I'm not mistaken is currently missing from most of our Rust code?). I did some stress testing during development (which indeed encountered some problems which were fixed) and will also do it as part of the benchmarking, but I have relatively good confidence in the current tests, due to the large amounts of problems they managed to detect in relatively niche situations.
Thanks @gadial!
Just want to note here that I will likely need to come back to this after getting control flow sorted ahead of the Sept. deadline in lieu of recent requirement changes there. Perhaps someone else can pick up where I'd left off here and I can have another supplemental look in a few weeks from now.
One or more of the following people are relevant to this code:
-
@Qiskit/terra-core -
@mtreinish -
@nkanazawa1989
Deferring to 2.4, as discussed offline. We should target merging it early in the 2.4 cycle though to avoid Gadi needing to update this more than necessary and to give enough testing time 🙂