qsharp
qsharp copied to clipboard
Test algorithm samples with and without debugging
This change introduces a scenario test under qsc that verifies the behavior of the algorithm samples with normal and debug execution. It uses fixed seeds to ensure results are predictable and restricts the slowest test (Shors.qs) to be release only.
Included is some cleanup of output from the samples so that compatible characters are used and no trailing spaces are present.
Benchmark for ddf9cd0
Click to view benchmark
| Test | Base | PR | % |
|---|---|---|---|
| Array append evaluation | 330.6±10.98µs | 327.7±2.43µs | -0.88% |
| Array literal evaluation | 173.9±0.75µs | 174.2±1.66µs | +0.17% |
| Array update evaluation | 410.9±9.32µs | 405.5±4.49µs | -1.31% |
| Core + Standard library compilation | 17.2±0.20ms | 17.2±0.36ms | 0.00% |
| Deutsch-Jozsa evaluation | 5.0±0.05ms | 5.0±0.16ms | 0.00% |
| Large file parity evaluation | 34.0±0.08ms | 33.9±0.10ms | -0.29% |
| Large input file compilation | 12.0±0.28ms | 12.0±0.18ms | 0.00% |
| Large input file compilation (interpreter) | 45.4±1.34ms | 45.1±1.37ms | -0.66% |
| Large nested iteration | 31.4±0.09ms | 32.2±0.13ms | +2.55% |
| Perform Runtime Capabilities Analysis (RCA) on Deutsch-Jozsa sample | 1556.6±40.68µs | 1562.2±44.04µs | +0.36% |
| Perform Runtime Capabilities Analysis (RCA) on large file sample | 7.8±0.07ms | 7.8±0.05ms | 0.00% |
| Perform Runtime Capabilities Analysis (RCA) on teleport sample | 1414.3±27.02µs | 1422.3±35.32µs | +0.57% |
| Perform Runtime Capabilities Analysis (RCA) on the core and std libraries | 27.0±0.34ms | 27.3±1.27ms | +1.11% |
| Teleport evaluation | 87.0±3.39µs | 88.4±6.81µs | +1.61% |
Benchmark for 1eb7d76
Click to view benchmark
| Test | Base | PR | % |
|---|---|---|---|
| Array append evaluation | 330.2±1.94µs | 331.1±4.98µs | +0.27% |
| Array literal evaluation | 190.2±2.46µs | 189.5±0.74µs | -0.37% |
| Array update evaluation | 408.6±2.62µs | 408.5±2.88µs | -0.02% |
| Core + Standard library compilation | 20.4±0.92ms | 21.4±0.79ms | +4.90% |
| Deutsch-Jozsa evaluation | 5.0±0.05ms | 5.0±0.15ms | 0.00% |
| Large file parity evaluation | 34.0±0.13ms | 34.0±0.15ms | 0.00% |
| Large input file compilation | 13.1±0.54ms | 13.9±0.49ms | +6.11% |
| Large input file compilation (interpreter) | 49.9±1.95ms | 51.5±1.39ms | +3.21% |
| Large nested iteration | 32.7±0.18ms | 32.6±0.17ms | -0.31% |
| Perform Runtime Capabilities Analysis (RCA) on Deutsch-Jozsa sample | 1584.3±79.08µs | 1617.9±162.14µs | +2.12% |
| Perform Runtime Capabilities Analysis (RCA) on large file sample | 8.1±0.15ms | 8.0±0.13ms | -1.23% |
| Perform Runtime Capabilities Analysis (RCA) on teleport sample | 1453.3±88.76µs | 1466.9±117.63µs | +0.94% |
| Perform Runtime Capabilities Analysis (RCA) on the core and std libraries | 28.2±0.31ms | 28.2±0.24ms | 0.00% |
| Teleport evaluation | 87.2±3.92µs | 88.4±5.30µs | +1.38% |
Benchmark for 5ddfcb0
Click to view benchmark
| Test | Base | PR | % |
|---|---|---|---|
| Array append evaluation | 326.9±2.31µs | 337.5±15.57µs | +3.24% |
| Array literal evaluation | 177.4±1.52µs | 194.6±4.09µs | +9.70% |
| Array update evaluation | 409.8±3.69µs | 417.9±5.18µs | +1.98% |
| Core + Standard library compilation | 21.1±1.25ms | 21.4±1.24ms | +1.42% |
| Deutsch-Jozsa evaluation | 5.1±0.04ms | 5.1±0.31ms | 0.00% |
| Large file parity evaluation | 34.1±0.60ms | 34.1±0.48ms | 0.00% |
| Large input file compilation | 14.0±0.57ms | 13.9±0.60ms | -0.71% |
| Large input file compilation (interpreter) | 51.8±1.65ms | 50.8±1.58ms | -1.93% |
| Large nested iteration | 32.6±1.36ms | 33.1±0.18ms | +1.53% |
| Perform Runtime Capabilities Analysis (RCA) on Deutsch-Jozsa sample | 1597.4±91.15µs | 1605.7±135.00µs | +0.52% |
| Perform Runtime Capabilities Analysis (RCA) on large file sample | 8.2±0.19ms | 8.0±0.15ms | -2.44% |
| Perform Runtime Capabilities Analysis (RCA) on teleport sample | 1459.9±111.91µs | 1465.3±140.93µs | +0.37% |
| Perform Runtime Capabilities Analysis (RCA) on the core and std libraries | 28.5±0.32ms | 28.5±0.43ms | 0.00% |
| Teleport evaluation | 88.6±4.44µs | 87.8±3.72µs | -0.90% |
Benchmark for 9504d80
Click to view benchmark
| Test | Base | PR | % |
|---|---|---|---|
| Array append evaluation | 338.1±13.86µs | 337.3±6.25µs | -0.24% |
| Array literal evaluation | 186.7±1.16µs | 174.2±0.87µs | -6.70% |
| Array update evaluation | 419.0±16.97µs | 417.1±10.75µs | -0.45% |
| Core + Standard library compilation | 18.7±0.20ms | 18.7±0.30ms | 0.00% |
| Deutsch-Jozsa evaluation | 5.1±0.04ms | 5.1±0.06ms | 0.00% |
| Large file parity evaluation | 34.1±0.13ms | 34.1±0.30ms | 0.00% |
| Large input file compilation | 12.5±0.18ms | 12.5±0.30ms | 0.00% |
| Large input file compilation (interpreter) | 46.5±0.82ms | 46.3±0.68ms | -0.43% |
| Large nested iteration | 33.5±0.60ms | 33.3±1.45ms | -0.60% |
| Perform Runtime Capabilities Analysis (RCA) on Deutsch-Jozsa sample | 1568.5±29.33µs | 1563.2±40.29µs | -0.34% |
| Perform Runtime Capabilities Analysis (RCA) on large file sample | 7.8±0.10ms | 7.7±0.10ms | -1.28% |
| Perform Runtime Capabilities Analysis (RCA) on teleport sample | 1427.3±30.14µs | 1423.6±30.24µs | -0.26% |
| Perform Runtime Capabilities Analysis (RCA) on the core and std libraries | 28.1±0.17ms | 27.8±0.18ms | -1.07% |
| Teleport evaluation | 89.8±5.73µs | 89.5±8.07µs | -0.33% |
Benchmark for 6dffb80
Click to view benchmark
| Test | Base | PR | % |
|---|---|---|---|
| Array append evaluation | 338.6±10.08µs | 338.2±7.77µs | -0.12% |
| Array literal evaluation | 172.9±2.00µs | 183.8±0.75µs | +6.30% |
| Array update evaluation | 416.0±10.48µs | 416.2±11.49µs | +0.05% |
| Core + Standard library compilation | 18.9±0.44ms | 18.9±0.29ms | 0.00% |
| Deutsch-Jozsa evaluation | 5.0±0.06ms | 5.0±0.09ms | 0.00% |
| Large file parity evaluation | 34.1±0.12ms | 34.1±0.07ms | 0.00% |
| Large input file compilation | 12.4±0.13ms | 12.8±0.57ms | +3.23% |
| Large input file compilation (interpreter) | 46.9±1.01ms | 46.7±1.03ms | -0.43% |
| Large nested iteration | 32.4±0.29ms | 32.9±0.51ms | +1.54% |
| Perform Runtime Capabilities Analysis (RCA) on Deutsch-Jozsa sample | 1572.6±37.71µs | 1572.1±36.08µs | -0.03% |
| Perform Runtime Capabilities Analysis (RCA) on large file sample | 7.9±0.10ms | 7.8±0.08ms | -1.27% |
| Perform Runtime Capabilities Analysis (RCA) on teleport sample | 1433.7±32.25µs | 1424.6±29.71µs | -0.63% |
| Perform Runtime Capabilities Analysis (RCA) on the core and std libraries | 28.3±0.25ms | 28.6±0.28ms | +1.06% |
| Teleport evaluation | 87.9±4.51µs | 88.2±5.91µs | +0.34% |
I don't understand how there can be such a dramatic difference in time taken to run all samples twice vs the existing "build all samples" logic.
running all samples twice: 1.88s
existing "Building qsharp samples" step: 109.929s
If there's some black magic that makes the first one run so fast, maybe we can remove the existing step?!?
I don't understand how there can be such a dramatic difference in time taken to run all samples twice vs the existing "build all samples" logic. ... If there's some black magic that makes the first one run so fast, maybe we can remove the existing step?!?
Ha, yeah, it's quite dramatic, isn't it? I'll go into more detail below for posterity, but the short answer is that it does things in memory, in parallel, rather than sequentially with disk reads. That said, I think the other test is still worthwhile (and maybe can be moved to integration tests?), in part because it builds all samples including language samples rather than just the algorithm samples. As soon as a new qs file or qsharp.json project is added to the samples folders, it will get picked up automatically by this part of the build. If after this PR (and ongoing work to add verification output to the language samples) we want to pull those into a test pattern like this, then we can revisit the question of whether the older style is still needed. Longer explanation below...
When compiling these as unit tests, the test files are brought in using the include_str! macro, which tells Rust to include the contents of that file at compile time so that the running test can read the contents from the data section of the process memory rather than needing to load it from file. Rust also handily keeps track of any files pulled in this way, so that when the file is changed on disk it triggers the relevant rebuilds even if there are no other changes to the code. And during runtime, the tests are run in parallel, so can take advantage of as many cores as are available on the machine. Compare this to how the tests run using qsc in the build script: the folders are searched on disk, each file path is passed as an argument to a separate process invocation, which needs to read the file from disk in order to compile it. All that sequential process start and disk access overwhelms the actual runtime of the compiler. The tradeoff is needing to add each sample we want to test individually as test case rather than have it be auto-detected from disk. In my opinion, the scale of the perf difference makes that additional maintenance burden worthwhile.
Can we maybe add a build script to iterate through the samples and include all of them in the Rust tests? The fact that each sample is hardcoded sadly takes away from the usefulness of these tests... I can guarantee that the next person to add a sample will not think of adding them to this list :/
Can we maybe add a build script to iterate through the samples and include all of them in the Rust tests? The fact that each sample is hardcoded sadly takes away from the usefulness of these tests... I can guarantee that the next person to add a sample will not think of adding them to this list :/
Not while still using expect![] macros to verify the output inline, no. Each one really does need to be individually added. If we switched to a mode where the output we compare to was stored in files, then maybe... but it would definitely get more complicated. The samples testing is kind of a twist on the old "iron triangle" adage where in this case the trade-off is test speed vs test simplicity vs test automation, and we only get to pick two.
We could have one big giant expect![] that verifies the result of dynamically concatenated program outputs, right?
It's not obviously clear to me which way's better at the moment, you're right, each approach has its drawbacks - I'll let it sit in the back of my mind for a bit.
I have a couple crazy ideas on how we could add a bit more automation to this, but I need to experiment with them more before I know how bad/good the ideas are...
Benchmark for d76d1a6
Click to view benchmark
| Test | Base | PR | % |
|---|---|---|---|
| Array append evaluation | 329.4±10.99µs | 327.4±2.15µs | -0.61% |
| Array literal evaluation | 192.5±1.37µs | 174.8±1.76µs | -9.19% |
| Array update evaluation | 403.2±1.30µs | 408.1±15.83µs | +1.22% |
| Core + Standard library compilation | 23.9±1.34ms | 23.1±1.03ms | -3.35% |
| Deutsch-Jozsa evaluation | 5.1±0.06ms | 5.2±0.05ms | +1.96% |
| Large file parity evaluation | 34.3±0.18ms | 34.3±0.31ms | 0.00% |
| Large input file compilation | 14.2±0.52ms | 14.0±0.46ms | -1.41% |
| Large input file compilation (interpreter) | 53.3±1.47ms | 53.3±1.69ms | 0.00% |
| Large nested iteration | 32.2±0.27ms | 32.4±0.27ms | +0.62% |
| Perform Runtime Capabilities Analysis (RCA) on Deutsch-Jozsa sample | 1612.9±132.56µs | 1620.2±157.41µs | +0.45% |
| Perform Runtime Capabilities Analysis (RCA) on large file sample | 8.3±0.18ms | 8.2±0.15ms | -1.20% |
| Perform Runtime Capabilities Analysis (RCA) on teleport sample | 1463.9±117.49µs | 1474.3±114.19µs | +0.71% |
| Perform Runtime Capabilities Analysis (RCA) on the core and std libraries | 28.7±0.34ms | 28.9±0.49ms | +0.70% |
| Teleport evaluation | 90.9±3.61µs | 89.3±3.94µs | -1.76% |
Benchmark for 49dcbd1
Click to view benchmark
| Test | Base | PR | % |
|---|---|---|---|
| Array append evaluation | 327.5±2.20µs | 329.4±6.56µs | +0.58% |
| Array literal evaluation | 174.6±1.24µs | 180.9±5.15µs | +3.61% |
| Array update evaluation | 406.4±1.88µs | 408.0±2.96µs | +0.39% |
| Core + Standard library compilation | 21.3±0.96ms | 20.3±0.37ms | -4.69% |
| Deutsch-Jozsa evaluation | 5.1±0.05ms | 5.1±0.06ms | 0.00% |
| Large file parity evaluation | 34.2±0.08ms | 34.2±0.16ms | 0.00% |
| Large input file compilation | 13.3±0.31ms | 12.7±0.40ms | -4.51% |
| Large input file compilation (interpreter) | 50.0±1.48ms | 49.8±1.37ms | -0.40% |
| Large nested iteration | 32.9±0.94ms | 32.7±0.23ms | -0.61% |
| Perform Runtime Capabilities Analysis (RCA) on Deutsch-Jozsa sample | 1623.3±138.44µs | 1583.6±40.83µs | -2.45% |
| Perform Runtime Capabilities Analysis (RCA) on large file sample | 7.8±0.17ms | 7.8±0.09ms | 0.00% |
| Perform Runtime Capabilities Analysis (RCA) on teleport sample | 1448.6±74.39µs | 1444.1±30.77µs | -0.31% |
| Perform Runtime Capabilities Analysis (RCA) on the core and std libraries | 28.1±0.26ms | 28.1±0.50ms | 0.00% |
| Teleport evaluation | 89.8±3.43µs | 89.6±3.12µs | -0.22% |
We'll also need a user-facing folder samples/testing when we get to adding samples of how to test Q# code. So, without looking at the code, I'm in favor of keeping Rust test code that does not target users outside of the samples folder so as not to confuse these two.
On Wed, Jun 19, 2024 at 7:43 AM Stefan J. Wernli @.***> wrote:
@.**** commented on this pull request.
On samples_test/Cargo.toml https://github.com/microsoft/qsharp/pull/1533#discussion_r1646331297:
I debated on that one. My thinking was that right now the samples folder is pretty cleanly for folks looking to find Q# samples. We link to it from other places, like QCOM. If there were Rust tests mixed in, especially slightly arcane ones like these that are generated from a build script, it makes the folder confusing for newcomers.
On a related note, I also debated about keeping these as a test under "compiler/qsc" where I had originally put them. But it occurred to me that with the way the build script is set up to watch the samples folder, any change to samples would cause qsc to be rebuilt, which is not ideal. This way there is a clearer division: samples_test depends on samples and qsc, so changes to either of those will rebuild samples_test, but qsc won't depend on the samples folder.
— Reply to this email directly, view it on GitHub https://github.com/microsoft/qsharp/pull/1533#discussion_r1646331297, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACNFAADBN4W5KV3T7NRWNLTZIGKKXAVCNFSM6AAAAABHZPYGA6VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDCMRYGQ4DSNJVGA . You are receiving this because your review was requested.Message ID: @.***>
Benchmark for a40e6d2
Click to view benchmark
| Test | Base | PR | % |
|---|---|---|---|
| Array append evaluation | 327.5±1.39µs | 330.8±2.52µs | +1.01% |
| Array literal evaluation | 189.5±2.83µs | 198.7±0.87µs | +4.85% |
| Array update evaluation | 413.0±1.83µs | 415.5±1.90µs | +0.61% |
| Core + Standard library compilation | 22.8±0.62ms | 23.1±0.63ms | +1.32% |
| Deutsch-Jozsa evaluation | 5.1±0.07ms | 5.1±0.04ms | 0.00% |
| Large file parity evaluation | 34.4±0.18ms | 34.6±0.98ms | +0.58% |
| Large input file compilation | 13.6±0.34ms | 14.1±0.48ms | +3.68% |
| Large input file compilation (interpreter) | 55.2±1.96ms | 54.4±2.02ms | -1.45% |
| Large nested iteration | 32.1±0.28ms | 32.4±0.39ms | +0.93% |
| Perform Runtime Capabilities Analysis (RCA) on Deutsch-Jozsa sample | 1612.4±126.24µs | 1611.5±143.40µs | -0.06% |
| Perform Runtime Capabilities Analysis (RCA) on large file sample | 8.0±0.14ms | 8.1±0.14ms | +1.25% |
| Perform Runtime Capabilities Analysis (RCA) on teleport sample | 1480.3±172.54µs | 1477.8±178.45µs | -0.17% |
| Perform Runtime Capabilities Analysis (RCA) on the core and std libraries | 28.7±0.27ms | 28.8±0.28ms | +0.35% |
| Teleport evaluation | 89.4±3.72µs | 91.0±3.55µs | +1.79% |
Benchmark for f46ca1c
Click to view benchmark
| Test | Base | PR | % |
|---|---|---|---|
| Array append evaluation | 333.4±18.58µs | 329.0±2.23µs | -1.32% |
| Array literal evaluation | 189.1±2.25µs | 189.4±0.89µs | +0.16% |
| Array update evaluation | 413.1±7.34µs | 410.7±1.58µs | -0.58% |
| Core + Standard library compilation | 21.5±1.03ms | 21.2±1.42ms | -1.40% |
| Deutsch-Jozsa evaluation | 5.1±0.04ms | 5.1±0.05ms | 0.00% |
| Large file parity evaluation | 34.3±0.11ms | 34.4±0.64ms | +0.29% |
| Large input file compilation | 12.9±0.82ms | 12.9±0.63ms | 0.00% |
| Large input file compilation (interpreter) | 51.1±2.19ms | 50.7±2.74ms | -0.78% |
| Large nested iteration | 32.0±0.14ms | 32.0±0.41ms | 0.00% |
| Perform Runtime Capabilities Analysis (RCA) on Deutsch-Jozsa sample | 1574.0±86.12µs | 1601.2±224.70µs | +1.73% |
| Perform Runtime Capabilities Analysis (RCA) on large file sample | 8.0±0.30ms | 8.2±0.36ms | +2.50% |
| Perform Runtime Capabilities Analysis (RCA) on teleport sample | 1434.1±84.56µs | 1456.7±68.23µs | +1.58% |
| Perform Runtime Capabilities Analysis (RCA) on the core and std libraries | 28.5±0.75ms | 28.6±0.40ms | +0.35% |
| Teleport evaluation | 90.2±3.72µs | 90.0±3.42µs | -0.22% |
Benchmark for 7e1ffa5
Click to view benchmark
| Test | Base | PR | % |
|---|---|---|---|
| Array append evaluation | 333.7±4.75µs | 331.6±2.14µs | -0.63% |
| Array literal evaluation | 189.0±0.54µs | 190.1±4.44µs | +0.58% |
| Array update evaluation | 412.8±1.40µs | 412.3±1.04µs | -0.12% |
| Core + Standard library compilation | 22.7±0.84ms | 23.4±1.31ms | +3.08% |
| Deutsch-Jozsa evaluation | 5.1±0.05ms | 5.1±0.06ms | 0.00% |
| Large file parity evaluation | 34.4±0.08ms | 34.3±0.08ms | -0.29% |
| Large input file compilation | 14.3±0.70ms | 14.2±0.62ms | -0.70% |
| Large input file compilation (interpreter) | 53.0±1.56ms | 55.5±1.98ms | +4.72% |
| Large nested iteration | 32.4±0.93ms | 32.2±0.31ms | -0.62% |
| Perform Runtime Capabilities Analysis (RCA) on Deutsch-Jozsa sample | 1622.7±152.86µs | 1626.1±151.32µs | +0.21% |
| Perform Runtime Capabilities Analysis (RCA) on large file sample | 8.2±0.13ms | 8.2±0.14ms | 0.00% |
| Perform Runtime Capabilities Analysis (RCA) on teleport sample | 1481.0±143.77µs | 1473.7±106.74µs | -0.49% |
| Perform Runtime Capabilities Analysis (RCA) on the core and std libraries | 28.9±0.98ms | 29.1±0.89ms | +0.69% |
| Teleport evaluation | 90.4±3.58µs | 92.3±8.87µs | +2.10% |
Great! Now maybe we can add the language samples too and get rid of the step in Python? ;)
Finally put the ideas this question triggered into a PR: https://github.com/microsoft/qsharp/pull/1739. Is the shift away from Python, qsc-based tests worth the build.rs complexity? We can let the team discuss. But at least we know it's possible!