qsharp icon indicating copy to clipboard operation
qsharp copied to clipboard

Test algorithm samples with and without debugging

Open swernli opened this issue 1 year ago • 3 comments
trafficstars

This change introduces a scenario test under qsc that verifies the behavior of the algorithm samples with normal and debug execution. It uses fixed seeds to ensure results are predictable and restricts the slowest test (Shors.qs) to be release only.

Included is some cleanup of output from the samples so that compatible characters are used and no trailing spaces are present.

swernli avatar May 16 '24 05:05 swernli

Benchmark for ddf9cd0

Click to view benchmark
Test Base PR %
Array append evaluation 330.6±10.98µs 327.7±2.43µs -0.88%
Array literal evaluation 173.9±0.75µs 174.2±1.66µs +0.17%
Array update evaluation 410.9±9.32µs 405.5±4.49µs -1.31%
Core + Standard library compilation 17.2±0.20ms 17.2±0.36ms 0.00%
Deutsch-Jozsa evaluation 5.0±0.05ms 5.0±0.16ms 0.00%
Large file parity evaluation 34.0±0.08ms 33.9±0.10ms -0.29%
Large input file compilation 12.0±0.28ms 12.0±0.18ms 0.00%
Large input file compilation (interpreter) 45.4±1.34ms 45.1±1.37ms -0.66%
Large nested iteration 31.4±0.09ms 32.2±0.13ms +2.55%
Perform Runtime Capabilities Analysis (RCA) on Deutsch-Jozsa sample 1556.6±40.68µs 1562.2±44.04µs +0.36%
Perform Runtime Capabilities Analysis (RCA) on large file sample 7.8±0.07ms 7.8±0.05ms 0.00%
Perform Runtime Capabilities Analysis (RCA) on teleport sample 1414.3±27.02µs 1422.3±35.32µs +0.57%
Perform Runtime Capabilities Analysis (RCA) on the core and std libraries 27.0±0.34ms 27.3±1.27ms +1.11%
Teleport evaluation 87.0±3.39µs 88.4±6.81µs +1.61%

github-actions[bot] avatar May 16 '24 06:05 github-actions[bot]

Benchmark for 1eb7d76

Click to view benchmark
Test Base PR %
Array append evaluation 330.2±1.94µs 331.1±4.98µs +0.27%
Array literal evaluation 190.2±2.46µs 189.5±0.74µs -0.37%
Array update evaluation 408.6±2.62µs 408.5±2.88µs -0.02%
Core + Standard library compilation 20.4±0.92ms 21.4±0.79ms +4.90%
Deutsch-Jozsa evaluation 5.0±0.05ms 5.0±0.15ms 0.00%
Large file parity evaluation 34.0±0.13ms 34.0±0.15ms 0.00%
Large input file compilation 13.1±0.54ms 13.9±0.49ms +6.11%
Large input file compilation (interpreter) 49.9±1.95ms 51.5±1.39ms +3.21%
Large nested iteration 32.7±0.18ms 32.6±0.17ms -0.31%
Perform Runtime Capabilities Analysis (RCA) on Deutsch-Jozsa sample 1584.3±79.08µs 1617.9±162.14µs +2.12%
Perform Runtime Capabilities Analysis (RCA) on large file sample 8.1±0.15ms 8.0±0.13ms -1.23%
Perform Runtime Capabilities Analysis (RCA) on teleport sample 1453.3±88.76µs 1466.9±117.63µs +0.94%
Perform Runtime Capabilities Analysis (RCA) on the core and std libraries 28.2±0.31ms 28.2±0.24ms 0.00%
Teleport evaluation 87.2±3.92µs 88.4±5.30µs +1.38%

github-actions[bot] avatar May 20 '24 21:05 github-actions[bot]

Benchmark for 5ddfcb0

Click to view benchmark
Test Base PR %
Array append evaluation 326.9±2.31µs 337.5±15.57µs +3.24%
Array literal evaluation 177.4±1.52µs 194.6±4.09µs +9.70%
Array update evaluation 409.8±3.69µs 417.9±5.18µs +1.98%
Core + Standard library compilation 21.1±1.25ms 21.4±1.24ms +1.42%
Deutsch-Jozsa evaluation 5.1±0.04ms 5.1±0.31ms 0.00%
Large file parity evaluation 34.1±0.60ms 34.1±0.48ms 0.00%
Large input file compilation 14.0±0.57ms 13.9±0.60ms -0.71%
Large input file compilation (interpreter) 51.8±1.65ms 50.8±1.58ms -1.93%
Large nested iteration 32.6±1.36ms 33.1±0.18ms +1.53%
Perform Runtime Capabilities Analysis (RCA) on Deutsch-Jozsa sample 1597.4±91.15µs 1605.7±135.00µs +0.52%
Perform Runtime Capabilities Analysis (RCA) on large file sample 8.2±0.19ms 8.0±0.15ms -2.44%
Perform Runtime Capabilities Analysis (RCA) on teleport sample 1459.9±111.91µs 1465.3±140.93µs +0.37%
Perform Runtime Capabilities Analysis (RCA) on the core and std libraries 28.5±0.32ms 28.5±0.43ms 0.00%
Teleport evaluation 88.6±4.44µs 87.8±3.72µs -0.90%

github-actions[bot] avatar May 22 '24 23:05 github-actions[bot]

Benchmark for 9504d80

Click to view benchmark
Test Base PR %
Array append evaluation 338.1±13.86µs 337.3±6.25µs -0.24%
Array literal evaluation 186.7±1.16µs 174.2±0.87µs -6.70%
Array update evaluation 419.0±16.97µs 417.1±10.75µs -0.45%
Core + Standard library compilation 18.7±0.20ms 18.7±0.30ms 0.00%
Deutsch-Jozsa evaluation 5.1±0.04ms 5.1±0.06ms 0.00%
Large file parity evaluation 34.1±0.13ms 34.1±0.30ms 0.00%
Large input file compilation 12.5±0.18ms 12.5±0.30ms 0.00%
Large input file compilation (interpreter) 46.5±0.82ms 46.3±0.68ms -0.43%
Large nested iteration 33.5±0.60ms 33.3±1.45ms -0.60%
Perform Runtime Capabilities Analysis (RCA) on Deutsch-Jozsa sample 1568.5±29.33µs 1563.2±40.29µs -0.34%
Perform Runtime Capabilities Analysis (RCA) on large file sample 7.8±0.10ms 7.7±0.10ms -1.28%
Perform Runtime Capabilities Analysis (RCA) on teleport sample 1427.3±30.14µs 1423.6±30.24µs -0.26%
Perform Runtime Capabilities Analysis (RCA) on the core and std libraries 28.1±0.17ms 27.8±0.18ms -1.07%
Teleport evaluation 89.8±5.73µs 89.5±8.07µs -0.33%

github-actions[bot] avatar May 31 '24 21:05 github-actions[bot]

Benchmark for 6dffb80

Click to view benchmark
Test Base PR %
Array append evaluation 338.6±10.08µs 338.2±7.77µs -0.12%
Array literal evaluation 172.9±2.00µs 183.8±0.75µs +6.30%
Array update evaluation 416.0±10.48µs 416.2±11.49µs +0.05%
Core + Standard library compilation 18.9±0.44ms 18.9±0.29ms 0.00%
Deutsch-Jozsa evaluation 5.0±0.06ms 5.0±0.09ms 0.00%
Large file parity evaluation 34.1±0.12ms 34.1±0.07ms 0.00%
Large input file compilation 12.4±0.13ms 12.8±0.57ms +3.23%
Large input file compilation (interpreter) 46.9±1.01ms 46.7±1.03ms -0.43%
Large nested iteration 32.4±0.29ms 32.9±0.51ms +1.54%
Perform Runtime Capabilities Analysis (RCA) on Deutsch-Jozsa sample 1572.6±37.71µs 1572.1±36.08µs -0.03%
Perform Runtime Capabilities Analysis (RCA) on large file sample 7.9±0.10ms 7.8±0.08ms -1.27%
Perform Runtime Capabilities Analysis (RCA) on teleport sample 1433.7±32.25µs 1424.6±29.71µs -0.63%
Perform Runtime Capabilities Analysis (RCA) on the core and std libraries 28.3±0.25ms 28.6±0.28ms +1.06%
Teleport evaluation 87.9±4.51µs 88.2±5.91µs +0.34%

github-actions[bot] avatar Jun 05 '24 05:06 github-actions[bot]

I don't understand how there can be such a dramatic difference in time taken to run all samples twice vs the existing "build all samples" logic.

running all samples twice: 1.88s image

existing "Building qsharp samples" step: 109.929s image

If there's some black magic that makes the first one run so fast, maybe we can remove the existing step?!?

minestarks avatar Jun 07 '24 19:06 minestarks

I don't understand how there can be such a dramatic difference in time taken to run all samples twice vs the existing "build all samples" logic. ... If there's some black magic that makes the first one run so fast, maybe we can remove the existing step?!?

Ha, yeah, it's quite dramatic, isn't it? I'll go into more detail below for posterity, but the short answer is that it does things in memory, in parallel, rather than sequentially with disk reads. That said, I think the other test is still worthwhile (and maybe can be moved to integration tests?), in part because it builds all samples including language samples rather than just the algorithm samples. As soon as a new qs file or qsharp.json project is added to the samples folders, it will get picked up automatically by this part of the build. If after this PR (and ongoing work to add verification output to the language samples) we want to pull those into a test pattern like this, then we can revisit the question of whether the older style is still needed. Longer explanation below...

When compiling these as unit tests, the test files are brought in using the include_str! macro, which tells Rust to include the contents of that file at compile time so that the running test can read the contents from the data section of the process memory rather than needing to load it from file. Rust also handily keeps track of any files pulled in this way, so that when the file is changed on disk it triggers the relevant rebuilds even if there are no other changes to the code. And during runtime, the tests are run in parallel, so can take advantage of as many cores as are available on the machine. Compare this to how the tests run using qsc in the build script: the folders are searched on disk, each file path is passed as an argument to a separate process invocation, which needs to read the file from disk in order to compile it. All that sequential process start and disk access overwhelms the actual runtime of the compiler. The tradeoff is needing to add each sample we want to test individually as test case rather than have it be auto-detected from disk. In my opinion, the scale of the perf difference makes that additional maintenance burden worthwhile.

swernli avatar Jun 11 '24 04:06 swernli

Can we maybe add a build script to iterate through the samples and include all of them in the Rust tests? The fact that each sample is hardcoded sadly takes away from the usefulness of these tests... I can guarantee that the next person to add a sample will not think of adding them to this list :/

minestarks avatar Jun 13 '24 17:06 minestarks

Can we maybe add a build script to iterate through the samples and include all of them in the Rust tests? The fact that each sample is hardcoded sadly takes away from the usefulness of these tests... I can guarantee that the next person to add a sample will not think of adding them to this list :/

Not while still using expect![] macros to verify the output inline, no. Each one really does need to be individually added. If we switched to a mode where the output we compare to was stored in files, then maybe... but it would definitely get more complicated. The samples testing is kind of a twist on the old "iron triangle" adage where in this case the trade-off is test speed vs test simplicity vs test automation, and we only get to pick two.

swernli avatar Jun 13 '24 18:06 swernli

We could have one big giant expect![] that verifies the result of dynamically concatenated program outputs, right?

It's not obviously clear to me which way's better at the moment, you're right, each approach has its drawbacks - I'll let it sit in the back of my mind for a bit.

minestarks avatar Jun 14 '24 18:06 minestarks

I have a couple crazy ideas on how we could add a bit more automation to this, but I need to experiment with them more before I know how bad/good the ideas are...

swernli avatar Jun 19 '24 06:06 swernli

Benchmark for d76d1a6

Click to view benchmark
Test Base PR %
Array append evaluation 329.4±10.99µs 327.4±2.15µs -0.61%
Array literal evaluation 192.5±1.37µs 174.8±1.76µs -9.19%
Array update evaluation 403.2±1.30µs 408.1±15.83µs +1.22%
Core + Standard library compilation 23.9±1.34ms 23.1±1.03ms -3.35%
Deutsch-Jozsa evaluation 5.1±0.06ms 5.2±0.05ms +1.96%
Large file parity evaluation 34.3±0.18ms 34.3±0.31ms 0.00%
Large input file compilation 14.2±0.52ms 14.0±0.46ms -1.41%
Large input file compilation (interpreter) 53.3±1.47ms 53.3±1.69ms 0.00%
Large nested iteration 32.2±0.27ms 32.4±0.27ms +0.62%
Perform Runtime Capabilities Analysis (RCA) on Deutsch-Jozsa sample 1612.9±132.56µs 1620.2±157.41µs +0.45%
Perform Runtime Capabilities Analysis (RCA) on large file sample 8.3±0.18ms 8.2±0.15ms -1.20%
Perform Runtime Capabilities Analysis (RCA) on teleport sample 1463.9±117.49µs 1474.3±114.19µs +0.71%
Perform Runtime Capabilities Analysis (RCA) on the core and std libraries 28.7±0.34ms 28.9±0.49ms +0.70%
Teleport evaluation 90.9±3.61µs 89.3±3.94µs -1.76%

github-actions[bot] avatar Jun 19 '24 07:06 github-actions[bot]

Benchmark for 49dcbd1

Click to view benchmark
Test Base PR %
Array append evaluation 327.5±2.20µs 329.4±6.56µs +0.58%
Array literal evaluation 174.6±1.24µs 180.9±5.15µs +3.61%
Array update evaluation 406.4±1.88µs 408.0±2.96µs +0.39%
Core + Standard library compilation 21.3±0.96ms 20.3±0.37ms -4.69%
Deutsch-Jozsa evaluation 5.1±0.05ms 5.1±0.06ms 0.00%
Large file parity evaluation 34.2±0.08ms 34.2±0.16ms 0.00%
Large input file compilation 13.3±0.31ms 12.7±0.40ms -4.51%
Large input file compilation (interpreter) 50.0±1.48ms 49.8±1.37ms -0.40%
Large nested iteration 32.9±0.94ms 32.7±0.23ms -0.61%
Perform Runtime Capabilities Analysis (RCA) on Deutsch-Jozsa sample 1623.3±138.44µs 1583.6±40.83µs -2.45%
Perform Runtime Capabilities Analysis (RCA) on large file sample 7.8±0.17ms 7.8±0.09ms 0.00%
Perform Runtime Capabilities Analysis (RCA) on teleport sample 1448.6±74.39µs 1444.1±30.77µs -0.31%
Perform Runtime Capabilities Analysis (RCA) on the core and std libraries 28.1±0.26ms 28.1±0.50ms 0.00%
Teleport evaluation 89.8±3.43µs 89.6±3.12µs -0.22%

github-actions[bot] avatar Jun 19 '24 08:06 github-actions[bot]

We'll also need a user-facing folder samples/testing when we get to adding samples of how to test Q# code. So, without looking at the code, I'm in favor of keeping Rust test code that does not target users outside of the samples folder so as not to confuse these two.

On Wed, Jun 19, 2024 at 7:43 AM Stefan J. Wernli @.***> wrote:

@.**** commented on this pull request.

On samples_test/Cargo.toml https://github.com/microsoft/qsharp/pull/1533#discussion_r1646331297:

I debated on that one. My thinking was that right now the samples folder is pretty cleanly for folks looking to find Q# samples. We link to it from other places, like QCOM. If there were Rust tests mixed in, especially slightly arcane ones like these that are generated from a build script, it makes the folder confusing for newcomers.

On a related note, I also debated about keeping these as a test under "compiler/qsc" where I had originally put them. But it occurred to me that with the way the build script is set up to watch the samples folder, any change to samples would cause qsc to be rebuilt, which is not ideal. This way there is a clearer division: samples_test depends on samples and qsc, so changes to either of those will rebuild samples_test, but qsc won't depend on the samples folder.

— Reply to this email directly, view it on GitHub https://github.com/microsoft/qsharp/pull/1533#discussion_r1646331297, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACNFAADBN4W5KV3T7NRWNLTZIGKKXAVCNFSM6AAAAABHZPYGA6VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDCMRYGQ4DSNJVGA . You are receiving this because your review was requested.Message ID: @.***>

tcNickolas avatar Jun 19 '24 14:06 tcNickolas

Benchmark for a40e6d2

Click to view benchmark
Test Base PR %
Array append evaluation 327.5±1.39µs 330.8±2.52µs +1.01%
Array literal evaluation 189.5±2.83µs 198.7±0.87µs +4.85%
Array update evaluation 413.0±1.83µs 415.5±1.90µs +0.61%
Core + Standard library compilation 22.8±0.62ms 23.1±0.63ms +1.32%
Deutsch-Jozsa evaluation 5.1±0.07ms 5.1±0.04ms 0.00%
Large file parity evaluation 34.4±0.18ms 34.6±0.98ms +0.58%
Large input file compilation 13.6±0.34ms 14.1±0.48ms +3.68%
Large input file compilation (interpreter) 55.2±1.96ms 54.4±2.02ms -1.45%
Large nested iteration 32.1±0.28ms 32.4±0.39ms +0.93%
Perform Runtime Capabilities Analysis (RCA) on Deutsch-Jozsa sample 1612.4±126.24µs 1611.5±143.40µs -0.06%
Perform Runtime Capabilities Analysis (RCA) on large file sample 8.0±0.14ms 8.1±0.14ms +1.25%
Perform Runtime Capabilities Analysis (RCA) on teleport sample 1480.3±172.54µs 1477.8±178.45µs -0.17%
Perform Runtime Capabilities Analysis (RCA) on the core and std libraries 28.7±0.27ms 28.8±0.28ms +0.35%
Teleport evaluation 89.4±3.72µs 91.0±3.55µs +1.79%

github-actions[bot] avatar Jun 20 '24 19:06 github-actions[bot]

Benchmark for f46ca1c

Click to view benchmark
Test Base PR %
Array append evaluation 333.4±18.58µs 329.0±2.23µs -1.32%
Array literal evaluation 189.1±2.25µs 189.4±0.89µs +0.16%
Array update evaluation 413.1±7.34µs 410.7±1.58µs -0.58%
Core + Standard library compilation 21.5±1.03ms 21.2±1.42ms -1.40%
Deutsch-Jozsa evaluation 5.1±0.04ms 5.1±0.05ms 0.00%
Large file parity evaluation 34.3±0.11ms 34.4±0.64ms +0.29%
Large input file compilation 12.9±0.82ms 12.9±0.63ms 0.00%
Large input file compilation (interpreter) 51.1±2.19ms 50.7±2.74ms -0.78%
Large nested iteration 32.0±0.14ms 32.0±0.41ms 0.00%
Perform Runtime Capabilities Analysis (RCA) on Deutsch-Jozsa sample 1574.0±86.12µs 1601.2±224.70µs +1.73%
Perform Runtime Capabilities Analysis (RCA) on large file sample 8.0±0.30ms 8.2±0.36ms +2.50%
Perform Runtime Capabilities Analysis (RCA) on teleport sample 1434.1±84.56µs 1456.7±68.23µs +1.58%
Perform Runtime Capabilities Analysis (RCA) on the core and std libraries 28.5±0.75ms 28.6±0.40ms +0.35%
Teleport evaluation 90.2±3.72µs 90.0±3.42µs -0.22%

github-actions[bot] avatar Jun 25 '24 21:06 github-actions[bot]

Benchmark for 7e1ffa5

Click to view benchmark
Test Base PR %
Array append evaluation 333.7±4.75µs 331.6±2.14µs -0.63%
Array literal evaluation 189.0±0.54µs 190.1±4.44µs +0.58%
Array update evaluation 412.8±1.40µs 412.3±1.04µs -0.12%
Core + Standard library compilation 22.7±0.84ms 23.4±1.31ms +3.08%
Deutsch-Jozsa evaluation 5.1±0.05ms 5.1±0.06ms 0.00%
Large file parity evaluation 34.4±0.08ms 34.3±0.08ms -0.29%
Large input file compilation 14.3±0.70ms 14.2±0.62ms -0.70%
Large input file compilation (interpreter) 53.0±1.56ms 55.5±1.98ms +4.72%
Large nested iteration 32.4±0.93ms 32.2±0.31ms -0.62%
Perform Runtime Capabilities Analysis (RCA) on Deutsch-Jozsa sample 1622.7±152.86µs 1626.1±151.32µs +0.21%
Perform Runtime Capabilities Analysis (RCA) on large file sample 8.2±0.13ms 8.2±0.14ms 0.00%
Perform Runtime Capabilities Analysis (RCA) on teleport sample 1481.0±143.77µs 1473.7±106.74µs -0.49%
Perform Runtime Capabilities Analysis (RCA) on the core and std libraries 28.9±0.98ms 29.1±0.89ms +0.69%
Teleport evaluation 90.4±3.58µs 92.3±8.87µs +2.10%

github-actions[bot] avatar Jun 25 '24 21:06 github-actions[bot]

Great! Now maybe we can add the language samples too and get rid of the step in Python? ;)

Finally put the ideas this question triggered into a PR: https://github.com/microsoft/qsharp/pull/1739. Is the shift away from Python, qsc-based tests worth the build.rs complexity? We can let the team discuss. But at least we know it's possible!

swernli avatar Jul 15 '24 14:07 swernli