
test: improve test performance

Open JayWhite2357 opened this issue 1 year ago • 17 comments

Background and Motivation

Currently, running all the tests takes a long time: almost 15 min on my local machine.

Changes Required

Reduce the test times, without removing or weakening any tests.

EDIT: More broadly, any improvement to the overall CI runtimes is a welcome change and will qualify for the bounty. This is ultimately the main goal of this issue, with a secondary goal being faster local development work.

  • [ ] The most obvious and easiest way to improve the performance is by caching the Dory setups. I believe (although I have not confirmed) that PublicParameters::test_rand, ProverSetup::from, and VerifierSetup::from are taking up over half of the entire test suite time.
    • This could be done by saving test setups to a file when they are not found, by simply committing test versions to the repo, or by some other approach. I am open to creative solutions; see the sketch below for the general shape.
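
Below is a minimal sketch of the read-or-generate-and-persist pattern I have in mind. The helper name and the serialization hooks are hypothetical, not code from this repo; the real setup types live in the dory module.

    use std::fs;
    use std::path::Path;

    // Hypothetical generic helper: try to load a cached setup from disk; on a
    // miss, generate it once (the expensive rng path) and persist it so that
    // later tests -- and later runs -- skip the generation entirely.
    fn load_or_generate_setup<T>(
        path: &Path,
        generate: impl FnOnce() -> T,
        serialize: impl Fn(&T) -> Vec<u8>,
        deserialize: impl Fn(&[u8]) -> Option<T>,
    ) -> T {
        if let Ok(bytes) = fs::read(path) {
            if let Some(setup) = deserialize(&bytes) {
                return setup; // cache hit: no rng work at all
            }
        }
        let setup = generate();
        let _ = fs::write(path, serialize(&setup)); // best-effort persist
        setup
    }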

For reference, here is the output of cargo nextest run -j1 --all-features | grep -v "\[ ".

    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.14s
────────────
 Nextest run ID 045165c7-1b36-4116-94aa-fb64511d54e7 with nextest profile: default
    Starting 1506 tests across 7 binaries (2 tests skipped)
        PASS [  14.648s] proof-of-sql proof_primitive::dory::dory_commitment_evaluation_proof_test::test_random_ipa_with_length_1
        PASS [  57.824s] proof-of-sql proof_primitive::dory::dory_commitment_evaluation_proof_test::test_random_ipa_with_various_lengths
        PASS [  14.638s] proof-of-sql proof_primitive::dory::dory_commitment_evaluation_proof_test::test_simple_ipa
        PASS [  23.490s] proof-of-sql proof_primitive::dory::dynamic_dory_commitment_evaluation_proof_test::test_random_ipa_with_various_lengths
        PASS [  10.478s] proof-of-sql proof_primitive::dory::setup_test::we_can_create_prover_setups_with_various_sizes
        PASS [  10.288s] proof-of-sql sql::proof_exprs::inequality_expr_test::we_can_compare_columns_with_extreme_values
        PASS [  13.511s] proof-of-sql sql::proof_plans::sort_merge_join_exec_test::we_can_prove_and_get_the_correct_result_from_a_complex_query_involving_two_sort_merge_joins
────────────
     Summary [ 814.675s] 1506 tests run: 1506 passed, 2 skipped

JayWhite2357 avatar Feb 18 '25 14:02 JayWhite2357

/bounty $100

I will give at least $100 for cutting the total test time in half. In other words, on my machine:

  • ~7.5 min -> $100
  • ~3.75 min -> $200
  • ~1.875 min -> $300

Within reason, I'll pay out $\$100\cdot\log_2\left(\frac{T}{t}\right)$, where $T$ is the time it takes to run all tests on main and $t$ is the time it takes to run all tests on the submitted PR, on my local machine.
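
For example, with the ~15 min baseline above, $t = 3.75$ min gives $\$100\cdot\log_2\left(\frac{15}{3.75}\right)=\$100\cdot 2=\$200$, matching the tiers listed.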

JayWhite2357 avatar Feb 19 '25 12:02 JayWhite2357

💎 $100 bounty • Space and Time

Steps to solve:

  1. Start working: (Optional) Comment /attempt #557 with your implementation plan. Note: we will only assign an issue if you include an implementation plan with a time estimate. Additionally, to be assigned an issue, you must have previously contributed to the project. You can still work on an issue and submit a PR without being assigned.
  2. Submit work: Create a pull request including /claim #557 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to spaceandtimefdn/sxt-proof-of-sql!


| Attempt | Started (UTC) | Solution |
| --- | --- | --- |
| 🟢 @Abiji-2020 | May 04, 2025, 11:48:53 PM | WIP |
| 🟢 @zelosleone | Mar 05, 2025, 01:54:43 PM | WIP |
| 🟢 @deekshatomer | Feb 20, 2025, 05:10:50 PM | WIP |

algora-pbc[bot] avatar Feb 19 '25 12:02 algora-pbc[bot]

/attempt #557


deekshatomer avatar Feb 20 '25 17:02 deekshatomer

/attempt #557


zelosleone avatar Mar 05 '25 13:03 zelosleone

@JayWhite2357 Honestly, this is an extremely frustrating bounty. In the first place, I don't think you guys tested the idea of optimizing the tests yourselves yet. Caching will only speed things up by 2-4 seconds, and that's only for several tests. The actual bottleneck is the rng calculations, which cannot be changed without altering the test logic, which is against the bounty rules in the first place. I spent 2 full days on this just to find out that not much change is possible. Let me explain some solutions you guys can use.

1- Adding opt level:

Firstly I suggest:

[profile.test]
opt-level = 3

Adding this to the root Cargo.toml alone took a 120-second dory test down to 20 seconds, which is already pretty good. But it won't affect the total runtime, because nextest will make tests fight each other for resources!

2- Adding Nextest configuration:

[store]
dir = "target/nextest"

[profile.default]
default-filter = "all()"
retries = 0
threads-required = 1

[[profile.default.overrides]]
filter = 'test(/proof_primitive::dory::/)'
threads-required = 16
priority = -100

[[profile.default.overrides]]
filter = 'test(/proof-of-sql::integration_tests/)'
threads-required = 16
priority = -70

[[profile.default.overrides]]
filter = 'test(/sql::proof_exprs::/)'
threads-required = 16
priority = -80

[[profile.default.overrides]]
filter = 'test(/sql::proof_gadgets::/)'
threads-required = 16
priority = -90

You can also add a configuration like this. I tried multiple settings with it, but nothing changes except that the hard tests are left to the end. This is pointless unless you want the illusion that "tests run fast!"

3- Caching and Parallelism

Rayon is not configured to use a custom thread pool in the codebase, but even doing so won't matter, because nextest uses its own thread pooling, and caching won't carry over to other tests in this configuration either. You can check https://github.com/nextest-rs/nextest/issues/27 for more information; running the tests in the same process would (possibly) save us a lot of time and would make a shared cache actually work. But honestly, even that doesn't work properly. Until then, a cache will only save us around 2-4 seconds, which is not even worth the effort.
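
To illustrate the process-boundary problem, here is a hypothetical sketch (not code from the repo): an in-memory cache like this deduplicates setup work under cargo test, which runs tests as threads inside one process, but nextest spawns one process per test, so every test pays the generation cost again. Only a file-backed cache survives that model.

    use std::sync::OnceLock;

    // Stand-in for an expensive, rng-driven setup (think the Dory test setups).
    struct Setup {
        seed: u64,
    }

    fn expensive_setup() -> Setup {
        Setup { seed: 42 } // imagine heavy rng work here
    }

    // Initialized at most once per *process*. Under cargo test (one process,
    // many threads) this runs once; under nextest (one process per test) it
    // runs once per test, so it saves nothing.
    static SETUP: OnceLock<Setup> = OnceLock::new();

    fn setup() -> &'static Setup {
        SETUP.get_or_init(expensive_setup)
    }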

4- Running the tests with --release flag

Like the opt-level change, this improves runtime in a huge way: the Rust compiler optimizes the code itself.

It's frustrating that I spent 2 days on this just to find out that nextest is a huge letdown and that this bounty wasn't tested thoroughly enough to be a valid bounty. Honestly, just run the tests with the --release flag. That alone will cut single-test runtimes by roughly 3x.

zelosleone avatar Mar 06 '25 17:03 zelosleone

Let me clarify a couple things that hopefully make this less frustrating.

  • The motivation behind this issue is the fact that CI takes a long time to run. The test job took 17m 57s and the code coverage job took 28m 27s to run. If you can improve this, I'll reward the bounty. The more you improve it, the more I'll reward.
  • I used nextest simply to show which tests were long running. We don't currently use nextest within our CI. I was hoping that it would also be an objective measurement, but perhaps I was wrong.
  • I said "without removing or weakening any tests". This doesn't mean the tests can't be modified. You can absolutely modify the tests. You just can't weaken them. In other words, they still need to test the same thing they were testing before. (e.g. rather than generating new setups with the rng in every test, just read them from a file.)

JayWhite2357 avatar Mar 06 '25 21:03 JayWhite2357

@zelosleone Responding to each of your points:

  1. If opt-level = 3 makes the CI faster, awesome. Put in a 2-line PR.
  2. If switching to nextest and adding a configuration is faster than cargo test, that counts too.
  3. When I say "caching", I mean: have each test attempt to read from a file. If that fails, generate the setup and write it to that file. Then subsequent tests will be able to read the file and not use the rng at all.
  4. I would be curious to know the actual difference between --release and just opt-level = 3. I'm not completely opposed to --release, but there are things like debug_assert! that don't work with --release.

It sounds like you have done all the right things. Put some PRs in and we'll take a look at the CI timings.
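
To illustrate the debug_assert! point with a small standalone example (not repo code): the macro compiles to nothing when debug assertions are disabled, which is the default under --release, whereas opt-level = 3 in the test profile leaves them enabled.

    fn checked_halve(x: u32) -> u32 {
        // Compiled out when debug-assertions are off (the --release default);
        // opt-level = 3 alone leaves debug-assertions on, so the check still
        // runs under test builds.
        debug_assert!(x % 2 == 0, "expected an even input");
        x / 2
    }

    fn main() {
        assert_eq!(checked_halve(4), 2);
        // checked_halve(3) panics in a debug/test build but silently
        // returns 1 under --release.
    }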

JayWhite2357 avatar Mar 06 '25 22:03 JayWhite2357

💡 @zelosleone submitted a pull request that claims the bounty. You can visit your bounty board to reward.

algora-pbc[bot] avatar Mar 06 '25 22:03 algora-pbc[bot]

> @zelosleone Responding to each of your points:
>
>   1. If opt-level = 3 makes the CI faster, awesome. Put in a 2-line PR.
>   2. If switching to nextest and adding a configuration is faster than cargo test, that counts too.
>   3. When I say "caching", I mean: have each test attempt to read from a file. If that fails, generate the setup and write it to that file. Then subsequent tests will be able to read the file and not use the rng at all.
>   4. I would be curious to know the actual difference between --release and just opt-level = 3. I'm not completely opposed to --release, but there are things like debug_assert! that don't work with --release.
>
> It sounds like you have done all the right things. Put some PRs in and we'll take a look at the CI timings.

Alright, I opened the PR. On my machine it cut the time from 21 minutes to 14 minutes, so I would imagine a similar improvement on your machine as well. The release flag disables debug assertions, but opt-level doesn't. Imo the release flag also adds some LTO flags and makes some changes to codegen units, but from my testing those do not improve anything related to test runtime. Only opt-level 3 gives a good improvement.

zelosleone avatar Mar 06 '25 22:03 zelosleone

@zelosleone I'm really sorry that I have to undo the opt-level change since it has made CI slower, not faster.

In https://github.com/spaceandtimelabs/sxt-proof-of-sql/actions/workflows/lint-and-test.yml you can see that tests have actually become considerably slower. So local performance doesn't really correlate with CI performance that well.

iajoiner avatar Mar 07 '25 01:03 iajoiner

> @zelosleone I'm really sorry that I have to undo the opt-level change since it has made CI slower, not faster.
>
> In https://github.com/spaceandtimelabs/sxt-proof-of-sql/actions/workflows/lint-and-test.yml you can see that tests have actually become considerably slower. So local performance doesn't really correlate with CI performance that well.

That's only because compile time increased to allow a faster runtime, since the code optimization from opt-level happens during compilation.

zelosleone avatar Mar 07 '25 01:03 zelosleone

Right. Part of this may be due to the fact that the CI rebuilds all the dependencies from scratch every time it runs. I think we try to cache the build so it doesn't, but something is messed up with that. I bet that if the CI were improved to take proper advantage of caching, the optimization would be less of a slowdown on the build and might speed up the overall job.

JayWhite2357 avatar Mar 07 '25 02:03 JayWhite2357

One thing I am noticing is that tests appear to be running twice, once in code coverage and once as normal. Do we need both runs?

Dustin-Ray avatar Mar 07 '25 04:03 Dustin-Ray

> One thing I am noticing is that tests appear to be running twice, once in code coverage and once as normal. Do we need both runs?

That's not the issue... the total workflow time SHOULD be irrelevant: if you reduce the test time, the total time will drop too, imo.

zelosleone avatar Mar 07 '25 05:03 zelosleone

@JayWhite2357 @iajoiner @Dustin-Ray Sorry for pinging, but I wanted to let you guys in on some additional research on this. I'll cover some issues, some potential improvements, and some things I tried that basically did nothing.

1- Cloning issues

The codebase does a lot of cloning during operations, especially in dory, which can hurt performance where a reference would do. However, I tried switching the dory tests and setups from clones to references, and it mattered less than I expected. We can honestly set this aside.

2- Using bytemuck

This is actually a problem, because we are using unsafe code; it just doesn't show directly. Bytemuck is full of unsafe code, which we could rewrite ourselves or replace with safer/more performance-friendly alternatives. When I rewrote a few files that use bytemuck to use explicit unsafe code instead, it actually sped things up a little, although some tests broke (I was just testing the speed). I don't think there is a massive gain here, since bytemuck is not used that much throughout the codebase, but it would still be nice to have a to-do for the future along the lines of "we should aim to drop bytemuck."

3- Caching of the test setup really shouldn't be done to a hardcoded file; temporary files are better for this. I will post a relevant PR that takes care of GitHub Actions caching, which could be better than the current setup, but I won't claim this issue for it.

4- The rayon thread pool could be increased to at least 12 threads, which could maybe, maybe increase the speed, but it's a big "maybe": since we are already using parallelization, resources just get fought over if we don't manage them well. I have tried every combination of thread pooling and parallelization I can think of and didn't achieve much improvement in speed.

5- I think the best option is using C++ (via Rust bindings) for the mathematical and cryptographic calculations, though. There are articles online about using C++ and Rust together, e.g. https://eshard.com/posts/Rust-Cxx-interop. I didn't try this, but considering that C++ runtimes can be A LOT faster for this kind of computing, it could potentially greatly benefit the runtime speed. Ark doesn't seem that fast; also, Cargo.toml still doesn't include the asm feature flag (on their Discord they basically recommend it to improve speed), but even with that feature flag I didn't achieve much improvement in speed. Using C++ and Rust together could be great, though.

6- I'm not sure it's possible, since I also didn't try this, but using WASM where we can could also greatly increase speed. Since it's sandboxed in the first place, maybe the cryptographic calculations and heavy computations could be done in WASM via C++, etc. It's supposed to be fast, after all, and there is great Rust support via bindings. I don't think there would be memory issues either; it might be good to take a look.

zelosleone avatar Mar 10 '25 11:03 zelosleone

@zelosleone

> Sorry for pinging, but I wanted to let you guys in on some additional research on this. I'll cover some issues, some potential improvements, and some things I tried that basically did nothing.

> 1- Cloning issues
>
> The codebase does a lot of cloning during operations, especially in dory, which can hurt performance where a reference would do. However, I tried switching the dory tests and setups from clones to references, and it mattered less than I expected. We can honestly set this aside.

I agree with reducing clones as much as possible. There is some tech debt in this area that it would be nice to clean up.

> 2- Using bytemuck
>
> This is actually a problem, because we are using unsafe code; it just doesn't show directly. Bytemuck is full of unsafe code, which we could rewrite ourselves or replace with safer/more performance-friendly alternatives. When I rewrote a few files that use bytemuck to use explicit unsafe code instead, it actually sped things up a little, although some tests broke (I was just testing the speed). I don't think there is a massive gain here, since bytemuck is not used that much throughout the codebase, but it would still be nice to have a to-do for the future along the lines of "we should aim to drop bytemuck."

I don't agree. bytemuck is quite safe/performant. Even though it uses unsafe code, it does it in a way that ensures nothing will break.
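
For illustration, a standalone example (not repo code, assuming the bytemuck crate): its safe casts validate size and alignment at runtime and panic on mismatch instead of producing undefined behavior, which is what keeps the internal unsafe encapsulated.

    use bytemuck::cast_slice;

    fn main() {
        let words: [u32; 2] = [0xDEAD_BEEF, 0x0000_0001];
        // Safe reinterpretation: cast_slice checks size and alignment and
        // panics on mismatch rather than invoking undefined behavior.
        let bytes: &[u8] = cast_slice(&words);
        assert_eq!(bytes.len(), 8);
    }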

> 3- Caching of the test setup really shouldn't be done to a hardcoded file; temporary files are better for this. I will post a relevant PR that takes care of GitHub Actions caching, which could be better than the current setup, but I won't claim this issue for it.

I don't care too much about the method. Temporary files are nice, but I'm also not opposed to checking in a hardcoded file that is purely for testing purposes. That hardcoded file could be validated by a test, for example, to ensure it isn't wrong.

> 4- The rayon thread pool could be increased to at least 12 threads, which could maybe, maybe increase the speed, but it's a big "maybe": since we are already using parallelization, resources just get fought over if we don't manage them well. I have tried every combination of thread pooling and parallelization I can think of and didn't achieve much improvement in speed.

I'm pretty sure the rayon thread pool size is set by rayon automatically to be the number of CPUs. Probably the reason you didn't see an improvement is because rayon is very optimized.
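
For reference, a minimal sketch (assuming the rayon crate; not repo code) of overriding the global pool size. Since the default already matches the logical CPU count, bumping it rarely helps:

    use rayon::prelude::*;
    use rayon::ThreadPoolBuilder;

    fn main() {
        // rayon sizes its global pool to the number of logical CPUs by
        // default; build_global only overrides that and must be called
        // before the pool is first used.
        ThreadPoolBuilder::new()
            .num_threads(12)
            .build_global()
            .expect("global rayon pool already initialized");

        let sum: u64 = (0u64..1_000).into_par_iter().sum();
        assert_eq!(sum, 499_500);
    }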

> 5- I think the best option is using C++ (via Rust bindings) for the mathematical and cryptographic calculations, though. There are articles online about using C++ and Rust together, e.g. https://eshard.com/posts/Rust-Cxx-interop. I didn't try this, but considering that C++ runtimes can be A LOT faster for this kind of computing, it could potentially greatly benefit the runtime speed. Ark doesn't seem that fast; also, Cargo.toml still doesn't include the asm feature flag (on their Discord they basically recommend it to improve speed), but even with that feature flag I didn't achieve much improvement in speed. Using C++ and Rust together could be great, though.

We won't be using C++ for a variety of reasons, although blitzar is written in C++ and does our GPU acceleration. We include the asm feature flag in the perf feature, which is enabled by default.

> 6- I'm not sure it's possible, since I also didn't try this, but using WASM where we can could also greatly increase speed. Since it's sandboxed in the first place, maybe the cryptographic calculations and heavy computations could be done in WASM via C++, etc. It's supposed to be fast, after all, and there is great Rust support via bindings. I don't think there would be memory issues either; it might be good to take a look.

We do currently support the WASM target.

JayWhite2357 avatar Mar 11 '25 04:03 JayWhite2357

/attempt #557

Abiji-2020 avatar May 04 '25 23:05 Abiji-2020