Run tests in parallel
Very much a work in progress! Mostly opening this PR so I have a place to take and share notes.
- [x] Ensure #233 gets merged
- [ ] ~~Add some way of monitoring for debug or profiling?~~
- [x] ~~Investigate test-level parallelization~~ Add configuration for test-level parallelization
- [x] Test on several real-world test suites
- [x] Regal
- [x] deep-diff2
- [x] deja-fu
- [ ] ~~Ensure test reporters print meaningful results~~
- [ ] ~~Dots~~
- [ ] ~~Documentation~~
- [ ] Fix test
- [x] Disable based on test type, not just parent type
- [x] Add docs
- [ ] Make sure the addition of a volatile in `kaocha.watch` doesn't interfere with this. See #371.
A few notes:
- Somewhat unusually, this is currently rebased off of Arne's cleanup branch (#233) rather than main, because I thought it might benefit from some of the cleanup work and the use of Portal.
- The basic strategy is as follows: for `kaocha.type/ns` tests (basic Clojure tests), fire off `run-tests` in a new thread using `future`, then return the futures as normal. At the top level (the test suite), recursively search for futures and deref them. That way we essentially deref at the last possible moment (after all the tests have been kicked off, but before any external code that's expecting actual test results). (See the sketch after these notes.)
- One complication is that we don't want to execute in a new thread if we're already in a worker thread. I'm not sure whether nested futures are supposed to work, but it doesn't seem to work in practice, and I'm not sure it would benefit us. The approach I've taken is to modify the configuration to remove `:parallel`.
- I've benchmarked this on contrived tests that favor a parallel approach (very simple tests with `Thread/sleep` calls). This is mostly to sniff out bugs (a lot of attempts early this afternoon resulted in no speedup because Kaocha was waiting for each thread to finish before running the next one); real-world tests probably won't be this parallelizable.
- I cannot run the whole Kaocha test suite with `:parallel` enabled. Ideally, you'd be able to enable this option and Kaocha would just ignore it for tests that don't support it, but ensuring it can handle all the corner cases in our test suite may be too ambitious for the initial PR.
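To make the strategy concrete, here's a minimal sketch of the idea (not the actual branch code; `run-tests` is a stand-in for Kaocha's real per-testable runner):

```clojure
(require '[clojure.walk :as walk])

(declare run-tests) ; stand-in for Kaocha's actual test runner

(defn run-ns-testable [testable config]
  (if (:parallel config)
    ;; Fire off this namespace's tests on another thread. Dropping
    ;; :parallel keeps tests inside the worker thread from trying to
    ;; spawn nested futures.
    (future (run-tests testable (dissoc config :parallel)))
    (run-tests testable config)))

(defn deref-deep
  "At the suite level, recursively replace any futures in the result
  tree with their values, i.e. deref at the last possible moment."
  [results]
  (walk/postwalk #(if (future? %) (deref %) %) results))
```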
I did a little bit more testing and realized what I have doesn't work consistently. There seems to be a nondeterministic bug with loading certain specs or vars. I don't really understand it; my intuition is that all the loading would happen before we kick off any futures so this kind of bug shouldn't really happen.
We do some fancy dynamic loading (I think `jit` falls into this category, too), and it's possible some of these techniques aren't thread-safe.
Another thing to look for is places we're touching atoms or dynamic vars in code that's called within the futures. Modifying a var or atom in a secondary thread isn't inherently problematic, but it's possible we've accidentally used them in a way that makes them reliant on being modified by a single thread. (For example, two atoms won't stay in sync; you should either use one atom or add some sort of other abstraction to keep them in sync.)
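As a contrived illustration of that last point (not Kaocha code):

```clojure
;; Two separate atoms: each swap! is atomic on its own, but a reader in
;; another thread can observe them mid-update and see mismatched counts.
(def started  (atom 0))
(def finished (atom 0))

;; Keeping related state in one atom makes the combined update atomic:
(def counters (atom {:started 0 :finished 0}))

(defn record-run! []
  (swap! counters #(-> % (update :started inc) (update :finished inc))))
```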
Now it seems to be consistently giving me an error about not finding the spec for `:kaocha.type/var`.
I added some retry code. I think what's happening is that there's a race condition when running `require`. ~~I suppose we could create a thread-safe version of `require`.~~ Turns out that `serialize-require` is pretty easy to replicate, so I did that.
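For reference, a minimal sketch of what serializing `require` behind a lock looks like (the replication in the branch may differ in detail):

```clojure
;; Route all requires through one lock so two worker threads can't
;; load the same namespace at the same time.
(defonce ^:private require-lock (Object.))

(defn locking-require [& args]
  (locking require-lock
    (apply require args)))
```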
Did a slightly more realistic benchmark with 1000 simple IO tests (simply `slurp`ing `README.md`).
With `:parallel`:

```
Executed in    9.39 secs   fish   external
   usr time   23.17 secs    0.00 micros   23.17 secs
   sys time    0.57 secs  418.00 micros    0.57 secs
```
Without `:parallel`:

```
Executed in   67.13 secs   fish   external
   usr time   81.38 secs  569.00 micros   81.38 secs
   sys time    0.87 secs   94.00 micros    0.87 secs
```
And another that grabs random Wikipedia articles. (Only 25 tests, so as not to be rude.)
With `:parallel`:

```
Executed in    5.78 secs   fish   external
   usr time   14.03 secs  494.00 micros   14.03 secs
   sys time    0.42 secs   75.00 micros    0.42 secs
```
Without `:parallel`:

```
Executed in   20.25 secs   fish   external
   usr time   15.82 secs  596.00 micros   15.81 secs
   sys time    0.66 secs   83.00 micros    0.66 secs
```
On the one hand, these results aren't super revealing if you've seen people talk about the advantages of async and/or multithreading. On the other hand, reading files isn't an uncommon task for integration testing, so it's plausible you could see a significant improvement on that part of your test suite.
If you have a test suite that would benefit from parallelization, I'd be interested to hear from you. What tests do you have? When it's a little more polished, would you be willing to test or benchmark parallelized test runs?
It's easy enough to document how to turn on parallelized test runs (set `:parallel` to `true` in `tests.edn`), but I think for this feature to be useful, the documentation will also have to include information on what kinds of tests benefit from it and what kinds of common tests aren't thread-safe. I'm hoping to hear from Kaocha users to help research this part of the documentation.
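For the record, enabling it is a one-key change (a minimal `tests.edn`, otherwise assuming the usual Kaocha defaults):

```clojure
#kaocha/v1
{:parallel true}
```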
Having real-world tests will also let me determine whether parallelizing at the namespace level is enough (current approach) or if we need to consider working at the test level.
I'll be able to test and benchmark in a few days. The primary reason I'd like to see test parallelization is to speed up my suite of `test.check` tests. Parallelization at the per-test level would have the largest impact on these, but even per-namespace should provide a speed bump. Really excited to see your work on this, thank you!
@nwjsmith You're welcome! I somehow blanked on `test.check` as a use case even though I'm a big fan of property-based testing and we use it on some of our projects, e.g., Regal. I believe `test.check` tests should work already (because they're gathered into namespace testables, which are parallelized), but enabling `test.check` tests at the test level would be another step because they're a different type.
Two updates:
- You can now use the `:parallel-children-exclude` key to ensure that any children will not be further parallelized. Setting a type of test whose children shouldn't be parallelized probably isn't the configuration we ultimately want, but it's easier to implement because currently we decide whether to parallelize something when we're given a bunch of testables. However, it does let you choose between parallelizing at the namespace level (add `:kaocha.type/ns`) and at the test level (leave it blank). (See the sketch below the list.)
- Regal's test suite passes in parallel.
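As mentioned in the first update, a sketch of what that configuration could look like (the exact value format for `:parallel-children-exclude` is my assumption here):

```clojure
;; tests.edn: parallelize at the namespace level only, by marking
;; namespace testables as ones whose children shouldn't be parallelized.
#kaocha/v1
{:parallel true
 :parallel-children-exclude #{:kaocha.type/ns}}
```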
Our first real-world performance test:
```
↪ hyperfine 'clojure -A:test -m kaocha.runner clj' 'clojure -A:test-local -m kaocha.runner clj'
Benchmark #1: clojure -A:test -m kaocha.runner clj
  Time (mean ± σ):     43.235 s ±  6.081 s    [User: 76.913 s, System: 2.508 s]
  Range (min … max):   32.442 s … 54.122 s    10 runs

Benchmark #2: clojure -A:test-local -m kaocha.runner clj
  Time (mean ± σ):     40.678 s ± 11.089 s    [User: 83.026 s, System: 2.449 s]
  Range (min … max):   27.970 s … 57.834 s    10 runs

Summary
  'clojure -A:test-local -m kaocha.runner clj' ran
    1.06 ± 0.33 times faster than 'clojure -A:test -m kaocha.runner clj'
```
(`test-local` uses a local copy of Kaocha with the `parallelize` branch checked out.)
The previous test had high variance, so I quit some resource-intensive programs and set the tests to use a fixed seed. And with variance addressed, we get a 10 percent speedup:
```
↪ hyperfine 'clojure -A:test -m kaocha.runner clj' 'clojure -A:test-local -m kaocha.runner clj'
Benchmark #1: clojure -A:test -m kaocha.runner clj
  Time (mean ± σ):     43.979 s ±  0.905 s    [User: 80.840 s, System: 2.393 s]
  Range (min … max):   42.434 s … 45.172 s    10 runs

Benchmark #2: clojure -A:test-local -m kaocha.runner clj
  Time (mean ± σ):     39.900 s ±  1.060 s    [User: 84.105 s, System: 2.453 s]
  Range (min … max):   37.989 s … 41.194 s    10 runs

Summary
  'clojure -A:test-local -m kaocha.runner clj' ran
    1.10 ± 0.04 times faster than 'clojure -A:test -m kaocha.runner clj'
```
Regal isn't the ideal example for this. Even though it has a time-consuming test suite, the majority of its time is taken up by a single test. Although that test is a property-based test that really consists of running the same simple check over a number of random inputs, from Kaocha's perspective it's a single test.
I did some more tests, and parallelization's running fine! :tada: Tests consistently show parallelization is faster (although sometimes the confidence interval overlaps with it being slower), if only a little bit. This is actually really good news: it means that we are not creating so much overhead that it is likely to slow down tests. I'm still not planning on turning it on by default, but it means that people don't necessarily have to do a ton of benchmarking to ensure it won't make things worse for their test suite.
I noticed some cases where parallelization does indeed add overhead (I think tests ran 5 to 10 percent slower?), so I spent some time investigating. Last week, I did some benchmarking with Criterium, and today I did some profiling/sampling with VisualVM to see whether part of our implementation is adding overhead and could be optimized. Profiling was a dead end (too slow), but sampling revealed that one of the methods where the most time was spent was `java.lang.Thread.<init>`. A lot of time was spent in `java.util.concurrent.AbstractExecutorService.submit()`, too.
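If this ever gets picked back up, one avenue (speculative, not implemented in the branch) would be replacing `future`, which runs tasks on Clojure's unbounded "solo" executor and can create a fresh thread per task, with a fixed-size pool created once:

```clojure
(import '(java.util.concurrent Executors ExecutorService))

;; One pool for the whole run, sized to the machine, so thread
;; construction cost is paid once rather than per testable.
(def ^ExecutorService test-pool
  (Executors/newFixedThreadPool
   (.availableProcessors (Runtime/getRuntime))))

(defn submit-testable [f]
  ;; bound-fn* conveys dynamic bindings to the pool thread, which
  ;; plain Runnable submission would not do.
  (.submit test-pool ^Callable (bound-fn* f)))
```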
I think I should put this on hold because it could easily become a rabbit hole.
I wonder if one possible alternative solution would be to support something like https://www.juxt.pro/blog/parallel-kaocha. CircleCI allows you to run tests in parallel just by running a different subset of tests in each container. The trickier thing CircleCI does is merge the test results back and present them as if the tests were run normally.
Maybe supporting something like this in Kaocha directly would be easier and wouldn't have the same threading performance issues?
@AndreaCrotti That's a clever solution. What changes were you thinking of making? Adding the `circleci-parallel` namespace code to Kaocha itself?
While I am interested in that solution, I plan to go ahead with the parallel option.
Well @alysbrooks, I guess there are three things required for this to work:
- some way to group tests so that each group takes more or less the same time (which CircleCI does for you using past timing information; see the sketch below)
- some code to run multiple Kaochas in parallel, each with its own group of namespaces
- some code to get the results from all the Kaochas and give a unified output.
Maybe only step 3 could be done directly in Kaocha, but ideally the whole thing could probably be done as a plugin. For example, you could:
- run all your tests without parallelization and record the timings
- group the tests accordingly, run all the Kaochas, and report everything.
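A rough sketch of the grouping step in its simplest form (round-robin over CircleCI's standard `CIRCLE_NODE_INDEX`/`CIRCLE_NODE_TOTAL` environment variables, ignoring timing data; the helper is hypothetical):

```clojure
;; Each container keeps every nth namespace, so the union of all
;; containers covers the full suite exactly once.
(defn node-slice [test-namespaces]
  (let [total (Integer/parseInt (or (System/getenv "CIRCLE_NODE_TOTAL") "1"))
        index (Integer/parseInt (or (System/getenv "CIRCLE_NODE_INDEX") "0"))]
    (keep-indexed (fn [i ns-sym]
                    (when (= (mod i total) index) ns-sym))
                  test-namespaces)))
```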
One issue with the approach in this PR is that every test suite with database (or similar) writes will probably need quite a bit of adjusting to run in parallel. The other approach I was suggesting has the same issue in theory, but if you parameterize your test resources configuration, you could use a totally different test infrastructure for each Kaocha run and avoid the problem entirely (it would also avoid the threading overhead issue).
In short, I guess it would be great to have this PR merged, but maybe I can investigate the other way of parallelizing tests as a separate plugin, as a possible alternative strategy.
@AndreaCrotti Interesting. I could see 2 being done with a Babashka script that we ship, and I think 3 could be a plugin, in theory. That would be stretching the idea of plugins, because the instance of Kaocha doing the combining wouldn't necessarily run any tests; it would just grab the test results and insert them so the reporters could summarize them. Actually, a plugin that takes saved test results and feeds them back into Kaocha as though those tests just ran could be pretty interesting.
Threw together a simple graph of the Regal results.
I'm interested in trying this in our CI. Is a snapshot published for this branch?
> I'm interested in trying this in our CI. Is a snapshot published for this branch?

You can just point to a given sha with deps.edn and git paths?
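For instance, something along these lines in `deps.edn` (coordinates assumed; substitute the sha of the commit you want to test):

```clojure
{:deps {lambdaisland/kaocha
        {:git/url "https://github.com/lambdaisland/kaocha"
         :sha "..."}}}
```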
> You can just point to a given sha with deps.edn and git paths?

I'll see if Leiningen supports that next week, when I'm back at work. Or perhaps a hacky migration to deps.edn for the purpose of testing this is possible...? I'll have to evaluate that.
> I'll see if Leiningen supports that next week, when I'm back at work.
Leiningen doesn’t support it, as far as I remember, but the functionality is available as a plug-in.
It seems like the use of `with-redefs` in `kaocha.plugin.capture-output/with-capture` does not fit tests running in parallel. Could you provide some reasoning for the current implementation? Seems like it would target missing thread-binding conveyance. Also, `(run! f @active-buffers)` in `kaocha.plugin.capture-output/doto-capture-buffer` leads to some confusing output when tests run in parallel. Could you provide some reasoning for the current implementation here as well? The first thing that came to my mind was thread-binding conveyance again.
> It seems like the use of `with-redefs` in `kaocha.plugin.capture-output/with-capture` does not fit tests running in parallel... Also, `(run! f @active-buffers)` in `kaocha.plugin.capture-output/doto-capture-buffer` leads to some confusing output when tests run in parallel.
This plugin doesn't really work well with parallelization because it was written before we introduced this feature and hasn't been updated yet. I don't think the design of `kaocha.plugin.capture-output` had threading in mind.
> This plugin doesn't really work well with parallelization because it was written before we introduced this feature and hasn't been updated yet. I don't think the design of `kaocha.plugin.capture-output` had threading in mind.
And yet the plugin is loaded by default, right? I do think the output is much worse without it though. Could Arne weigh in maybe, @plexus?
As you can see, this is still an open PR; we appreciate people testing it out and providing feedback. Most of the Kaocha codebase predates parallelization by several years, so there will likely be more areas where we discover things don't yet play well together.
Suggestions on how to rework the plugin to work better with this branch, or perhaps a PR, are very welcome.
> Could Arne weigh in maybe, @plexus?
Alys knows what she is doing here, no need to tag people in like this. All issues are monitored and discussed within our team.
So what is the reasoning for the current implementation, (i) the use of `with-redefs` and (ii) the `active-buffers` construct?
@buhhh you can see from the commit history that the `capture-output` plugin is based on eftest's implementation of output capturing. Both `active-buffers` and the `with-redefs` are inherited from there.
@nwjsmith Thanks for the hint. As far as I understand, Eftest is multi-threaded by default, suggesting the implementation is in fact intended for this situation. I found some reasoning here: github.com/weavejester/eftest/commit/c920b5223e3c458b1443404974fe9ca3b76b6381. This seems to confirm that the main motivation is threads without thread-binding conveyance, but I might be missing something. I'm actually not sure whether output capturing should account for this at all. So, in my eyes, one could maybe do without (i) the `with-redefs` and (ii) the `active-buffers` – possibly collecting the output somewhere else instead of sending it to all 'active' capture buffers.