
Run tests in parallel

Open alysbrooks opened this issue 2 years ago • 33 comments

Very much a work in progress! Mostly opening this PR so I have a place to take and share notes.

  • [x] Ensure #233 gets merged
  • [ ] ~~Add some way of monitoring for debug or profiling?~~
  • [x] ~~Investigate test-level parallelization~~ Add configuration for test-level parallelization
  • [x] Test on several real-world test suites
    • [x] Regal
    • [x] deep-diff2
    • [x] deja-fu
  • [ ] ~~Ensure test reporters print meaningful results~~
    • [ ] ~~Dots~~
    • [ ] ~~Documentation~~
  • [ ] Fix test
  • [x] Disable based on test type, not just parent type
  • [x] Add docs
  • [ ] Make sure addition of volatile in kaocha.watch doesn't interfere with this. See #371.

alysbrooks avatar Jul 21 '21 18:07 alysbrooks

A few notes:

  • Somewhat unusually, this is currently rebased off of Arne's cleanup branch (#233) rather than main, because I thought it might benefit from some of the cleanup work and the use of Portal.
  • The basic strategy is as follows: for kaocha.type/ns tests (basic Clojure tests), fire off run-tests in a new thread using future, then return the futures as normal. At the top level (the test suite), recursively search for futures and deref them. That way we essentially deref at the last possible moment (after all the tests have been kicked off but before any external code that's expecting actual test results).
  • One complication is that we don't want to execute in a new thread if we're already in a worker thread. I'm not sure whether nested futures are supposed to work, but they don't seem to in practice, and I'm not sure they would benefit us. The approach I've taken is to modify the configuration to remove :parallel.
  • I've benchmarked this on contrived tests that favor a parallel approach (very simple tests with Thread/sleep calls). This is mostly to sniff out bugs (a lot of attempts early this afternoon resulted in no speedup because Kaocha was waiting for each thread to finish before running the next one); real-world tests probably won't be this parallelizable.
  • I cannot run the whole Kaocha test suite with :parallel enabled. Ideally, you'd be able to enable this option and Kaocha would just ignore it for tests that don't support it, but ensuring it can handle all the corner cases in our test suite may be too ambitious for the initial PR.
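The fire-off-then-deref-late strategy can be sketched roughly like this (illustrative only, not Kaocha's actual code; `run-testable` is a hypothetical stand-in for running one namespace's tests):

```clojure
;; Illustrative sketch; run-testable stands in for Kaocha's real runner.
(defn run-testable [testable]
  ;; Kick off this testable's run on another thread immediately.
  (future (assoc testable :result :pass)))

(defn deref-futures
  "Recursively walk a (possibly nested) result structure, replacing every
  future with its dereferenced value. Called once, at the very end, so we
  block as late as possible."
  [x]
  (cond
    (future? x)     (deref-futures @x)
    (map? x)        (into {} (map (fn [[k v]] [k (deref-futures v)])) x)
    (sequential? x) (mapv deref-futures x)
    :else x))
```

Usage follows the description above: kick everything off first with something like `(mapv run-testable testables)`, then call `deref-futures` once on the whole result tree before handing results to any code that expects real values.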

alysbrooks avatar Jul 21 '21 22:07 alysbrooks

I did a little bit more testing and realized what I have doesn't work consistently. There seems to be a nondeterministic bug with loading certain specs or vars. I don't really understand it; my intuition is that all the loading would happen before we kick off any futures so this kind of bug shouldn't really happen.

We do some fancy dynamic loading (I think jit falls into this category, too), and it's possible some of these techniques aren't thread safe.

alysbrooks avatar Jul 21 '21 23:07 alysbrooks

Another thing to look for is places we're touching atoms or dynamic vars in code that's called within the futures. Modifying a var or atom in a secondary thread isn't inherently problematic, but it's possible we've accidentally used them in a way that makes them reliant on being modified by a single thread. (For example, two atoms won't stay in sync; you should either use one atom or add some sort of other abstraction to keep them in sync.)
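The two-atoms pitfall can be shown with a hypothetical test counter (these names are made up, not Kaocha vars):

```clojure
;; Hypothetical reporter state, not Kaocha's actual vars.
(def pass-count  (atom 0))
(def total-count (atom 0))

(defn record-pass-racy! []
  ;; Each swap! is atomic, but the pair is not: another thread can observe
  ;; total-count already incremented while pass-count is still behind.
  (swap! total-count inc)
  (swap! pass-count inc))

;; Keeping related counters in one atom makes the update a single atomic
;; step, so readers always see a consistent pair.
(def counts (atom {:pass 0 :total 0}))

(defn record-pass! []
  (swap! counts (fn [c] (-> c (update :pass inc) (update :total inc)))))
```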

alysbrooks avatar Jul 22 '21 00:07 alysbrooks

Now it seems to be consistently giving me an error about not finding the spec for :kaocha.type/var.

alysbrooks avatar Aug 27 '21 04:08 alysbrooks

I added some retry code. I think what's happening is that there's a race condition when running require. ~~I suppose we could create a thread-safe version of require~~ Turns out that serialize-require is pretty easy to replicate, so I did that.
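The idea is simple to sketch: serialize all namespace loading on a single lock, which mirrors what clojure.core/serialized-require does in Clojure 1.10+ (`locking-require` is a hypothetical name for illustration):

```clojure
;; Hypothetical name; mirrors the approach of clojure.core/serialized-require.
(defonce ^:private require-lock (Object.))

(defn locking-require
  "Like require, but serialized on a single lock so concurrent loads from
  several test threads can't interleave."
  [& args]
  (locking require-lock
    (apply require args)))
```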

alysbrooks avatar Aug 27 '21 05:08 alysbrooks

Did a slightly more realistic benchmark with 1000 simple I/O tests (each simply slurping README.md).

with :parallel:

Executed in    9.39 secs   fish           external 
   usr time   23.17 secs    0.00 micros   23.17 secs 
   sys time    0.57 secs  418.00 micros    0.57 secs 

without :parallel:

Executed in   67.13 secs   fish           external 
   usr time   81.38 secs  569.00 micros   81.38 secs 
   sys time    0.87 secs   94.00 micros    0.87 secs 

And another that grabs random Wikipedia articles. (Only 25 tests, so as to not be rude.)

with :parallel:

Executed in    5.78 secs   fish           external 
   usr time   14.03 secs  494.00 micros   14.03 secs 
   sys time    0.42 secs   75.00 micros    0.42 secs 

without :parallel:

Executed in   20.25 secs   fish           external 
   usr time   15.82 secs  596.00 micros   15.81 secs 
   sys time    0.66 secs   83.00 micros    0.66 secs 

On the one hand, these aren't super revealing if you've seen people talk about the advantages of async and/or multithreading. On the other hand, reading files isn't an uncommon task in integration testing, so it's plausible you could see a significant improvement on that part of your test suite.

alysbrooks avatar Aug 27 '21 06:08 alysbrooks

If you have a test suite that would benefit from parallelization, I'd be interested to hear from you. What tests do you have? When it's a little more polished, would you be willing to test or benchmark parallelized test runs?

It's easy enough to document how to turn on parallelized test runs (set :parallel to true in tests.edn), but I think for this feature to be useful, the documentation will also have to cover what kinds of tests benefit from it and what kinds of common tests aren't thread-safe. I'm hoping to hear from Kaocha users to help research this part of the documentation.
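For reference, a sketch of what the tests.edn change being described would look like, using Kaocha's usual `#kaocha/v1` config tag:

```clojure
;; tests.edn: opt in to the (in-progress) parallel runner.
#kaocha/v1
{:parallel true}
```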

Having real-world tests will also let me determine whether parallelizing at the namespace level is enough (current approach) or if we need to consider working at the test level.

alysbrooks avatar Aug 28 '21 01:08 alysbrooks

I'll be able to test and benchmark in a few days. The primary reason I'd like to see test parallelization is to speed up my suite of test.check tests. Parallelization at the per-test level would have the largest impact on these, but even per-namespace should provide a speed bump. Really excited to see your work on this, thank you!

nwjsmith avatar Aug 28 '21 14:08 nwjsmith

@nwjsmith You're welcome! I somehow blanked on test.check as a use case even though I'm a big fan of property-based testing and we use it on some of our projects, e.g., Regal. I believe test.check tests should work already (because they're gathered into namespace testables, which are parallelized), but enabling test.check tests at the test level would be another step because they're a different type.

alysbrooks avatar Aug 30 '21 23:08 alysbrooks

Two updates:

You can now use the :parallel-children-exclude key to ensure that any children will not be further parallelized.

Configuring which types of tests shouldn't have their children parallelized probably isn't the configuration we ultimately want, but it's easier to implement because we currently decide whether to parallelize something when we're given a bunch of testables. However, it does let you choose between parallelizing at the namespace level (add :kaocha.type/ns) and at the test level (leave it blank).
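Based on that description, the namespace-level configuration might look like this in tests.edn (the key comes from this branch, but the exact collection shape of its value is my assumption):

```clojure
;; tests.edn: parallelize at the namespace level, but don't further
;; parallelize the tests inside each namespace. The vector value is an
;; assumption about the expected shape.
#kaocha/v1
{:parallel true
 :parallel-children-exclude [:kaocha.type/ns]}
```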

Regal's test suite in parallel passes

Our first real-world performance test:

↪ hyperfine 'clojure -A:test -m kaocha.runner clj' 'clojure -A:test-local -m kaocha.runner clj'
Benchmark #1: clojure -A:test -m kaocha.runner clj
  Time (mean ± σ):     43.235 s ±  6.081 s    [User: 76.913 s, System: 2.508 s]
  Range (min … max):   32.442 s … 54.122 s    10 runs
 
Benchmark #2: clojure -A:test-local -m kaocha.runner clj
  Time (mean ± σ):     40.678 s ± 11.089 s    [User: 83.026 s, System: 2.449 s]
  Range (min … max):   27.970 s … 57.834 s    10 runs
 
Summary
  'clojure -A:test-local -m kaocha.runner clj' ran
    1.06 ± 0.33 times faster than 'clojure -A:test -m kaocha.runner clj'

(test-local uses a local copy of Kaocha with the parallelize branch checked out.)

alysbrooks avatar Oct 19 '21 23:10 alysbrooks

The previous test had high variance, so I quit some resource-intensive programs and set the tests to use a fixed seed.

With variance addressed, we get a 10 percent speedup:

↪ hyperfine 'clojure -A:test -m kaocha.runner clj' 'clojure -A:test-local -m kaocha.runner clj'
Benchmark #1: clojure -A:test -m kaocha.runner clj
  Time (mean ± σ):     43.979 s ±  0.905 s    [User: 80.840 s, System: 2.393 s]
  Range (min … max):   42.434 s … 45.172 s    10 runs
 
Benchmark #2: clojure -A:test-local -m kaocha.runner clj
  Time (mean ± σ):     39.900 s ±  1.060 s    [User: 84.105 s, System: 2.453 s]
  Range (min … max):   37.989 s … 41.194 s    10 runs
 
Summary
  'clojure -A:test-local -m kaocha.runner clj' ran
    1.10 ± 0.04 times faster than 'clojure -A:test -m kaocha.runner clj'

Regal isn't the ideal example for this. Even though it has a time-consuming test suite, the majority of its time is taken up by a single test. That test is a property-based test that really consists of running the same simple check over a number of random inputs, but from Kaocha's perspective, it's a single test.

alysbrooks avatar Oct 20 '21 00:10 alysbrooks

I did some more tests, and parallelization's running fine! :tada: Tests consistently show parallelization is faster (although sometimes the confidence interval overlaps with it being slower), if only a little. This is actually really good news: it means we aren't creating so much overhead that we're likely to slow down tests. I'm still not planning on turning it on by default, but it means people don't necessarily have to do a ton of benchmarking to ensure it won't make things worse for their test suite.

alysbrooks avatar Oct 20 '21 18:10 alysbrooks

I noticed some cases where parallelization does indeed add overhead (I think tests ran 5 to 10 percent slower?), so I spent some time investigating. Last week I did some benchmarking with Criterium, and today I did some profiling/sampling with VisualVM to see whether part of our implementation adds overhead that could be optimized. Profiling was a dead end (too slow), but sampling revealed that one of the methods with the most time spent in it was java.lang.Thread.<init>. A lot of time was spent in java.util.concurrent.AbstractExecutorService.submit(), too.
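If thread construction itself is the hot spot, one hypothetical mitigation (a sketch, not Kaocha's code) would be reusing a fixed-size pool instead of spawning through unbounded futures:

```clojure
(import '(java.util.concurrent Callable Executors ExecutorService))

(defn run-all-on-pool
  "Submits each no-arg fn to a fixed-size pool sized to the CPU count and
  derefs the results. Reusing threads avoids paying Thread.<init> per task."
  [thunks]
  (let [pool (Executors/newFixedThreadPool
               (.availableProcessors (Runtime/getRuntime)))]
    (try
      ;; Submit everything first, then block on the returned Futures.
      (mapv deref
            (mapv (fn [f] (.submit ^ExecutorService pool ^Callable f))
                  thunks))
      (finally (.shutdown pool)))))
```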

I think I should put this on hold because it could easily become a rabbit hole.

alysbrooks avatar Oct 25 '21 19:10 alysbrooks

I wonder if one possible alternative solution would be to support something like https://www.juxt.pro/blog/parallel-kaocha. CircleCI allows you to run tests in parallel by running a different subset of tests in each container. The trickier thing CircleCI does is merge the test results back and present them as if the tests were run normally.

Maybe supporting something like this in Kaocha directly would be easier and wouldn't have the same threading performance issues?

AndreaCrotti avatar Jan 06 '22 15:01 AndreaCrotti

@AndreaCrotti That's a clever solution. What changes were you thinking to make? Adding the circleci-parallel namespace code to Kaocha itself?

While I am interested in that solution, I plan to go ahead with the parallel option.

alysbrooks avatar Jan 08 '22 01:01 alysbrooks

Well, @alysbrooks, I guess there are three things required for this to work:

  1. some way to group tests so that each group takes more or less the same time (which CircleCI does for you using past timing information)
  2. some code to run multiple Kaochas in parallel, one per group of namespaces
  3. some code to gather the results from all the Kaochas and give a unified output.

Maybe only 3. could be done directly in Kaocha, but ideally the whole thing could probably be done with a plugin. For example, you could:

  • run all your tests without parallelization and record the timings
  • group and run all the kaochas and report everything.
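The grouping step could be sketched as a greedy longest-first assignment over recorded timings (illustrative only; not an actual CircleCI or Kaocha API):

```clojure
(defn group-by-timing
  "Greedily packs namespaces into k groups of roughly equal total duration:
  take the slowest namespace first, always adding to the currently lightest
  group. timings is a map of namespace symbol -> seconds."
  [k timings]
  (reduce (fn [groups [ns-sym _secs]]
            ;; Find the index of the group with the smallest total time.
            (let [i (first (apply min-key
                                  (fn [[_ g]] (reduce + (map timings g)))
                                  (map-indexed vector groups)))]
              (update groups i conj ns-sym)))
          (vec (repeat k []))
          (sort-by (comp - val) timings)))
```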

One issue with the approach in this PR is that every test suite with database (or similar) writes will probably need quite a bit of adjusting to run in parallel. The approach I was suggesting has the same issue in theory, but if you parameterize your test resources configuration, you could use a totally different test infrastructure for each Kaocha run and avoid the problem entirely (it would also avoid the threading overhead issue).

AndreaCrotti avatar Jan 11 '22 14:01 AndreaCrotti

In short, I guess it would be great to have this PR merged, but maybe I can investigate the other way of parallelizing tests as a separate plugin, as a possible alternative strategy.

AndreaCrotti avatar Jan 11 '22 14:01 AndreaCrotti

@AndreaCrotti Interesting. I could see 2 being done with a Babashka script that we ship, and I think 3 could be a plugin, in theory. They'd be stretching the idea of plugins because the instance of Kaocha doing the combining wouldn't necessarily run any tests; it would just grab the test results and insert them so the reporters could summarize them. Actually, a plugin that takes saved test results and feeds them back into Kaocha as though those tests just ran could be pretty interesting.

alysbrooks avatar Jan 11 '22 23:01 alysbrooks

Threw together a simple graph of the Regal results:

[Image: Regal benchmarking graph]

alysbrooks avatar Jan 12 '22 04:01 alysbrooks

I'm interested in trying this in our CI. Is a snapshot published for this branch?

stig avatar Apr 14 '22 10:04 stig

I'm interested in trying this in our CI. Is a snapshot published for this branch?

You can just point to a given sha with deps.edn and git paths?

AndreaCrotti avatar Apr 14 '22 15:04 AndreaCrotti

I'm interested in trying this in our CI. Is a snapshot published for this branch?

You can just point to a given sha with deps.edn and git paths?

I'll see if Leiningen supports that next week, when I'm back at work. Or perhaps a hacky migration to deps.edn for the purpose of testing this is possible...? I'll have to evaluate that.

stig avatar Apr 17 '22 18:04 stig

I'm interested in trying this in our CI. Is a snapshot published for this branch?

You can just point to a given sha with deps.edn and git paths?

I'll see if Leiningen supports that next week, when I'm back at work. Or perhaps a hacky migration to deps.edn for the purpose of testing this is possible...? I'll have to evaluate that.

Leiningen doesn’t support it, as far as I remember, but the functionality is available as a plug-in.

alysbrooks avatar Apr 18 '22 15:04 alysbrooks

It seems like the use of with-redefs in kaocha.plugin.capture-output/with-capture does not fit tests running in parallel. Could you provide some reasoning for the current implementation? Seems like it would target missing thread-binding conveyance. Also, (run! f @active-buffers) in kaocha.plugin.capture-output/doto-capture-buffer leads to some confusing output when tests run in parallel. Could you provide some reasoning for the current implementation here as well? The first thing that came to my mind was thread-binding conveyance again.

buhhh avatar Jun 28 '22 17:06 buhhh

It seems like the use of with-redefs in kaocha.plugin.capture-output/with-capture does not fit tests running in parallel... Also, (run! f @active-buffers) in kaocha.plugin.capture-output/doto-capture-buffer leads to some confusing output when tests run in parallel.

This plugin doesn't really work well with parallelization because it was written before we introduced this feature and hasn't been updated yet. I don't think the design of kaocha.plugin.capture-output had threading in mind.
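For context on the distinction being asked about: `with-redefs` alters a var's root binding, which is visible to every thread at once, while `binding` is per-thread and is only conveyed into threads started through Clojure's own constructs such as `future` (which is likely why the plugin reached for `with-redefs`). A per-thread capture sketch, not the plugin's actual implementation, could look like:

```clojure
(defn capture-out
  "Runs f with *out* rebound (per-thread, via binding) to a fresh writer.
  Returns [result captured-string]. Parallel tests each get their own
  buffer instead of interleaving output."
  [f]
  (let [sw (java.io.StringWriter.)]
    (binding [*out* sw]
      (let [result (f)]
        [result (str sw)]))))
```

The trade-off is that output printed from threads the test spawns without binding conveyance would escape this capture, which is the gap `with-redefs` papers over.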

alysbrooks avatar Jun 29 '22 03:06 alysbrooks

This plugin doesn't really work well with parallelization because it was written before we introduced this feature and hasn't been updated yet. I don't think the design of kaocha.plugin.capture-output had threading in mind.

And yet the plugin is loaded by default, right? I do think the output is much worse without it though. Could Arne weigh in maybe, @plexus?

buhhh avatar Jul 05 '22 06:07 buhhh

As you can see, this is still an open PR; we appreciate people testing it out and providing feedback. Most of the Kaocha codebase predates parallelization by several years, so there will likely be more areas where we discover things don't yet play well together.

Suggestions on how to rework the plugin to work better with this branch, or perhaps a PR, are very welcome.

Could Arne weigh in maybe, @plexus?

Alys knows what she is doing here, no need to tag people in like this. All issues are monitored and discussed within our team.

plexus avatar Jul 05 '22 10:07 plexus

So what is the reasoning for the current implementation, (i) the use of with-redefs and the (ii) active-buffers construct?

buhhh avatar Jul 05 '22 13:07 buhhh

@nwjsmith Thanks for the hint. As far as I understand, Eftest is multi-threaded by default, suggesting the implementation is in fact intended for this situation. I found some reasoning here: github.com/weavejester/eftest/commit/c920b5223e3c458b1443404974fe9ca3b76b6381. This seems to confirm that the main motivation is threads without thread-binding conveyance, but I might be missing something. I'm actually not sure this should be accounted for in output capturing at all. So, in my eyes, one could maybe do without (i) the with-redefs and (ii) active-buffers – possibly collecting the output somewhere else instead of sending it to all 'active' capture buffers.

buhhh avatar Jul 05 '22 14:07 buhhh