Parallel test threads sometimes hang and prevent JVM from exiting
Hi there,
I have some code that uses parallel-test for integration testing with a database and web server, where tests are siloed from each other using different DBs and different ports.
Sometimes the test runner will hang. Some threads are being left around by the runner. I haven't found what triggers this. It seems that all of the test cases have concluded, but the runner doesn't realize.
I attached a stack dump from jstack below. I can't tell for sure whether this is a bug in parallel-test or in my code. It's most likely in my code, directly or indirectly, but unfortunately I don't have a minimal test case yet. I'll try to produce one in the coming days, but it may not be possible.
https://gist.github.com/amoe/d5a0aa56985994333432f8f3e4a4c5da
I don't know why there are so many async-dispatch-* threads when my :parallel configuration uses at most 4. Maybe that has something to do with it, but my other test runs often let the JVM terminate even with thread names as high as async-dispatch-13.
I just saw the pull request #2, but I'm not savvy enough to tell if that applies to this problem or not.
I've now reproduced the problem in a tiny demo project that uses nothing else and just tests (is (= 2 2)) in parallel. If I repeatedly rerun the tests (while true; do lein parallel-test; done), the JVM hangs after a few minutes. This is OpenJDK 1.8.0_102, Clojure 1.8, parallel-test 0.3.0 from Clojars. Tomorrow I'll see if the changes in the above pull request help.
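In case it helps anyone else reproduce this, here is a rough sketch of that rerun loop with a time limit, so a hung run is detected instead of blocking the loop forever. The function name run_until_hang, the 120 second limit, and the cap of 50 runs are my own placeholders, and it assumes GNU coreutils timeout is installed.

```shell
#!/usr/bin/env bash
# Rerun a command until one run exceeds a time limit, which is treated as a hang.
# Assumptions (not from the thread): GNU coreutils `timeout` is available, and
# the limit and run cap passed by the caller are arbitrary values.

run_until_hang() {
  local limit="$1" max_runs="$2"
  shift 2
  local i status
  for ((i = 1; i <= max_runs; i++)); do
    timeout "$limit" "$@"
    status=$?
    # GNU timeout exits with 124 when it had to stop the command.
    if [ "$status" -eq 124 ]; then
      echo "run $i exceeded ${limit}s, treating it as a hang"
      return 1
    fi
  done
  echo "no hang in $max_runs runs"
  return 0
}

# Example call, left commented out so sourcing this file runs nothing:
# run_until_hang 120 50 lein parallel-test
```

Since timeout stops the hung run, a jstack dump has to be taken from a separate run that is left hanging.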
I'm pretty sure this was caused by a new version of core.async which no longer had a special privileged thread for doing async scheduling. This was a good change for the core.async team to make, but meant that an abuse of async semantics I was taking advantage of would no longer work. I have added some of @pschorf's work from #2 to master in 068dcf7853166875154ea07a1be5534f9ff61b3e, along with changes which should resolve the threading problems with core.async.
If you could pull my master and test it locally to see if the problem is resolved, I'll go ahead and cut a point release so everyone can enjoy the bugfix (and the updated core.async default).
I was still able to reproduce the hang with the new master version. It hangs fairly quickly, roughly once every 5 to 10 test runs. (My install procedure was to delete ~/.m2/repository/com/holychao and run lein install from a freshly cloned repo of parallel-test. Hopefully that is valid; I have never tested a Leiningen plugin from source before.)
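For reference, the install procedure amounts to roughly the following. This is only a sketch: the function name install_plugin_from_source is a placeholder, and the checkout path is passed in as an argument.

```shell
#!/usr/bin/env bash
# Install a Leiningen plugin from a local checkout, replacing any cached copy.
# Assumption: the caller passes the path of an existing parallel-test checkout.

install_plugin_from_source() {
  local checkout_dir="$1"
  # Remove previously downloaded artifacts so an older jar is not picked up.
  rm -rf "$HOME/.m2/repository/com/holychao"
  # Build the checkout and install it into the local Maven repository.
  (cd "$checkout_dir" && lein install)
}

# Example call, left commented out so sourcing this file runs nothing:
# install_plugin_from_source ./parallel-test
```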
@amoe Your install procedure is valid. Thanks for taking the time and trouble to test things out.
Were your hangs produced in the demo project or in the project where you originally intended to use parallel-test? Is there a codebase that reproduces the hang that you can share with me? I was getting reproducible hangs before the changes to master, and those changes seemed to have squashed the problem for my test case, so I probably need a better test case.
Here's the demo project that I'm using.
I noticed that I left the target directory present, so you may want to delete that after extracting.
Any updates on the new release?