parallel-test

Parallel test threads sometimes hang and prevent JVM from exiting

Open amoe opened this issue 9 years ago • 6 comments

Hi there,

I have some code that uses parallel-test for integration testing with a database and web server, where tests are siloed from each other using different DBs and different ports.

Sometimes the test runner will hang. Some threads are being left around by the runner. I haven't found what triggers this. It seems that all of the test cases have concluded, but the runner doesn't realize.

I attached a stack dump from jstack below. I can't yet tell whether this is a bug in parallel-test or in my code. It is most likely in my code, directly or indirectly, but unfortunately I don't have a minimal test case yet. I'll try to produce one in the coming days, but it may not be possible.

https://gist.github.com/amoe/d5a0aa56985994333432f8f3e4a4c5da

I don't know why there are so many async-dispatch-* threads when my :parallel configuration uses at most 4. Maybe that has something to do with it, but my other tests often allow the JVM to terminate even with thread names numbered as high as async-dispatch-13.

I just saw the pull request #2, but I'm not savvy enough to tell if that applies to this problem or not.

amoe, Dec 29 '16 15:12

I've now reproduced the problem in my tiny demo project that doesn't use any other stuff and just tests (is (= 2 2)) in parallel. If I repeatedly rerun the tests (while true; do lein parallel-test; done) then the JVM will hang after a few minutes. This is OpenJDK 1.8.0_102, Clojure 1.8, parallel-test 0.3.0 from Clojars. Tomorrow I'll see if the changes in the above pull request can help.
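The rerun loop described above never stops by itself once a run hangs, so it can help to bound each run with a time limit. The following is a minimal sketch, not the script actually used in this report: the function name `run_until_hang`, the run count, and the time limit are all hypothetical, and it assumes GNU coreutils `timeout` is available. `timeout` exits with status 124 when it has to stop the command, which is what the loop checks for.

```shell
#!/usr/bin/env bash
# Sketch only: rerun a command until one run exceeds a time limit.
# run_until_hang is a hypothetical helper, not part of parallel-test or Leiningen.
# Usage: run_until_hang MAX_RUNS TIMEOUT_SECONDS COMMAND [ARGS...]
run_until_hang() {
  local max_runs=$1 limit=$2
  shift 2
  local i=1 status
  while [ "$i" -le "$max_runs" ]; do
    # GNU timeout exits with 124 when the command ran past the limit.
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
      echo "run $i timed out after ${limit}s (possible hang)"
      return 1
    fi
    i=$((i + 1))
  done
  echo "no hang in $max_runs runs"
  return 0
}
```

For example, `run_until_hang 50 120 lein parallel-test` would stop at the first run that takes longer than two minutes. Because `timeout` terminates the process, a jstack dump has to be taken separately, for instance by using a longer limit and running jstack by hand while the run is stuck.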

amoe, Dec 29 '16 17:12

I'm pretty sure this was caused by a new version of core.async which no longer had a special privileged thread for doing async scheduling. This was a good change for the core.async team to make, but meant that an abuse of async semantics I was taking advantage of would no longer work. I have added some of @pschorf's work from #2 to master in 068dcf7853166875154ea07a1be5534f9ff61b3e, along with changes which should resolve the threading problems with core.async.

If you could pull my master and test it locally to see whether the problem is resolved, I'll go ahead and cut a point release so everyone can enjoy the bugfix (and the updated core.async default).

aredington, Dec 30 '16 19:12

I was still able to reproduce the hang with the new master version. It hangs fairly quickly, maybe once in every 5 to 10 test runs. (My install procedure was to delete ~/.m2/repository/com/holychao and run lein install from a newly cloned repo of parallel-test. Hopefully that is valid; I have never tested a Leiningen plugin from source before.)

amoe, Dec 30 '16 22:12

@amoe Your install procedure is valid. Thanks for taking the time and trouble to test things out.

Were your hangs produced in the demo project or in the project where you originally intended to use parallel-test? Is there a reproducible hanging codebase you can share with me? I was getting some reproducible hangs before the changes to master, and the changes seem to have squashed the problems for my test case, so I probably need a better test case.

aredington, Dec 31 '16 13:12

Here's the demo project that I'm using.

parallel-tests.tar.gz

I noticed that I left the target directory present, so you may want to delete that after extracting.

amoe, Dec 31 '16 15:12

Any updates on the new release?

shamsimam, Mar 06 '17 20:03