SciMLBase.jl

bizarre heisenbug performance problem when solving `EnsembleProblem`

Open ExpandingMan opened this issue 3 years ago • 4 comments

I'm getting some extremely bizarre performance characteristics when solving an EnsembleProblem with EnsembleThreads. In my particular case, I had already implemented a multithreaded solve using ThreadsX that finished in about 25 s for the examples I was trying. Expecting EnsembleProblem to be about the same or slightly faster, I was surprised to find that it not only took a really long time but actually ran out of memory (on 16 GB of RAM) before it ever finished.

I've done a ton of experimenting trying to isolate this, but it has been extremely difficult. I have tried reproducing just about all aspects of my problem to create an MWE, but my attempted MWE was never slow, even at the point where I was solving the same equations (you can see this attempt here; again, this was NOT slow). So I went the other way and started simplifying the context of my original problem.

I believe I have now simplified it about as much as possible; see my repo here. Running my solve method is incredibly slow and inconsistent, taking anywhere from 10 to 20 s for a mere 100 trajectories. At this point, I am only solving a simple 4-dimensional harmonic oscillator, and it is almost as slow as solving for the geodesics on a wormhole manifold.
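For readers without access to the repo, here is a minimal sketch of what a 100-trajectory ensemble of a 4-dimensional harmonic oscillator might look like. It is not the code from the repo: the function names `oscillator!` and `vary_u0`, the initial conditions, the time span, and the choice of `Tsit5` are all assumptions made for illustration, and it assumes OrdinaryDiffEq is installed.

```julia
using OrdinaryDiffEq

# 4-dimensional harmonic oscillator, state u = (x1, x2, v1, v2) with x'' = -x.
# (Illustrative system; the equations in the repo may differ.)
function oscillator!(du, u, p, t)
    du[1] = u[3]
    du[2] = u[4]
    du[3] = -u[1]
    du[4] = -u[2]
    return nothing
end

base = ODEProblem(oscillator!, [1.0, 0.0, 0.0, 1.0], (0.0, 10.0))

# Hypothetical per-trajectory setup: rotate the initial position slightly.
vary_u0(prob, i, repeat) = remake(prob; u0 = [cos(0.01 * i), sin(0.01 * i), 0.0, 1.0])

ensemble = EnsembleProblem(base; prob_func = vary_u0)

# Threaded ensemble solve (the slow path reported here) and a serial one
# for comparison.
sol_threads = solve(ensemble, Tsit5(), EnsembleThreads(); trajectories = 100)
sol_serial = solve(ensemble, Tsit5(), EnsembleSerial(); trajectories = 100)
```

Wrapping each `solve` call in `@time` and comparing `EnsembleThreads()` against `EnsembleSerial()` is one way to check whether the slowdown is tied to the threaded path.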

The only thing I can conclude at this point is that whatever is happening is only happening when I run this inside my package, but not if I just include a file with it.

Something else rather bizarre is that if I attempt to profile this using @profile the problem is partially alleviated (though I think still a bit slow):

julia> @time solve(C);
 21.132324 seconds (4.65 k allocations: 756.469 KiB)

julia> @profile @time solve(C);
  0.017434 seconds (4.66 k allocations: 756.516 KiB)

Because of this I have gotten exactly nowhere trying to track this down via profiling.

This is occurring on both Julia 1.6.2 and 1.7-beta2.

ExpandingMan avatar Jul 08 '21 02:07 ExpandingMan

How do I run the example?

chriselrod avatar Aug 05 '21 21:08 chriselrod

Just do include("render.jl") from the test directory; an example that shows the issue is solve(cam()). This calls the solve method in src/ensemble.jl, which uses EnsembleThreads() by default.

ExpandingMan avatar Aug 05 '21 21:08 ExpandingMan

Watching htop during a multithreaded fit is interesting. When multithreaded, my CPU sits at 0%. Every now and then I see brief spikes of activity.

My load average eventually dropped to 0.00 during the solve.

EDIT: Like you reported, @profile makes the solve time go from 433 seconds to 1.66 seconds. My solves are apparently much slower than yours. Single threaded, it takes about 18 seconds.

EDIT: Replacing Threads.@threads with Polyester.@batch in SciMLBase.tmap solves the problem. Using ThreadsX.map does not, so the problem shows up with both Threads.@threads and Threads.@spawn (or @sync).
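To make the kind of change concrete, here is a minimal sketch of a threaded map built on Threads.@threads. The helper `threaded_map` is hypothetical and is not the actual SciMLBase.tmap source; it only marks the spot where the loop macro would be swapped.

```julia
# Hypothetical helper for illustration; NOT the SciMLBase.tmap implementation.
function threaded_map(f, xs::Vector)
    out = Vector{Any}(undef, length(xs))
    # The swap described above would replace `Threads.@threads` on the next
    # line with `Polyester.@batch` (after `using Polyester`).
    Threads.@threads for i in eachindex(xs)
        out[i] = f(xs[i])
    end
    # Narrow the element type of the result where possible.
    return map(identity, out)
end

squares = threaded_map(x -> x^2, collect(1:10))
```

Start Julia with more than one thread (for example `julia --threads=auto`) for the loop to actually run in parallel; `Threads.nthreads()` reports how many threads are available.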

chriselrod avatar Aug 06 '21 03:08 chriselrod

Yes, I had also noticed that none of my threads were maxed out. I think I forgot to report that.

ExpandingMan avatar Aug 06 '21 15:08 ExpandingMan