SciMLBase.jl
bizarre heisenbug performance problem when solving `EnsembleProblem`
I'm getting some extremely bizarre performance characteristics when solving an `EnsembleProblem` with `EnsembleThreads`. In my particular case, I had already implemented a parallel solve using `ThreadsX` that solved in about 25 s for the examples I was trying. Expecting `EnsembleProblem` to be about the same or slightly faster, I was surprised to find that it not only took a really long time but actually ran out of memory (on 16 GB of RAM) before it ever finished.
I've done a ton of experimenting to isolate this, but it has been extremely difficult. I have tried reproducing just about every aspect of my problem to create an MWE, but my attempted MWE was never slow, even at the point where I was solving the same equations (you can see this attempt here; again, this was NOT slow). So I went the other way and started simplifying the context of my original problem.
I believe I have now simplified it about as much as possible; see my repo here. Running my `solve` method is incredibly slow and inconsistent, taking anywhere from 10 to 20 s for a mere 100 trajectories. At this point, I am only solving a simple 4-dimensional harmonic oscillator, and it is almost as slow as solving for the geodesics on a wormhole manifold.
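For reference, a minimal sketch of this kind of ensemble solve is shown below. It is not the code from the repo: the oscillator function, the initial condition, the time span, and `prob_func` are illustrative assumptions, and "4-dimensional" is read here as a 4-component state vector. As with the MWE mentioned above, a standalone script like this is not expected to reproduce the slowdown. It only shows the shape of the call being discussed.

```julia
using OrdinaryDiffEq  # assumed to be installed; provides EnsembleProblem, EnsembleThreads, remake

# Two uncoupled unit-frequency oscillators; state is (x1, x2, v1, v2).
# This layout is an assumption made for illustration.
function oscillator!(du, u, p, t)
    du[1] = u[3]
    du[2] = u[4]
    du[3] = -u[1]
    du[4] = -u[2]
    return nothing
end

base_prob = ODEProblem(oscillator!, [1.0, 0.0, 0.0, 1.0], (0.0, 10.0))

# Hypothetical per-trajectory variation: scale the initial condition by the index.
prob_func(prob, i, repeat) = remake(prob; u0 = (1 + i / 100) .* prob.u0)

ensemble = EnsembleProblem(base_prob; prob_func = prob_func)

# First calls include compilation, so solve once before timing.
sim_serial = solve(ensemble, Tsit5(), EnsembleSerial(); trajectories = 100)
sim_threaded = solve(ensemble, Tsit5(), EnsembleThreads(); trajectories = 100)

t_serial = @elapsed solve(ensemble, Tsit5(), EnsembleSerial(); trajectories = 100)
t_threaded = @elapsed solve(ensemble, Tsit5(), EnsembleThreads(); trajectories = 100)

println("serial: ", t_serial, " s, threaded: ", t_threaded, " s on ",
        Threads.nthreads(), " threads")
```

Comparing the `EnsembleSerial()` and `EnsembleThreads()` timings on the same problem is one way to check whether the threaded path is the slow one. Julia has to be started with more than one thread (for example `julia -t 4`) for the threaded run to use them.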
The only thing I can conclude at this point is that whatever is happening only happens when I run this inside my package, but not if I just `include` a file with it.
Something else rather bizarre is that if I attempt to profile this using `@profile`, the problem is partially alleviated (though I think it is still a bit slow):
julia> @time solve(C);
21.132324 seconds (4.65 k allocations: 756.469 KiB)
julia> @profile @time solve(C);
0.017434 seconds (4.66 k allocations: 756.516 KiB)
Because of this I have gotten exactly nowhere trying to track this down via profiling.
This is occurring on both Julia 1.6.2 and 1.7-beta2.
How do I run the example?
Just do `include("render.jl")` from the `test` directory, and then an example of what has the issue is `solve(cam())`. This will call the `solve` method in `src/ensemble.jl`, which uses `EnsembleThreads()` by default.
Watching `htop` during a multithreaded fit is interesting.
When multithreaded, my CPU sits at 0%. Every now and then I see brief spikes of activity.
My load average eventually dropped to 0.00 during the `solve`.
EDIT:
Like you reported, `@profile` makes the `solve` time go from 433 seconds to 1.66 seconds.
My solves are apparently much slower than yours. Single threaded, it takes about 18 seconds.
EDIT:
Replacing `Threads.@threads` with `Polyester.@batch` in `SciMLBase.tmap` solves the problem.
`ThreadsX.map` does not, so the problem is there with both `Threads.@threads` and `Threads.@spawn` or with `@sync`.
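To make the swap concrete, here is a rough sketch of a threaded map built on `Threads.@threads`. The function `threaded_map` is a hypothetical stand-in written for illustration, not the actual `SciMLBase.tmap` source. The change described above would correspond to replacing the `Threads.@threads` loop macro with `Polyester.@batch`, which requires the Polyester package and is therefore only mentioned in a comment.

```julia
# Hypothetical stand-in for a threaded map; NOT the SciMLBase.tmap implementation.
function threaded_map(f, xs::Vector)
    results = Vector{Any}(undef, length(xs))
    # The reported fix amounts to using `Polyester.@batch` on this loop
    # in place of `Threads.@threads` (assumes `using Polyester`).
    Threads.@threads for i in 1:length(xs)
        results[i] = f(xs[i])  # each iteration writes to its own slot
    end
    return results
end

squares = threaded_map(x -> x^2, [1, 2, 3, 4])
println(squares)
```

Each iteration writes to a distinct index of `results`, so the loop body needs no locking under either macro.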
Yes, I had also noticed that none of my threads were at full utilization. I think I forgot to report that.