dedalus
LBVP gets stuck at the build_solver stage
Hi, I've encountered an issue when trying to solve an LBVP to find the vector potential from a magnetic field. I've attached a script that reproduces the issue. With the resolution set to be small, the code gets stuck at the build_solver stage when the number of processors is increased to 4. I've also attached debug logs; the one for processor 0 shows that it stalls at this step:
2024-02-29 13:35:17,943 transforms 0/4 DEBUG :: Building FFTW FFT plan for (dtype, gshape, axis) = (<class 'numpy.float64'>, (3, 12, 2, 6), 1)
and similarly for the other processors, 1 to 3. This resolution is far too small for this problem, but a similar error occurs at higher resolutions on a cluster. Best, Calum
B_lbvp.txt dedalus_p0.log dedalus_p1.log dedalus_p2.log dedalus_p3.log
@csskene I've downloaded your script and can run it on 1, 2, or 3 cores but not on 4, as you described. This looks a lot like a race condition to me, where some cores are not participating in a global operation.
In particular, if you add these lines:
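from mpi4py import MPI  # skip if the script already imports MPI
rank = MPI.COMM_WORLD.rank
# Print the local grid-space and coefficient-space shapes of B on each rank
print(f"rank {rank:d}, g:{B['g'].shape}, c:{B['c'].shape}")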
and run on 4 cores, you'll see:
rank 3, g:(3, 12, 0, 6), c:(3, 0, 5, 6)
rank 0, g:(3, 12, 2, 6), c:(3, 2, 5, 6)
rank 2, g:(3, 12, 2, 6), c:(3, 2, 5, 6)
rank 1, g:(3, 12, 2, 6), c:(3, 2, 5, 6)
so rank 3 holds no local data: its grid-space array has size 0 along theta, and its coefficient-space array has size 0 along m.
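One way this pattern of shapes could arise is sketched below. It assumes the distributed axis is split into contiguous blocks of size ceil(N / P), and that the global theta size is 6 (inferred from summing the local sizes 2 + 2 + 2 + 0 above). Both are assumptions for illustration; `block_sizes` is a hypothetical helper, not part of Dedalus.

```python
import math

def block_sizes(global_size, num_ranks):
    """Local sizes when an axis is split into contiguous ceil-sized blocks.

    Hypothetical helper: the ceil-block rule is an assumption about the
    layout, not something taken from the Dedalus source.
    """
    block = math.ceil(global_size / num_ranks)
    sizes = []
    for rank in range(num_ranks):
        # Clamp both ends so trailing ranks can end up with an empty block.
        start = min(rank * block, global_size)
        end = min(start + block, global_size)
        sizes.append(end - start)
    return sizes

# Assumed global theta size of 6, split over 1 to 4 ranks.
for n in (1, 2, 3, 4):
    sizes = block_sizes(6, n)
    empty = [r for r, s in enumerate(sizes) if s == 0]
    print(f"{n} ranks: sizes={sizes}, empty ranks={empty}")
```

Under these assumptions only the 4-rank case leaves a rank with an empty block, which would be consistent with the script running on 1, 2, or 3 cores but not on 4.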
Let me look into this a bit more and get back to you.