
P4est MPI simulation gets stuck when not all ranks have BC faces

benegee opened this issue 3 months ago · 4 comments

I first encountered this issue for the baroclinic instability test case. I then constructed a reduced example based on elixir_advection_basic, but with non-periodic boundary conditions:

  • 3 cells
  • 3 MPI ranks
  • periodic boundary conditions in x-direction
  • Dirichlet boundary conditions in y-direction
  • crucially, the rank in the middle does not own any faces with a Dirichlet boundary condition

[Screenshot: the 3-cell mesh and its partition across the 3 MPI ranks]

Running this with system MPI on 3 ranks makes the simulation hang. Running with tmpi and inspecting the backtraces when aborting shows that two ranks have called init_boundaries and eventually p4est_iterate, while the third (the middle one) is already somewhere in rhs!.

It seems to be caused by the check at https://github.com/trixi-framework/Trixi.jl/blob/2dfde7faf3cc74f066d86148ae6c99ed9e58fa79/src/solvers/dgsem_p4est/containers.jl#L266-L268. Because of it, only two of the three ranks eventually call p4est_iterate. On the p4est side, p4est_iterate first calls p4est_is_valid, which in turn calls MPI_Allreduce.
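To illustrate the suspected failure mode, here is a minimal sketch in plain MPI.jl (not Trixi.jl code) of the same pattern: a collective operation guarded by a rank-local condition deadlocks as soon as one rank skips it.

using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

# Mimic the MWE: the middle rank owns no faces with a Dirichlet BC.
has_boundary_faces = rank != 1

if has_boundary_faces
    # Stand-in for p4est_iterate -> p4est_is_valid -> MPI_Allreduce:
    # only ranks 0 and 2 reach this collective call, so they block forever.
    MPI.Allreduce(1, +, comm)
end

MPI.Finalize()

If this is indeed the cause, a fix would have to make every rank enter the collective code path, even ranks without boundary faces of their own.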

This would explain the blocking. What I do not understand is why it works when not using system MPI.
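As a side note, which MPI backend is active can be checked with MPIPreferences.jl; switching between the default JLL-provided binary and a system MPI takes effect only after restarting Julia. The snippet below uses the standard MPIPreferences API, but the exact configuration used here is an assumption.

# Check which MPI backend the current environment resolves to.
using MPIPreferences
@show MPIPreferences.binary  # e.g. "MPICH_jll" (default) or "system"

# To switch to a system MPI library (restart Julia afterwards):
# MPIPreferences.use_system_binary()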

MWE

using OrdinaryDiffEq
using Trixi

###############################################################################
# semidiscretization of the linear advection equation

advection_velocity = (1.0, 0.0)
equations = LinearScalarAdvectionEquation2D(advection_velocity)

initial_condition = initial_condition_gauss

boundary_conditions = Dict(:y_neg => BoundaryConditionDirichlet(initial_condition),
                           :y_pos => BoundaryConditionDirichlet(initial_condition))

# Create DG solver with polynomial degree = 3 and (local) Lax-Friedrichs/Rusanov flux as surface flux
solver = DGSEM(polydeg = 3, surface_flux = flux_lax_friedrichs)

coordinates_min = (-5.0, -5.0)
coordinates_max = (5.0, 5.0)

trees_per_dimension = (1, 3)

mesh = P4estMesh(trees_per_dimension, polydeg = 3,
                 coordinates_min = coordinates_min,
                 coordinates_max = coordinates_max,
                 initial_refinement_level = 0,
                 periodicity = (true, false))

# A semidiscretization collects data structures and functions for the spatial discretization
semi = SemidiscretizationHyperbolic(mesh, equations,
                                    initial_condition, solver,
                                    boundary_conditions = boundary_conditions)

###############################################################################
# ODE solvers, callbacks etc.

# Create ODE problem with time span `tspan`
tspan = (0.0, 1.0)
ode = semidiscretize(semi, tspan);

# At the beginning of the main loop, the SummaryCallback prints a summary of the simulation setup
# and resets the timers
summary_callback = SummaryCallback()

callbacks = CallbackSet(summary_callback)

###############################################################################
# run the simulation

# OrdinaryDiffEq's `solve` method evolves the solution in time and executes the passed callbacks
sol = solve(ode, CarpenterKennedy2N54(williamson_condition = false),
            dt = 0.01,
            save_everystep = false, callback = callbacks);

# Print the timer summary
summary_callback()
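For completeness, one way to launch the MWE on 3 ranks from within Julia is the mpiexec wrapper provided by MPI.jl; the file name mwe.jl below is a placeholder for wherever the elixir above is saved.

using MPI

# Run the MWE on 3 ranks with the mpiexec binary matching the configured
# MPI backend ("mwe.jl" is a placeholder file name).
mpiexec() do cmd
    run(`$cmd -n 3 $(Base.julia_cmd()) --project mwe.jl`)
end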

benegee · Mar 19 '24 11:03