interior_facet_assembly causes deadlock if there are no interior facets on the process
Summarize the issue
Issue reported at:
https://fenicsproject.discourse.group/t/scallar-assembly-of-a-internal-surface-integral-misbehaves-in-parallel/14655/3
The problem is that not all processes reach the call to create_entity_permutations:
https://github.com/FEniCS/dolfinx/blob/main/cpp/dolfinx/fem/assemble_scalar_impl.h#L190-L192
which calls mesh.topology.create_entities(d) for d = 0, ..., mesh.topology.dim.
In some cases, create_entities(d) requires parallel communication, so if only a subset of the processes makes the call, the program deadlocks.
We need a code re-design to avoid these deadlock situations: we cannot call create_entity_permutations inside a branch whose condition is not guaranteed to evaluate the same on all processes.
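For illustration, here is a toy mpi4py sketch (not dolfinx code) of the anti-pattern: a collective operation guarded by a condition that can differ between ranks. Run with two or more processes, it hangs, just like the assembly code above.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD

# Stand-in for "this rank owns at least one tagged interior facet";
# the condition differs between ranks.
have_local_facets = comm.rank % 2 == 0

if have_local_facets:
    # Stand-in for create_entities(d)/create_entity_permutations(),
    # which may require collective communication. Ranks that skip the
    # branch never participate, so the ranks inside block forever.
    comm.barrier()
```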
To fix this for assembly, I would change how we compute the integration entities, i.e.,
https://github.com/FEniCS/dolfinx/blob/b16ef8abc20d70c28d8f1850fe42a0ee0bcf5106/cpp/dolfinx/fem/utils.cpp#L131-L151
as we currently send in a full meshtags object, which doesn't carry the subdomain ids used by the form in question:
https://github.com/FEniCS/dolfinx/blob/b16ef8abc20d70c28d8f1850fe42a0ee0bcf5106/python/dolfinx/fem/forms.py#L214-L215
I think this function should take the meshtag information together with the ids we want to extract (currently we extract far more than we need). This would yield an integral entry for every id on every process (with 0 facets on some).
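A minimal sketch of the proposed extraction, assuming the function receives the raw meshtag data plus the subdomain ids from the form (the helper name is hypothetical, not the actual dolfinx API):

```python
import numpy as np


def extract_integration_entities(tag_indices, tag_values, subdomain_ids):
    # Hypothetical helper: extract only the entities tagged with the ids
    # the form actually integrates over. Every rank gets an entry for
    # every id, so downstream collective calls stay safe even when the
    # local entity array is empty on some ranks.
    return {
        int(i): tag_indices[tag_values == i].astype(np.int32)
        for i in subdomain_ids
    }
```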
How to reproduce the bug
Run the following code with 1, 2 and 4 processes (e.g. mpirun -np 4 python3 mwe.py). It works on 1 and 2 processes and deadlocks on 4.
Minimal Example (Python)
```python
from mpi4py import MPI
import numpy as np
import dolfinx
import ufl

if MPI.COMM_WORLD.rank == 0:
    print(dolfinx.__version__, dolfinx.git_commit_hash)

domain = dolfinx.mesh.create_unit_cube(
    MPI.COMM_WORLD, 5, 6, 6, ghost_mode=dolfinx.mesh.GhostMode.shared_facet
)


def marker(x, eps=1e-14):
    return np.logical_and(x[0] <= 0.5 + eps, np.isclose(x[1], 0.5))


domain.topology.create_connectivity(domain.topology.dim - 1, domain.topology.dim)
domain.topology.create_connectivity(domain.topology.dim, domain.topology.dim - 1)

# Tag every facet (owned and ghost); facets matching the marker get the value 2
facet_map = domain.topology.index_map(domain.topology.dim - 1)
num_facets_on_proc = facet_map.size_local + facet_map.num_ghosts
facets = np.arange(num_facets_on_proc, dtype=np.int32)
facet_values = np.zeros(num_facets_on_proc, dtype=np.int32)
facet_values[dolfinx.mesh.locate_entities(domain, domain.topology.dim - 1, marker)] = 2
ft = dolfinx.mesh.meshtags(domain, domain.topology.dim - 1, facets, facet_values)

# Interior facet integral over the tagged subdomain
dS = ufl.Measure("dS", domain=domain, subdomain_data=ft, subdomain_id=2)
form = dolfinx.fem.form(1 * dS)
print(dolfinx.fem.assemble_scalar(form))
```
Output (Python)
0.9.0.0 dbbbf9167f662f4bcc2c45da5a5998433ae24863
Version
main branch
DOLFINx git commit
dbbbf9167f662f4bcc2c45da5a5998433ae24863
Installation
No response
Additional information
No response
- Does this affect vector and matrix assembly too? (the assembler implementations have similar code).
- Does this affect C++ solvers?
> Does this affect vector and matrix assembly too? (the assembler implementations have similar code).
It doesn't seem like it: there is an additional clause in
https://github.com/FEniCS/dolfinx/blob/b16ef8abc20d70c28d8f1850fe42a0ee0bcf5106/cpp/dolfinx/fem/assemble_vector_impl.h#L1149-L1155
which guards you, as this information is synced between all processes. Consequently, the later call at
https://github.com/FEniCS/dolfinx/blob/b16ef8abc20d70c28d8f1850fe42a0ee0bcf5106/cpp/dolfinx/fem/assemble_vector_impl.h#L1220-L1223
isn't really needed. The same happens in assemble_matrix:
https://github.com/FEniCS/dolfinx/blob/b16ef8abc20d70c28d8f1850fe42a0ee0bcf5106/cpp/dolfinx/fem/assemble_matrix_impl.h#L516-L523
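A toy mpi4py sketch (not dolfinx code) of why that guard is collective-safe: it branches on form metadata that is identical on every rank, so the collective call inside the branch is reached by all ranks or by none.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD

# The list of interior-facet integral ids is compiled into the form and
# is the same on every rank, unlike the number of locally tagged facets.
integral_ids = [2]

if len(integral_ids) > 0:  # identical outcome on all ranks
    comm.barrier()  # stand-in for create_entity_permutations(); safe here
```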
> Does this affect C++ solvers?
In C++ we take a somewhat different approach, as the user is responsible for providing the map between subdomain ids and integration entities:
https://github.com/FEniCS/dolfinx/blob/b16ef8abc20d70c28d8f1850fe42a0ee0bcf5106/cpp/dolfinx/fem/utils.h#L329
Here we pass in the integral ids together with the entities, i.e. as long as the user doesn't have an MPI if-clause passing different maps on different processes, we should be all good.
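To make that contract concrete, here is a Python sketch (a hypothetical helper, mirroring what a C++ user has to assemble themselves) that builds interior-facet integration entities as flattened (cell, local facet index) pairs, two pairs per facet. An empty facet array yields an empty result, so every rank can pass an entry for every id without special-casing:

```python
import numpy as np


def interior_facet_entities(topology, facets):
    # Hypothetical helper: build integration entities for interior-facet
    # integrals as flattened (cell, local facet index) pairs, two pairs
    # per facet. Ranks with no tagged facets return an empty array, which
    # keeps the per-id map consistent across all processes.
    tdim = topology.dim
    f_to_c = topology.connectivity(tdim - 1, tdim)
    c_to_f = topology.connectivity(tdim, tdim - 1)
    entities = []
    for f in facets:
        cells = f_to_c.links(f)
        if len(cells) == 2:  # interior facets are shared by two cells
            for c in cells:
                local_index = np.flatnonzero(c_to_f.links(c) == f)[0]
                entities.extend([c, local_index])
    return np.asarray(entities, dtype=np.int32)
```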
However, I note that we do not have any demos illustrating how to do interior/exterior facet integrals on subdomains in C++, so there might be some use cases of this kind that could cause a deadlock.