interior_facet_assembly causes deadlock if there are no interior facets on the process
Summarize the issue
Issue reported at:
https://fenicsproject.discourse.group/t/scallar-assembly-of-a-internal-surface-integral-misbehaves-in-parallel/14655/3
The problem is that not all processes reach the call to create_entity_permutations:
https://github.com/FEniCS/dolfinx/blob/main/cpp/dolfinx/fem/assemble_scalar_impl.h#L190-L192
which calls mesh.topology.create_entities(d) for d = 0, ..., mesh.topology.dim.
In some cases, create_entities(d) requires parallel communication, so if only a subset of the processes makes the call, the program deadlocks.
We need a code re-design to avoid these deadlock situations: we cannot call create_entity_permutations inside a branch whose condition is not guaranteed to evaluate the same on all processes.
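For illustration, here is a toy mpi4py sketch (not dolfinx code) of the anti-pattern: a collective operation guarded by a condition that can differ between ranks. Run with two or more processes, it hangs, just like the assembly code above.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD

# Stand-in for "this rank owns at least one tagged interior facet";
# the condition differs between ranks.
have_local_facets = comm.rank % 2 == 0

if have_local_facets:
    # Stand-in for create_entities(d)/create_entity_permutations(),
    # which may require collective communication. Ranks that skip the
    # branch never participate, so the ranks inside block forever.
    comm.barrier()
```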
To fix this for assembly, I would change how we compute the integration entities, i.e.,
https://github.com/FEniCS/dolfinx/blob/b16ef8abc20d70c28d8f1850fe42a0ee0bcf5106/cpp/dolfinx/fem/utils.cpp#L131-L151
as we currently send in a full meshtags object, which doesn't carry the subdomain ids used by the form in question:
https://github.com/FEniCS/dolfinx/blob/b16ef8abc20d70c28d8f1850fe42a0ee0bcf5106/python/dolfinx/fem/forms.py#L214-L215
I think this function should take the meshtag information together with the ids we want to extract (currently we extract far more than we need). This would yield an integral entry for every id on every process (with 0 facets on some).
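A minimal sketch of the proposed extraction, assuming the function receives the raw meshtag data plus the subdomain ids from the form (the helper name is hypothetical, not the actual dolfinx API):

```python
import numpy as np


def extract_integration_entities(tag_indices, tag_values, subdomain_ids):
    # Hypothetical helper: extract only the entities tagged with the ids
    # the form actually integrates over. Every rank gets an entry for
    # every id, so downstream collective calls stay safe even when the
    # local entity array is empty on some ranks.
    return {
        int(i): tag_indices[tag_values == i].astype(np.int32)
        for i in subdomain_ids
    }
```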
How to reproduce the bug
Run the following code with 1, 2 and 4 processes (e.g. mpirun -np 4 python3 mwe.py). It works on 1 and 2 processes and deadlocks on 4.
Minimal Example (Python)
```python
from mpi4py import MPI
import numpy as np
import dolfinx
import ufl

if MPI.COMM_WORLD.rank == 0:
    print(dolfinx.__version__, dolfinx.git_commit_hash)

domain = dolfinx.mesh.create_unit_cube(
    MPI.COMM_WORLD, 5, 6, 6, ghost_mode=dolfinx.mesh.GhostMode.shared_facet
)


def marker(x, eps=1e-14):
    return np.logical_and(x[0] <= 0.5 + eps, np.isclose(x[1], 0.5))


domain.topology.create_connectivity(domain.topology.dim - 1, domain.topology.dim)
domain.topology.create_connectivity(domain.topology.dim, domain.topology.dim - 1)

# Tag every facet (owned and ghost); facets matching the marker get the value 2
facet_map = domain.topology.index_map(domain.topology.dim - 1)
num_facets_on_proc = facet_map.size_local + facet_map.num_ghosts
facets = np.arange(num_facets_on_proc, dtype=np.int32)
facet_values = np.zeros(num_facets_on_proc, dtype=np.int32)
facet_values[dolfinx.mesh.locate_entities(domain, domain.topology.dim - 1, marker)] = 2
ft = dolfinx.mesh.meshtags(domain, domain.topology.dim - 1, facets, facet_values)

# Interior facet integral over the tagged subdomain
dS = ufl.Measure("dS", domain=domain, subdomain_data=ft, subdomain_id=2)
form = dolfinx.fem.form(1 * dS)
print(dolfinx.fem.assemble_scalar(form))
```
Output (Python)
0.9.0.0 dbbbf9167f662f4bcc2c45da5a5998433ae24863
Version
main branch
DOLFINx git commit
dbbbf9167f662f4bcc2c45da5a5998433ae24863
Installation
No response
Additional information
No response
- Does this affect vector and matrix assembly too? (the assembler implementations have similar code).
- Does this affect C++ solvers?
> Does this affect vector and matrix assembly too? (the assembler implementations have similar code).
It doesn't seem like it: there is an additional clause in
https://github.com/FEniCS/dolfinx/blob/b16ef8abc20d70c28d8f1850fe42a0ee0bcf5106/cpp/dolfinx/fem/assemble_vector_impl.h#L1149-L1155
which guards you, as this information is synced between all processes. Consequently, the later call at
https://github.com/FEniCS/dolfinx/blob/b16ef8abc20d70c28d8f1850fe42a0ee0bcf5106/cpp/dolfinx/fem/assemble_vector_impl.h#L1220-L1223
isn't really needed. The same happens in assemble_matrix:
https://github.com/FEniCS/dolfinx/blob/b16ef8abc20d70c28d8f1850fe42a0ee0bcf5106/cpp/dolfinx/fem/assemble_matrix_impl.h#L516-L523
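A toy mpi4py sketch (not dolfinx code) of why that guard is collective-safe: it branches on form metadata that is identical on every rank, so the collective call inside the branch is reached by all ranks or by none.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD

# The list of interior-facet integral ids is compiled into the form and
# is the same on every rank, unlike the number of locally tagged facets.
integral_ids = [2]

if len(integral_ids) > 0:  # identical outcome on all ranks
    comm.barrier()  # stand-in for create_entity_permutations(); safe here
```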
> Does this affect C++ solvers?
In C++ we take a somewhat different approach, as the user is responsible for providing the map between subdomain ids and integration entities:
https://github.com/FEniCS/dolfinx/blob/b16ef8abc20d70c28d8f1850fe42a0ee0bcf5106/cpp/dolfinx/fem/utils.h#L329
Here we pass in the integral ids together with the entities, i.e. as long as the user doesn't have an MPI if-clause passing different maps on different processes, we should be all good.
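To make that contract concrete, here is a Python sketch (a hypothetical helper, mirroring what a C++ user has to assemble themselves) that builds interior-facet integration entities as flattened (cell, local facet index) pairs, two pairs per facet. An empty facet array yields an empty result, so every rank can pass an entry for every id without special-casing:

```python
import numpy as np


def interior_facet_entities(topology, facets):
    # Hypothetical helper: build integration entities for interior-facet
    # integrals as flattened (cell, local facet index) pairs, two pairs
    # per facet. Ranks with no tagged facets return an empty array, which
    # keeps the per-id map consistent across all processes.
    tdim = topology.dim
    f_to_c = topology.connectivity(tdim - 1, tdim)
    c_to_f = topology.connectivity(tdim, tdim - 1)
    entities = []
    for f in facets:
        cells = f_to_c.links(f)
        if len(cells) == 2:  # interior facets are shared by two cells
            for c in cells:
                local_index = np.flatnonzero(c_to_f.links(c) == f)[0]
                entities.extend([c, local_index])
    return np.asarray(entities, dtype=np.int32)
```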
However, I note that we do not have any demos illustrating how to do interior/exterior facet integrals on subdomains in C++, so there might be some use cases of this kind that could cause a deadlock.