
Delete elements in distributed mesh

manavbhatia opened this issue 4 years ago • 7 comments

I have a distributed mesh and I am calling delete_elem() on it, followed by prepare_for_use(), which trips this assert:

      libmesh_assert (
                      ((min_procid == this->processor_id()) && obj)
                      ||
                      (min_procid != this->processor_id())
                      );

I should note that the call to delete_elem() is inside a loop with local_elem_begin/local_elem_end iterators on each processor, so the elements get deleted locally, not uniformly on all processors.

Previously, I was using this approach with ReplicatedMesh and elem_begin/elem_end iterators so that all processors did the exact same operations. So, this is likely the issue.

Is there a correct approach to delete elements in a distributed mesh?

manavbhatia avatar Aug 13 '20 18:08 manavbhatia

I'm afraid that for any element you delete on one processor you're going to need to also delete it on any processor where it's ghosted, at least before you try to do any major mesh operations or prepare_for_use() on the updated mesh. If in your algorithm processors can't always figure out which of their ghosted elements need deleting, then you'll need to query their owners (via something like sync_dofobject_data_by_id() in parallel_ghost_sync.h) to find out.

roystgnr avatar Aug 13 '20 21:08 roystgnr
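
The workflow described above might be sketched as below. This is an editorial sketch only, not code from the thread: should_delete() is a hypothetical predicate that each rank must be able to evaluate consistently for its local and ghosted elements (otherwise the owner must be queried first, e.g. via sync_dofobject_data_by_id()), and element_ptr_range() is assumed to be the range equivalent of elements_begin()/elements_end():

```cpp
// Every rank deletes an element from its own copy of the mesh if that
// element is local *or* ghosted there, before prepare_for_use().
std::vector<libMesh::Elem *> to_delete;

for (auto & elem : mesh.element_ptr_range())   // local + ghosted elements
  if (should_delete(*elem))                    // hypothetical user predicate
    to_delete.push_back(elem);

for (libMesh::Elem * elem : to_delete)
  mesh.delete_elem(elem);

mesh.prepare_for_use();
```

Collecting the pointers first and deleting after the loop also sidesteps the iterator problem discussed later in this thread.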

Thanks Roy! Would using semilocal_elem_begin/semilocal_elem_end instead of local_elem_begin/local_elem_end include ghosted elements on each processor?

manavbhatia avatar Aug 13 '20 21:08 manavbhatia

Changing the iterator to semilocal_elements_begin gives the following errors:

libc++abi.dylib: Pure virtual function called!
[InfiHorizon:49775] *** Process received signal ***
libc++abi.dylib: Pure virtual function called!
[InfiHorizon:49774] Signal: Abort trap: 6 (6)
[InfiHorizon:49774] Signal code:  (0)
[InfiHorizon:49775] [ 0] [InfiHorizon:49772] [ 0] 0   libsystem_platform.dylib            0x00007fff6fd3c5fd _sigtramp + 29
[InfiHorizon:49772] [ 1] 0   ???                                 0x0000000000000400 0x0 + 1024
[InfiHorizon:49772] [ 2] 0   libsystem_c.dylib                   0x00007fff6fc12808 abort + 120
[InfiHorizon:49772] [ 3] 0   libc++abi.dylib                     0x00007fff6ce71458 abort_message + 231
[InfiHorizon:49772] [ 4] 0   libc++abi.dylib                     0x00007fff6ce70e92 __cxa_deleted_virtual + 0

manavbhatia avatar Aug 13 '20 22:08 manavbhatia

Looks like the issue was because I had written the loop like this:

MeshBase::element_iterator
  it  = mesh.semilocal_elements_begin(),
  end = mesh.semilocal_elements_end();

for ( ; it != end; it++)
  mesh.delete_elem(*it);

So, the iterator was being corrupted in the process of deletion of elements. This modification gets rid of the pure virtual function error:

MeshBase::element_iterator
  it  = mesh.semilocal_elements_begin(),
  end = mesh.semilocal_elements_end();

std::vector<Elem *> elems_to_delete;

for ( ; it != end; it++)
  elems_to_delete.push_back(*it);

for (std::size_t i = 0; i < elems_to_delete.size(); i++)
  mesh.delete_elem(elems_to_delete[i]);
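
The same two-phase pattern can be shown without libMesh at all. This is an editorial, self-contained illustration (the Elem struct and map container are hypothetical stand-ins for the mesh's element storage): record what to delete during a read-only pass, then mutate the container only after iteration has finished.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for a mesh element; real libMesh Elems are richer.
struct Elem { int id; };

// Two-phase deletion: collect the ids to delete while iterating,
// then erase them only after the iteration is over.
std::size_t delete_odd_elems(std::map<int, Elem *> & mesh)
{
  std::vector<int> ids_to_delete;
  for (const auto & pair : mesh)      // read-only pass: no invalidation
    if (pair.first % 2 == 1)
      ids_to_delete.push_back(pair.first);

  for (int id : ids_to_delete)        // safe to mutate: iteration finished
    {
      delete mesh.at(id);
      mesh.erase(id);
    }
  return ids_to_delete.size();
}
```

The first loop never touches the container, so its iterators cannot be invalidated; all mutation is deferred to the second loop.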

That said, I find that using elements_begin/elements_end on each rank deletes all the elements without tripping any errors in prepare_for_use.

I would appreciate it if you could verify that elements_begin is appropriate for this with a DistributedMesh.

manavbhatia avatar Aug 14 '20 01:08 manavbhatia

elements_begin would be the most appropriate range to use on a distributed mesh.

I'm afraid Elem::is_semilocal and Mesh::semilocal_elements_begin are atavistically named; IIRC they predate GhostingFunctor. They basically ask the question "is this element a point neighbor of a local element?", which matches our default geometric ghosting right now, but if you change your mesh's ghosting properties, the behavior of is_semilocal isn't changed to match. By contrast, because we delete elements from the local mesh representation when they're no longer ghosted, elements_begin inherently iterates over all, and only, the local and ghosted elements.

roystgnr avatar Aug 14 '20 13:08 roystgnr

So, the iterator was being corrupted in the process of deletion of elements.

It seems to me that we went to some effort to ensure that neither ReplicatedMesh nor DistributedMesh's delete_elem() routines would invalidate iterators. For example, in the DistributedMesh::delete_elem() implementation there's even a comment:

  // But not yet from the container; we might invalidate
  // an iterator that way!

The container entry is similarly not erase()'d in ReplicatedMesh::delete_elem().
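
That design choice can be illustrated with a self-contained sketch (editorial; the Elem struct and vector container are hypothetical stand-ins, not libMesh's actual storage): instead of erase()-ing the container entry, the slot is nulled out, so existing iterators into the container remain valid.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a mesh element.
struct Elem { int id; };

// "Delete" by freeing the object and nulling its slot, instead of
// erase()-ing the container entry; iterators into the vector stay valid.
void delete_elem(std::vector<Elem *> & elems, std::size_t idx)
{
  delete elems[idx];
  elems[idx] = nullptr;
}

// Count live elements, skipping the nulled slots of deleted ones.
std::size_t n_live(const std::vector<Elem *> & elems)
{
  std::size_t n = 0;
  for (const Elem * e : elems)
    if (e)
      ++n;
  return n;
}
```

The cost is that every traversal must skip null slots, but no iterator is ever invalidated by a deletion.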

So it would be good if we could reproduce this issue in a small test code... maybe there is some iterator invalidation we are somehow missing.

jwpeterson avatar Aug 14 '20 13:08 jwpeterson

maybe there is some iterator invalidation we are somehow missing.

I think I've figured this out: the semilocal_ iterators find point neighbors of elements by searching through elements' neighbor_ptr() links. But until you prepare_for_use() after element deletion, those neighbor_ptr() links can be dangling pointers, and when they get dereferenced in the semilocal predicate test you're naturally likely to get a segfault, or garbage leading to a segfault.

roystgnr avatar Aug 14 '20 15:08 roystgnr