t8code icon indicating copy to clipboard operation
t8code copied to clipboard

Bug: committing a partitioned cmesh

Open jfussbro opened this issue 1 year ago • 6 comments

It was discovered, that committing a partitioned cmesh leads to an error in t8_cmesh_commit, where the first tree of the cmesh is -1. In this branch: https://github.com/DLR-AMR/t8code/tree/bug_committing_partitioned_cmesh the example t8_cmesh_create_partitioned was rebuild to form a minimal example of the bug.

jfussbro avatar May 06 '24 14:05 jfussbro

Maybe related to it on branch benchmark-New_for_hybrid running ./t8_time_hybrid_new with -r 1 -f first_example -d 3 -i 8 produces the following output on 128 process

-r 1 -f /scratch/ws/m0/knap_da-t8code_data/first_example -d 3 -i 8
------------------- USE 128 processes -------------------
[libsc] This is libsc 2.8.5.383-c467b
[p4est] This is p4est 2.8.5.367-931f
[t8] This is t8 2.0.0.171-b12d-dirty
[t8] CPP                      mpicc -E
[t8] CPPFLAGS
[t8] CC                       mpicc
[t8] CFLAGS                   -O0
[t8] LDFLAGS
[t8] LIBS                     -lz -lm  -lstdc++
[t8] Statistics for   cmesh_commit_sort
[t8]    Global number of values:     128
[t8]    Mean value (std. dev.):           0.124535 (0.014 = 11.2%)
[t8]    Minimum attained at rank      77: 0.0986603
[t8]    Maximum attained at rank       0: 0.15838
[t8] Statistics for   cmesh_commit_count
[t8]    Global number of values:     128
[t8]    Mean value (std. dev.):           1.63897e-05 (8.9e-06 = 54.3%)
[t8]    Minimum attained at rank     117: 8.37e-06
[t8]    Maximum attained at rank       2: 3.649e-05
[t8] Statistics for   cmesh_commit_end
[t8]    Global number of values:     128
[t8]    Mean value (std. dev.):           0.00752127 (0.0018 = 23.9%)
[t8]    Minimum attained at rank      54: 5.6279e-05
[t8]    Maximum attained at rank      97: 0.0147414
[t8] Summary = [ 0.124535 1.63897e-05 0.00752127 ];
[t8] Maximum = [ 0.15838 3.649e-05 0.0147414 ];
[t8] Enter cmesh partition
[libsc 109] Abort: Assertion '*last_local_tree == -1 || (0 <= *last_local_tree && *last_local_tree <= t8_cmesh_get_num_trees (cmesh))'

I can not upload the file, because msh-files are not supported. If you ask me, I will send it to you.

Davknapp avatar May 06 '24 14:05 Davknapp

From my understanding, t8_cmesh_set_partition_uniform is only meant to be used when deriving a cmesh from another cmesh. It will not effect the partition during creation from stash.

lukasdreyer avatar May 06 '24 15:05 lukasdreyer

Which is what I do in my case:

    t8_cmesh_t gmsh_cmesh = t8_cmesh_from_msh_file (file, true, sc_MPI_COMM_WORLD, dim, 0, false);
    t8_cmesh_init (&cmesh);
    t8_cmesh_set_derive (cmesh, gmsh_cmesh);
    t8_cmesh_set_partition_uniform (cmesh, init_level, t8_scheme_new_default_cxx ());
    t8_cmesh_commit (cmesh, sc_MPI_COMM_WORLD);

but we should also consider that I use it based on the hybrid-new, maybe the bug isn't related after all, just looks like it is.

Davknapp avatar May 07 '24 06:05 Davknapp

Found a solution for my bug: the submodules weren't updated correctly.

Davknapp avatar May 07 '24 09:05 Davknapp

From my understanding, t8_cmesh_set_partition_uniform is only meant to be used when deriving a cmesh from another cmesh. It will not effect the partition during creation from stash.

We should verify whether this is indeed the case. Judging from the functionality alone there is no reason why it should not be possible when not deriving - but it could be due to an implementation detail.

If it is not allowed to use for non-derived cmeshes then a) This issue is not an issue b) We must implement an error check that aborts when this was used together with non-derive c) We must document properly that this function cannot be used in this way.

holke avatar May 08 '24 14:05 holke

t8_cmesh_set_partition_uniform does not compute any offsets, rather it just memorizes that the cmesh should be partitioned according to a certain uniform refinement level and resets existing offsets.

  cmesh->set_partition = 1;
  cmesh->set_partition_level = element_level;
  cmesh->set_partition_scheme = ts;
  if (element_level >= 0) {
    /* We overwrite any previous partition settings */
    cmesh->first_tree = -1;
    cmesh->num_local_trees = -1;
    if (cmesh->tree_offsets != NULL) {
      t8_shmem_array_destroy (&cmesh->tree_offsets);
      cmesh->tree_offsets = NULL;
    }
  }

t8_cmesh_commit_partitioned_new expects either the offsets to be set, or first_tree and first_tree_shared to be set.

  if (cmesh->tree_offsets != NULL) {
    const t8_gloidx_t *tree_offsets = (t8_gloidx_t *) t8_shmem_array_get_gloidx_array (cmesh->tree_offsets);
    /* We partition using tree offsets */
    /* Get the first tree and whether it is shared */
    cmesh->first_tree = t8_offset_first (cmesh->mpirank, tree_offsets);
    cmesh->first_tree_shared = t8_shmem_array_get_gloidx (cmesh->tree_offsets, cmesh->mpirank) < 0;
    /* Get the number of local trees */
    cmesh->num_local_trees = t8_offset_num_trees (cmesh->mpirank, tree_offsets);
  }
  /* The first_tree and first_tree_shared entries must be set by now */
  T8_ASSERT (cmesh->first_tree >= 0);
  T8_ASSERT (cmesh->first_tree_shared >= 0);

It should be possible to enforce offsetcalculation here, if we called t8_cmesh_set_partition_uniform before.

lukasdreyer avatar May 13 '24 07:05 lukasdreyer

Solved by https://github.com/DLR-AMR/t8code/pull/1096

lukasdreyer avatar Jul 15 '24 09:07 lukasdreyer