
[Direct Connection] Group Contributions (probably) should not be summed

Open kaushikcfd opened this issue 3 years ago • 2 comments

I was looking at the code generated for the direct connection, and it has the form:

        _pt_temp_1[idof_ensm0 + 4 * iel_ensm0 + 1024 * iface_ensm0] =
          (from_el_present[iel_ensm0 + 256 * iface_ensm0] ?
            normal_1_b_all[from_el_indices[iel_ensm0 + 256 * iface_ensm0]] * 0.5 * (_pt_part_ph_id_0[4 * from_el_indices[iel_ensm0 + 256 * iface_ensm0] + _pt_data_3[idof_ensm0 + 4 * iel_ensm0 + 1024 * iface_ensm0]] + -1.0 * _pt_part_ph_id_0[4 * from_el_indices[iel_ensm0 + 256 * iface_ensm0] + _pt_data_3[idof_ensm0 + 4 * iel_ensm0 + 1024 * iface_ensm0]])
            : 0.0)
          + (from_el_present_0[iel_ensm0 + 256 * iface_ensm0] ?
              _pt_part_ph_id_1[4 * from_el_indices_0[iel_ensm0 + 256 * iface_ensm0] + _pt_data_4[idof_ensm0 + 4 * iel_ensm0 + 1024 * iface_ensm0]] * normal_1_b_face_restr_interior[from_el_indices_0[iel_ensm0 + 256 * iface_ensm0]]
              : 0.0)
          + (from_el_present_1[iel_ensm0 + 256 * iface_ensm0] ?
              cse[4 * from_el_indices_2[iel_ensm0 + 256 * iface_ensm0] + _pt_data_6[idof_ensm0 + 4 * iel_ensm0 + 1024 * iface_ensm0]] * normal_1_b_BTAG_PARTITION[from_el_indices_2[iel_ensm0 + 256 * iface_ensm0]]
              : 0.0);

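That is, it has the form (A if B else 0) + (C if D else 0) + (E if F else 0). I think a more efficient way to write this would be A if B else (C if D else (E if F else 0)), which is equivalent as long as at most one of B, D, and F holds for any given entry. The nested form stops evaluating conditions once one of them is true, which saves some conditional computation, i.e. global memory reads.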
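To make the difference between the two forms concrete, here is a rough pure-Python sketch that counts element reads as a stand-in for global memory traffic. Everything in it is hypothetical and not meshmode code: `CountingArray`, `summed_form`, `chained_form`, and the toy data are made up for illustration, and the data assumes each output entry is covered by exactly one group.

```python
class CountingArray:
    """A list wrapper that counts element reads (a stand-in for global memory)."""

    def __init__(self, data):
        self.data = list(data)
        self.reads = 0

    def __getitem__(self, i):
        self.reads += 1
        return self.data[i]


def summed_form(groups, n):
    # (A if B else 0) + (C if D else 0) + (E if F else 0)
    out = []
    for i in range(n):
        total = 0.0
        for present, values in groups:
            # Every presence mask is read for every entry.
            total += values[i] if present[i] else 0.0
        out.append(total)
    return out


def chained_form(groups, n):
    # A if B else (C if D else (E if F else 0))
    out = []
    for i in range(n):
        result = 0.0
        for present, values in groups:
            if present[i]:
                # Stop at the first group that is present.
                result = values[i]
                break
        out.append(result)
    return out


def make_groups():
    # Hypothetical data: exactly one group is present for each entry.
    masks = [
        [True, False, False, True, False, False],
        [False, True, False, False, True, False],
        [False, False, True, False, False, True],
    ]
    values = [[1.0] * 6, [2.0] * 6, [3.0] * 6]
    return [(CountingArray(m), CountingArray(v)) for m, v in zip(masks, values)]


def total_reads(groups):
    return sum(present.reads + values.reads for present, values in groups)


groups_sum = make_groups()
groups_chain = make_groups()
result_sum = summed_form(groups_sum, 6)
result_chain = chained_form(groups_chain, 6)
reads_sum = total_reads(groups_sum)
reads_chain = total_reads(groups_chain)
```

With this toy data both forms produce the same values, and the chained form performs fewer mask reads. If the masks were allowed to overlap, the two forms would no longer agree, so the rewrite relies on the groups being disjoint.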

kaushikcfd avatar Oct 13 '22 06:10 kaushikcfd

On further thought, I think the current way of summing the contributions is too heavy on global memory. Storing the mapping in a single array should be more efficient:

A[..., ...] if which_term[iel,idof]==0 else (B[..., ...] if which_term[iel,idof]==1 else 0)

This should significantly decrease the global memory footprint of the expression. (I think)
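A minimal sketch of the single-selector idea, again in plain Python rather than anything meshmode actually generates. The helper names `build_which_term` and `evaluate`, the use of -1 to mean "no group", and the flat per-entry indexing are all assumptions made for illustration.

```python
def build_which_term(masks):
    # Precompute, once, which group feeds each output entry (-1 means none).
    n = len(masks[0])
    which_term = [-1] * n
    for term, mask in enumerate(masks):
        for i, present in enumerate(mask):
            if present:
                # The scheme only works if groups do not overlap.
                assert which_term[i] == -1, "groups overlap"
                which_term[i] = term
    return which_term


def evaluate(which_term, values):
    # One selector read per entry, then one read from the selected group.
    return [values[t][i] if t >= 0 else 0.0 for i, t in enumerate(which_term)]


# Hypothetical data: three groups, the last entry is covered by none of them.
masks = [
    [True, False, False, False],
    [False, True, False, False],
    [False, False, True, False],
]
values = [[1.0] * 4, [2.0] * 4, [3.0] * 4]

which_term = build_which_term(masks)
result = evaluate(which_term, values)
```

The point of the sketch is that the per-group presence masks collapse into one integer array that is built up front, so evaluation reads a single selector per entry instead of one mask per group.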

kaushikcfd avatar Oct 13 '22 06:10 kaushikcfd

I agree the sum is not lovely.

As long as none of the intermediates are materialized, the two things I can see wrong with it are

  • The from_el_present are likely avoidable
  • The from_el_indices_2 are bigger than they need to be

Is that your sense as well?

inducer avatar Oct 14 '22 20:10 inducer