HPX performance counters not found

Open JiakunYan opened this issue 3 months ago • 0 comments

Expected Behavior

Octo-Tiger completes.

Actual Behavior

Octo-Tiger (or HPX) occasionally aborts with an error reporting that a performance counter type is not defined.

Steps to Reproduce the Problem

Run Octo-Tiger with the following counters enabled:

--hpx:print-counter=/octotiger*/compute/gpu*kokkos*
--hpx:print-counter=/arithmetics/add@/octotiger*/compute/gpu/hydro_kokkos
--hpx:print-counter=/arithmetics/add@/octotiger*/compute/gpu/hydro_kokkos_aggregated

Specifications

Since these counters are created by Octo-Tiger, I think this is an Octo-Tiger problem rather than an HPX problem.

I suspect a data race between counter registration and counter use.
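
To make the suspected ordering problem concrete, here is a minimal toy model in plain C++. It is not HPX or Octo-Tiger code: `counter_registry`, `register_type`, `create`, and `try_query` are hypothetical names invented for illustration. It only shows that a query arriving before the type is registered fails in the same way as the error in the log, while the same query after registration succeeds.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Toy stand-in for a counter-type registry. This is NOT the HPX API;
// all names here are hypothetical.
class counter_registry {
public:
    using create_fn = std::function<long()>;

    // Register a counter type under its type name.
    void register_type(std::string const& type_name, create_fn fn) {
        assert(fn && "create function must be callable");
        types_[type_name] = std::move(fn);
    }

    // Evaluate a counter of the given type. Throws if the type is unknown,
    // mimicking a "no create function for performance counter found" failure.
    long create(std::string const& type_name) const {
        auto it = types_.find(type_name);
        if (it == types_.end())
            throw std::invalid_argument("counter type is not defined: " + type_name);
        return it->second();
    }

private:
    std::map<std::string, create_fn> types_;
};

// Returns true if the query succeeds, false if the type is not registered yet.
bool try_query(counter_registry const& reg, std::string const& type_name) {
    try {
        reg.create(type_name);
        return true;
    } catch (std::invalid_argument const&) {
        return false;
    }
}
```

In this model, `try_query` on an empty registry returns `false`, and returns `true` once `register_type` has been called for the same name. If registration and the first query run concurrently without any ordering guarantee, either outcome is possible, which would match the occasional nature of the failure.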

{stack-trace}: 11 frames:
0x7ffbe76bc29a  : /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx.so.1(+0x4b629a) [0x7ffbe76bc29a] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx.so.1
0x7ffbe6e15d65  : std::__exception_ptr::exception_ptr hpx::detail::get_exception<hpx::exception>(hpx::exception const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) [0x95] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx_core.so
0x7ffbe6e15e35  : void hpx::detail::throw_exception<hpx::exception>(hpx::exception const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long) [0x55] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx_core.so
0x7ffbe6e0b854  : hpx::detail::throw_exception(hpx::error, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long) [0x84] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx_core.so
0x7ffbe77be0f8  : hpx::performance_counters::detail::create_counter_local(hpx::performance_counters::counter_info const&) [0x3f8] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx.so.1
0x7ffbe77f80fd  : hpx::components::server::runtime_support::create_performance_counter(hpx::performance_counters::counter_info const&) [0xd] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx.so.1
0x7ffbe785c8fc  : /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx.so.1(+0x6568fc) [0x7ffbe785c8fc] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx.so.1
0x7ffbe78113bd  : /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx.so.1(+0x60b3bd) [0x7ffbe78113bd] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx.so.1
0x7ffbe6e03866  : hpx::threads::coroutines::detail::coroutine_impl::operator()() [0xd6] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx_core.so
0x7ffbe6e02a29  : /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx_core.so(+0x113a29) [0x7ffbe6e02a29] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx_core.so
{locality-id}: 2
{hostname}: [ (mpi:2) ]
{process-id}: 68905
{os-thread}: 2, locality#0/worker-thread#16
{thread-id}: 0000000008897540
{thread-description}: <unknown>
{state}: state::startup
{auxinfo}: 
{file}: /u/jiakuny/workspace/hpx-lcw/libs/full/performance_counters/src/counters.cpp
{line}: 808
{function}: create_counter_local
{what}: no create function for performance counter found: /octotiger{locality#2/total}/compute/gpu/multipole_kokkos (counter type /octotiger/compute/gpu/multipole_kokkos is not defined, known counter types: 
  /agas/count/allocate
  /agas/count/begin_migration
  /agas/count/bind
  /agas/count/bind_gid
  /agas/count/cache/entries
  /agas/count/cache/erase_entry
  /agas/count/cache/evictions
  /agas/count/cache/get_entry
  /agas/count/cache/hits
  /agas/count/cache/insert_entry
  /agas/count/cache/insertions
  /agas/count/cache/misses
  /agas/count/cache/update_entry
  /agas/count/decrement_credit
  /agas/count/end_migration
  /agas/count/increment_credit
  /agas/count/iterate_names
  /agas/count/on_symbol_namespace_event
  /agas/count/resolve
  /agas/count/resolve_gid
  /agas/count/route
  /agas/count/unbind
  /agas/count/unbind_gid
  /agas/primary/count
  /agas/primary/time
  /agas/symbol/count
  /agas/symbol/time
  /agas/time/allocate
  /agas/time/begin_migration
  /agas/time/bind
  /agas/time/bind_gid
  /agas/time/cache/erase_entry
  /agas/time/cache/get_entry
  /agas/time/cache/insert_entry
  /agas/time/cache/update_entry
  /agas/time/decrement_credit
  /agas/time/end_migration
  /agas/time/increment_credit
  /agas/time/iterate_names
  /agas/time/on_symbol_namespace_event
  /agas/time/resolve
  /agas/time/resolve_gid
  /agas/time/route
  /agas/time/unbind
  /agas/time/unbind_gid
  /arithmetics/add
  /arithmetics/count
  /arithmetics/divide
  /arithmetics/max
  /arithmetics/mean
  /arithmetics/median
  /arithmetics/min
  /arithmetics/multiply
  /arithmetics/subtract
  /arithmetics/variance
  /octotiger/amr_bounds
  /octotiger/compute/cpu/hydro_kokkos
  /octotiger/compute/cpu/hydro_kokkos_aggregated
  /octotiger/compute/cpu/hydro_kokkos_aggregation_rate
  /octotiger/compute/cpu/hydro_legacy
  /octotiger/compute/cpu/p2p_kokkos
  /octotiger/compute/gpu/hydro_cuda
  /octotiger/compute/gpu/hydro_cuda_aggregated
  /octotiger/compute/gpu/hydro_cuda_aggregation_rate
  /octotiger/compute/gpu/hydro_kokkos
  /octotiger/compute/gpu/hydro_kokkos_aggregated
  /octotiger/compute/gpu/hydro_kokkos_aggregation_rate
  /octotiger/compute/gpu/p2p_cuda
  /octotiger/compute/gpu/p2p_kokkos
  /octotiger/subgrid_leaves
  /octotiger/subgrids
  /parcelport/count/mpi/cache-evictions
  /parcelport/count/mpi/cache-hits
  /parcelport/count/mpi/cache-insertions
  /parcelport/count/mpi/cache-misses
  /parcelport/count/mpi/cache-reclaims
  /parcelqueue/length/receive
  /parcelqueue/length/send
  /parcels/count/routed
  /runtime/count/action-invocation
  /runtime/count/component
  /runtime/count/remote-action-invocation
  /runtime/uptime
  /scheduler/utilization/instantaneous
  /statistics/average
  /statistics/max
  /statistics/median
  /statistics/min
  /statistics/rolling_average
  /statistics/rolling_max
  /statistics/rolling_min
  /statistics/rolling_stddev
  /statistics/stddev
  /threadqueue/length
  /threads/busy-loop-count/instantaneous
  /threads/count/cumulative
  /threads/count/cumulative-phases
  /threads/count/instantaneous/active
  /threads/count/instantaneous/all
  /threads/count/instantaneous/pending
  /threads/count/instantaneous/staged
  /threads/count/instantaneous/suspended
  /threads/count/instantaneous/terminated
  /threads/idle-loop-count/instantaneous
  /threads/time/overall
: HPX(bad_parameter)): HPX(bad_parameter): 
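
The error above names a counter instance (`/octotiger{locality#2/total}/compute/gpu/multipole_kokkos`) and the type derived from it (`/octotiger/compute/gpu/multipole_kokkos`). That type does not appear in the known-types list, although the `hydro_kokkos` and `p2p_kokkos` GPU counters do. The sketch below is a small standalone helper for checking this kind of log by hand. It is an assumption-laden triage aid, not part of HPX: `counter_type_of` and `is_known_type` are hypothetical names, and the name handling is inferred only from the two forms shown in the error message.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Drop the "{...}" instance part from a full counter name, e.g.
// "/octotiger{locality#2/total}/compute/gpu/multipole_kokkos"
// becomes "/octotiger/compute/gpu/multipole_kokkos".
// Names without a complete "{...}" part are returned unchanged.
std::string counter_type_of(std::string const& full_name) {
    assert(!full_name.empty());
    auto open = full_name.find('{');
    if (open == std::string::npos) return full_name;
    auto close = full_name.find('}', open);
    if (close == std::string::npos) return full_name;
    return full_name.substr(0, open) + full_name.substr(close + 1);
}

// True if the type of `full_name` is present in the list of known types
// copied from the error message.
bool is_known_type(std::vector<std::string> const& known,
                   std::string const& full_name) {
    return std::find(known.begin(), known.end(),
                     counter_type_of(full_name)) != known.end();
}
```

Fed with the known-types list from this log, `is_known_type` would report the `multipole_kokkos` instance as unknown and a `hydro_kokkos` instance as known, which narrows the problem to the one counter type that was not registered on locality 2 at the time of the query.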

This is on NCSA Delta. I think @diehlpk also encountered this problem on Ookami.

JiakunYan, Apr 06 '24 04:04