HPX performance counters not found
Expected Behavior
Octo-Tiger completes.
Actual Behavior
Occasionally, Octo-Tiger (or HPX) fails during startup and reports that a requested performance counter type is not defined.
Steps to Reproduce the Problem
Run Octo-Tiger with the following counters enabled:
--hpx:print-counter=/octotiger*/compute/gpu*kokkos*
--hpx:print-counter=/arithmetics/add@/octotiger*/compute/gpu/hydro_kokkos
--hpx:print-counter=/arithmetics/add@/octotiger*/compute/gpu/hydro_kokkos_aggregated
Specifications
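Since these counters are created by Octo-Tiger, I think this is an Octo-Tiger problem rather than an HPX problem.
I suspect a data race between counter registration and counter usage: in the error below, the list of known counter types already includes /octotiger/compute/gpu/hydro_kokkos and /octotiger/compute/gpu/p2p_kokkos, but not the requested /octotiger/compute/gpu/multipole_kokkos.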
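To illustrate the ordering problem I have in mind, here is a minimal toy model in plain C++. It is not Octo-Tiger or HPX code: `counter_registry`, `register_type`, `create_counter`, and `try_create` are hypothetical names made up for this sketch. It only shows that a counter created before its type has been registered fails, while the same request succeeds once registration has finished.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a counter-type registry (NOT the HPX API).
class counter_registry {
public:
    // Make a counter type known, as the application would do at startup.
    void register_type(const std::string& type_name) {
        std::lock_guard<std::mutex> lock(mutex_);
        known_types_.insert(type_name);
    }

    // Create a counter instance; throws if the type is not registered yet.
    void create_counter(const std::string& type_name) const {
        std::lock_guard<std::mutex> lock(mutex_);
        if (known_types_.count(type_name) == 0) {
            throw std::invalid_argument(
                "counter type " + type_name + " is not defined");
        }
    }

private:
    mutable std::mutex mutex_;
    std::set<std::string> known_types_;
};

// Returns true if the counter could be created, false if its type is unknown.
bool try_create(const counter_registry& registry,
                const std::string& type_name) {
    try {
        registry.create_counter(type_name);
        return true;
    } catch (const std::invalid_argument&) {
        return false;
    }
}
```

With this model, calling `try_create` before `register_type` returns `false`, and calling it afterwards returns `true`. If the real counter query can run while Octo-Tiger is still registering its counter types, the intermittent failure would look like the first case. Whether that is what actually happens here is still an assumption on my side.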
{stack-trace}: 11 frames:
0x7ffbe76bc29a : /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx.so.1(+0x4b629a) [0x7ffbe76bc29a] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx.so.1
0x7ffbe6e15d65 : std::__exception_ptr::exception_ptr hpx::detail::get_exception<hpx::exception>(hpx::exception const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) [0x95] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx_core.so
0x7ffbe6e15e35 : void hpx::detail::throw_exception<hpx::exception>(hpx::exception const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long) [0x55] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx_core.so
0x7ffbe6e0b854 : hpx::detail::throw_exception(hpx::error, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long) [0x84] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx_core.so
0x7ffbe77be0f8 : hpx::performance_counters::detail::create_counter_local(hpx::performance_counters::counter_info const&) [0x3f8] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx.so.1
0x7ffbe77f80fd : hpx::components::server::runtime_support::create_performance_counter(hpx::performance_counters::counter_info const&) [0xd] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx.so.1
0x7ffbe785c8fc : /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx.so.1(+0x6568fc) [0x7ffbe785c8fc] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx.so.1
0x7ffbe78113bd : /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx.so.1(+0x60b3bd) [0x7ffbe78113bd] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx.so.1
0x7ffbe6e03866 : hpx::threads::coroutines::detail::coroutine_impl::operator()() [0xd6] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx_core.so
0x7ffbe6e02a29 : /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx_core.so(+0x113a29) [0x7ffbe6e02a29] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx_core.so
{locality-id}: 2
{hostname}: [ (mpi:2) ]
{process-id}: 68905
{os-thread}: 2, locality#0/worker-thread#16
{thread-id}: 0000000008897540
{thread-description}: <unknown>
{state}: state::startup
{auxinfo}:
{file}: /u/jiakuny/workspace/hpx-lcw/libs/full/performance_counters/src/counters.cpp
{line}: 808
{function}: create_counter_local
{what}: no create function for performance counter found: /octotiger{locality#2/total}/compute/gpu/multipole_kokkos (counter type /octotiger/compute/gpu/multipole_kokkos is not defined, known counter types:
/agas/count/allocate
/agas/count/begin_migration
/agas/count/bind
/agas/count/bind_gid
/agas/count/cache/entries
/agas/count/cache/erase_entry
/agas/count/cache/evictions
/agas/count/cache/get_entry
/agas/count/cache/hits
/agas/count/cache/insert_entry
/agas/count/cache/insertions
/agas/count/cache/misses
/agas/count/cache/update_entry
/agas/count/decrement_credit
/agas/count/end_migration
/agas/count/increment_credit
/agas/count/iterate_names
/agas/count/on_symbol_namespace_event
/agas/count/resolve
/agas/count/resolve_gid
/agas/count/route
/agas/count/unbind
/agas/count/unbind_gid
/agas/primary/count
/agas/primary/time
/agas/symbol/count
/agas/symbol/time
/agas/time/allocate
/agas/time/begin_migration
/agas/time/bind
/agas/time/bind_gid
/agas/time/cache/erase_entry
/agas/time/cache/get_entry
/agas/time/cache/insert_entry
/agas/time/cache/update_entry
/agas/time/decrement_credit
/agas/time/end_migration
/agas/time/increment_credit
/agas/time/iterate_names
/agas/time/on_symbol_namespace_event
/agas/time/resolve
/agas/time/resolve_gid
/agas/time/route
/agas/time/unbind
/agas/time/unbind_gid
/arithmetics/add
/arithmetics/count
/arithmetics/divide
/arithmetics/max
/arithmetics/mean
/arithmetics/median
/arithmetics/min
/arithmetics/multiply
/arithmetics/subtract
/arithmetics/variance
/octotiger/amr_bounds
/octotiger/compute/cpu/hydro_kokkos
/octotiger/compute/cpu/hydro_kokkos_aggregated
/octotiger/compute/cpu/hydro_kokkos_aggregation_rate
/octotiger/compute/cpu/hydro_legacy
/octotiger/compute/cpu/p2p_kokkos
/octotiger/compute/gpu/hydro_cuda
/octotiger/compute/gpu/hydro_cuda_aggregated
/octotiger/compute/gpu/hydro_cuda_aggregation_rate
/octotiger/compute/gpu/hydro_kokkos
/octotiger/compute/gpu/hydro_kokkos_aggregated
/octotiger/compute/gpu/hydro_kokkos_aggregation_rate
/octotiger/compute/gpu/p2p_cuda
/octotiger/compute/gpu/p2p_kokkos
/octotiger/subgrid_leaves
/octotiger/subgrids
/parcelport/count/mpi/cache-evictions
/parcelport/count/mpi/cache-hits
/parcelport/count/mpi/cache-insertions
/parcelport/count/mpi/cache-misses
/parcelport/count/mpi/cache-reclaims
/parcelqueue/length/receive
/parcelqueue/length/send
/parcels/count/routed
/runtime/count/action-invocation
/runtime/count/component
/runtime/count/remote-action-invocation
/runtime/uptime
/scheduler/utilization/instantaneous
/statistics/average
/statistics/max
/statistics/median
/statistics/min
/statistics/rolling_average
/statistics/rolling_max
/statistics/rolling_min
/statistics/rolling_stddev
/statistics/stddev
/threadqueue/length
/threads/busy-loop-count/instantaneous
/threads/count/cumulative
/threads/count/cumulative-phases
/threads/count/instantaneous/active
/threads/count/instantaneous/all
/threads/count/instantaneous/pending
/threads/count/instantaneous/staged
/threads/count/instantaneous/suspended
/threads/count/instantaneous/terminated
/threads/idle-loop-count/instantaneous
/threads/time/overall
: HPX(bad_parameter)): HPX(bad_parameter):
This is on NCSA Delta. I think @diehlpk also encountered this problem on Ookami.