v1.10.0: tests fail on ppc64le & s390x

Open junghans opened this issue 4 months ago • 6 comments

In an attempt to fix https://bugzilla.redhat.com/show_bug.cgi?id=2381020:

diff --git a/hpx.spec b/hpx.spec
index 94548c3..cadca25 100644
--- a/hpx.spec
+++ b/hpx.spec
@@ -215,7 +216,7 @@ rm %{buildroot}/%{_datadir}/%{name}/LICENSE_1_0.txt
 . /etc/profile.d/modules.sh
 for mpi in '' openmpi mpich ; do
   test -n "${mpi}" && module load mpi/${mpi}-%{_arch}
-  make -C %{__cmake_builddir}/ tests.examples
+  %ctest --tests-regex tests.examples
   test -n "${mpi}" && module unload mpi/${mpi}-%{_arch}
 done

I realized we never actually ran the tests during the rpm build. After enabling them we get:

Pass: buildArch (hpx-1.10.0-7.fc44.src.rpm, x86_64)
Pass: buildArch (hpx-1.10.0-7.fc44.src.rpm, aarch64)
Fail: buildArch (hpx-1.10.0-7.fc44.src.rpm, ppc64le)
Fail: buildArch (hpx-1.10.0-7.fc44.src.rpm, s390x)

(see https://koji.fedoraproject.org/koji/taskinfo?taskID=135996561)

The ppc64le build log shows:

98% tests passed, 2 tests failed out of 120
Total Test time (real) =  63.65 sec
The following tests FAILED:
        1088 - tests.examples.1d_stencil.1d_stencil_4_parallel (Failed)
        1091 - tests.examples.1d_stencil.1d_stencil_7 (Failed)
Errors while running CTest

with

pure virtual method called
terminate called without an active exception
Base command is "/builddir/build/BUILD/hpx-1.10.0-build/hpx-1.10.0/ppc64le-redhat-linux-gnu-serial/bin/1d_stencil_7 --hpx:threads=4"
Executing command: /builddir/build/BUILD/hpx-1.10.0-build/hpx-1.10.0/ppc64le-redhat-linux-gnu-serial/bin/1d_stencil_7 --hpx:threads=4
Process 0 failed with an unexpected error code of 255 (expected 0) <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>

and

35/120 Test #1088: tests.examples.1d_stencil.1d_stencil_4_parallel .............................................***Failed    0.12 sec
...
{what}: Segmentation fault
Base command is "/builddir/build/BUILD/hpx-1.10.0-build/hpx-1.10.0/ppc64le-redhat-linux-gnu-serial/bin/1d_stencil_4_parallel --hpx:threads=4"
Executing command: /builddir/build/BUILD/hpx-1.10.0-build/hpx-1.10.0/ppc64le-redhat-linux-gnu-serial/bin/1d_stencil_4_parallel --hpx:threads=4
Process 0 failed with an unexpected error code of 255 (expected 0) <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>

Build log: build_ppc64le.txt.zip

s390x is much worse:

11% tests passed, 107 tests failed out of 120
Total Test time (real) = 4511.38 sec

Build log: build_s390x.txt.zip
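
For anyone triaging logs like the ones above, here is a minimal sketch of pulling the failed test names out of a CTest summary. It assumes the summary has the same shape as the ppc64le excerpt quoted in this report; the `failed_tests` helper and the `SAMPLE` string are hypothetical names for illustration and are not part of HPX or CTest.

```python
import re

# Matches summary lines such as:
#         1088 - tests.examples.1d_stencil.1d_stencil_4_parallel (Failed)
FAILED_LINE = re.compile(r"^\s*(\d+)\s+-\s+(\S+)\s+\((\w[^)]*)\)\s*$")


def failed_tests(log_text):
    """Return (number, name, status) tuples from CTest's failure summary."""
    results = []
    in_summary = False
    for line in log_text.splitlines():
        if line.strip() == "The following tests FAILED:":
            in_summary = True
            continue
        if in_summary:
            match = FAILED_LINE.match(line)
            if match is None:
                # The summary ends at the first line that is not a test entry.
                in_summary = False
                continue
            results.append((int(match.group(1)), match.group(2), match.group(3)))
    return results


# Excerpt copied from the ppc64le build log quoted in this issue.
SAMPLE = """\
98% tests passed, 2 tests failed out of 120
Total Test time (real) =  63.65 sec
The following tests FAILED:
        1088 - tests.examples.1d_stencil.1d_stencil_4_parallel (Failed)
        1091 - tests.examples.1d_stencil.1d_stencil_7 (Failed)
Errors while running CTest
"""

if __name__ == "__main__":
    for number, name, status in failed_tests(SAMPLE):
        print(number, name, status)
```

Running the same function over the full ppc64le and s390x logs would give a per-architecture list of failures that can be compared between builds.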

junghans avatar Aug 13 '25 13:08 junghans

@junghans thanks for this report.

For the PPC64le tests, I think we can ignore the issues reported from those two tests (they are known to be unstable).

For the s390x tests, I don't know what's causing the failures, and I don't have access to such a machine to investigate. I'd suggest disabling the rpm build for this architecture, as I don't see how we can support it reliably (unless somebody volunteers to do this).

hkaiser avatar Aug 13 '25 23:08 hkaiser

I found one more:

 81/118 Test #1136: tests.examples.quickstart.allow_unknown_options .............................................***Failed    0.12 sec
arg(0): /builddir/build/BUILD/hpx-1.10.0-build/hpx-1.10.0/ppc64le-redhat-linux-gnu-serial/bin/allow_unknown_options
{config}:
Core library:
  HPX_AGAS_LOCAL_CACHE_SIZE=4096
  HPX_HAVE_MALLOC=tcmalloc
  HPX_PARCEL_MAX_CONNECTIONS=512
  HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY=4
  HPX_PREFIX (configured)=
  HPX_PREFIX=
  HPX_FILESYSTEM_WITH_BOOST_FILESYSTEM_COMPATIBILITY=OFF
  HPX_ITERATOR_SUPPORT_WITH_BOOST_ITERATOR_TRAVERSAL_TAG_COMPATIBILITY=OFF
  HPX_WITH_AGAS_DUMP_REFCNT_ENTRIES=OFF
  HPX_WITH_APEX=OFF
  HPX_WITH_ASYNC_MPI=OFF
  HPX_WITH_ATTACH_DEBUGGER_ON_TEST_FAILURE=OFF
  HPX_WITH_AUTOMATIC_SERIALIZATION_REGISTRATION=ON
  HPX_WITH_COROUTINE_COUNTERS=OFF
  HPX_WITH_DISTRIBUTED_RUNTIME=ON
  HPX_WITH_DYNAMIC_HPX_MAIN=ON
  HPX_WITH_IO_COUNTERS=ON
  HPX_WITH_IO_POOL=ON
  HPX_WITH_ITTNOTIFY=OFF
  HPX_WITH_LOGGING=ON
  HPX_WITH_NETWORKING=ON
  HPX_WITH_PAPI=OFF
  HPX_WITH_PARALLEL_TESTS_BIND_NONE=OFF
  HPX_WITH_PARCELPORT_ACTION_COUNTERS=OFF
  HPX_WITH_PARCELPORT_COUNTERS=OFF
  HPX_WITH_PARCELPORT_GASNET=OFF
  HPX_WITH_PARCELPORT_LCI=OFF
  HPX_WITH_PARCELPORT_LIBFABRIC=OFF
  HPX_WITH_PARCELPORT_MPI=OFF
  HPX_WITH_PARCELPORT_TCP=ON
  HPX_WITH_PARCEL_COALESCING=ON
  HPX_WITH_PARCEL_PROFILING=OFF
  HPX_WITH_SANITIZERS=OFF
  HPX_WITH_SCHEDULER_LOCAL_STORAGE=OFF
  HPX_WITH_SPINLOCK_DEADLOCK_DETECTION=OFF
  HPX_WITH_STACKTRACES=ON
  HPX_WITH_STACKTRACES_DEMANGLE_SYMBOLS=ON
  HPX_WITH_STACKTRACES_STATIC_SYMBOLS=OFF
  HPX_WITH_TESTS_DEBUG_LOG=OFF
  HPX_WITH_THREAD_BACKTRACE_ON_SUSPENSION=OFF
  HPX_WITH_THREAD_CREATION_AND_CLEANUP_RATES=OFF
  HPX_WITH_THREAD_CUMULATIVE_COUNTS=ON
  HPX_WITH_THREAD_DEBUG_INFO=OFF
  HPX_WITH_THREAD_DESCRIPTION_FULL=OFF
  HPX_WITH_THREAD_GUARD_PAGE=ON
  HPX_WITH_THREAD_IDLE_RATES=OFF
  HPX_WITH_THREAD_LOCAL_STORAGE=OFF
  HPX_WITH_THREAD_MANAGER_IDLE_BACKOFF=ON
  HPX_WITH_THREAD_QUEUE_WAITTIME=OFF
  HPX_WITH_THREAD_STACK_MMAP=ON
  HPX_WITH_THREAD_STEALING_COUNTS=OFF
  HPX_WITH_THREAD_TARGET_ADDRESS=OFF
  HPX_WITH_TIMER_POOL=ON
  HPX_WITH_TUPLE_RVALUE_SWAP=ON
  HPX_WITH_VALGRIND=OFF
  HPX_WITH_VERIFY_LOCKS=OFF
  HPX_WITH_VERIFY_LOCKS_BACKTRACE=OFF
  HPX_WITH_WORK_REQUESTING_SCHEDULERS=ON
Module allocator_support:
  HPX_ALLOCATOR_SUPPORT_WITH_CACHING=ON
Module command_line_handling_local:
  HPX_COMMAND_LINE_HANDLING_LOCAL_WITH_JSON_CONFIGURATION_FILES=OFF
Module coroutines:
  HPX_COROUTINES_WITH_SWAP_CONTEXT_EMULATION=OFF
  HPX_COROUTINES_WITH_THREAD_SCHEDULE_HINT_RUNS_AS_CHILD=OFF
Module datastructures:
  HPX_DATASTRUCTURES_WITH_ADAPT_STD_TUPLE=ON
  HPX_DATASTRUCTURES_WITH_ADAPT_STD_VARIANT=OFF
Module logging:
  HPX_LOGGING_WITH_SEPARATE_DESTINATIONS=ON
Module serialization:
  HPX_SERIALIZATION_WITH_ALLOW_CONST_TUPLE_MEMBERS=OFF
  HPX_SERIALIZATION_WITH_ALLOW_RAW_POINTER_SERIALIZATION=OFF
  HPX_SERIALIZATION_WITH_ALL_TYPES_ARE_BITWISE_SERIALIZABLE=OFF
  HPX_SERIALIZATION_WITH_BOOST_TYPES=OFF
  HPX_SERIALIZATION_WITH_SUPPORTS_ENDIANESS=OFF
Module topology:
  HPX_TOPOLOGY_WITH_ADDITIONAL_HWLOC_TESTING=OFF
{version}: V1.10.0 (AGAS: V3.0), Git: unknown
{boost}: V1.83.0
{build-type}: release
{date}: Aug 14 2025 00:00:00
{platform}: linux
{compiler}: GNU C++ version 15.2.1 20250808 (Red Hat 15.2.1-1)
{stdlib}: GNU libstdc++ version 20250808
{stack-trace}: 10 frames:
0x7fffbbe30484  : __kernel_sigtramp_rt64 [0x0] in linux-vdso64.so.1
0x7fffbb568f14  : hpx::agas::server::primary_namespace::resolve_free_list(std::unique_lock<hpx::detail::spinlock<true> >&, std::__cxx11::list<std::_Rb_tree_iterator<std::pair<hpx::naming::gid_type const, long> >, std::allocator<std::_Rb_tree_iterator<std::pair<hpx::naming::gid_type const, long> > > > const&, std::__cxx11::list<hpx::agas::server::primary_namespace::free_entry, std::allocator<hpx::agas::server::primary_namespace::free_entry> >&, hpx::naming::gid_type const&, hpx::naming::gid_type const&, hpx::error_code&) [0x94] in /builddir/build/BUILD/hpx-1.10.0-build/hpx-1.10.0/ppc64le-redhat-linux-gnu-serial/lib/libhpx.so.1
0x7fffbb56cf78  : hpx::agas::server::primary_namespace::decrement_sweep(std::__cxx11::list<hpx::agas::server::primary_namespace::free_entry, std::allocator<hpx::agas::server::primary_namespace::free_entry> >&, hpx::naming::gid_type const&, hpx::naming::gid_type const&, long, hpx::error_code&) [0x2c8] in /builddir/build/BUILD/hpx-1.10.0-build/hpx-1.10.0/ppc64le-redhat-linux-gnu-serial/lib/libhpx.so.1
0x7fffbb56dd68  : hpx::agas::server::primary_namespace::decrement_credit(std::vector<hpx::tuple<long, hpx::naming::gid_type, hpx::naming::gid_type>, std::allocator<hpx::tuple<long, hpx::naming::gid_type, hpx::naming::gid_type> > > const&) [0x1d8] in /builddir/build/BUILD/hpx-1.10.0-build/hpx-1.10.0/ppc64le-redhat-linux-gnu-serial/lib/libhpx.so.1
0x7fffbb3d6eb8  : /builddir/build/BUILD/hpx-1.10.0-build/hpx-1.10.0/ppc64le-redhat-linux-gnu-serial/lib/libhpx.so.1(+0x1d6eb8) [0x7fffbb3d6eb8] in /builddir/build/BUILD/hpx-1.10.0-build/hpx-1.10.0/ppc64le-redhat-linux-gnu-serial/lib/libhpx.so.1
0x7fffbb3d73cc  : /builddir/build/BUILD/hpx-1.10.0-build/hpx-1.10.0/ppc64le-redhat-linux-gnu-serial/lib/libhpx.so.1(+0x1d73cc) [0x7fffbb3d73cc] in /builddir/build/BUILD/hpx-1.10.0-build/hpx-1.10.0/ppc64le-redhat-linux-gnu-serial/lib/libhpx.so.1
0x7fffbaf122ec  : hpx::threads::coroutines::detail::coroutine_impl::operator()() [0x12c] in /builddir/build/BUILD/hpx-1.10.0-build/hpx-1.10.0/ppc64le-redhat-linux-gnu-serial/lib/libhpx_core.so
0x7fffbaf117d8  : /builddir/build/BUILD/hpx-1.10.0-build/hpx-1.10.0/ppc64le-redhat-linux-gnu-serial/lib/libhpx_core.so(+0x1117d8) [0x7fffbaf117d8] in /builddir/build/BUILD/hpx-1.10.0-build/hpx-1.10.0/ppc64le-redhat-linux-gnu-serial/lib/libhpx_core.so
0x7fffba44e630  : makecontext [0xd8] in /lib64/glibc-hwcaps/power10/libc.so.6
{what}: Segmentation fault
Base command is "/builddir/build/BUILD/hpx-1.10.0-build/hpx-1.10.0/ppc64le-redhat-linux-gnu-serial/bin/allow_unknown_options --hpx:threads=4"
Executing command: /builddir/build/BUILD/hpx-1.10.0-build/hpx-1.10.0/ppc64le-redhat-linux-gnu-serial/bin/allow_unknown_options --hpx:threads=4
Process 0 failed with an unexpected error code of 255 (expected 0) <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>

junghans avatar Aug 14 '25 19:08 junghans

And another one:

The following tests FAILED:
        1245 - tests.examples.async_io.async_io_external (Failed)
Errors while running CTest

junghans avatar Aug 15 '25 13:08 junghans

@junghans thanks for the reports. We are aware that some individual tests fail occasionally. In the past, these kinds of problems were mostly caused by flaws in the tests themselves.

hkaiser avatar Aug 15 '25 13:08 hkaiser

Every time I excluded a test on ppc64le another test failed, so I dropped that build as well.
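
For reference, a rough sketch of how such an exclusion list could be turned into a single pattern for `ctest --exclude-regex`. The test names are the ones reported as failing in this thread; the `exclude_regex` helper is a hypothetical name, and the sketch assumes that the escaping produced by Python's `re.escape` is also accepted by CTest's own regex dialect, which has not been verified here.

```python
import re


def exclude_regex(test_names):
    """Build an anchored alternation that matches exactly the given test names."""
    # Escape the dots so that '.' does not match arbitrary characters.
    escaped = sorted(re.escape(name) for name in set(test_names))
    return "^(" + "|".join(escaped) + ")$"


# Tests reported as failing on ppc64le in this issue.
KNOWN_FAILURES = [
    "tests.examples.1d_stencil.1d_stencil_4_parallel",
    "tests.examples.1d_stencil.1d_stencil_7",
    "tests.examples.quickstart.allow_unknown_options",
    "tests.examples.async_io.async_io_external",
]

PATTERN = exclude_regex(KNOWN_FAILURES)

if __name__ == "__main__":
    # The printed pattern is meant to be passed as the --exclude-regex argument.
    print(PATTERN)
```

The resulting pattern would be passed alongside the existing `--tests-regex tests.examples` selection. As noted above, this only helps if the set of failing tests is stable, which was not the case here.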

junghans avatar Aug 17 '25 00:08 junghans

> Every time I excluded a test on ppc64le another test failed, so I dropped that build as well.

Same here, we don't have access to such a platform, so there is no real way to support it reliably.

hkaiser avatar Aug 17 '25 13:08 hkaiser