CI: add rpm build workflow
For #1241
@junghans Does not seem like it works unless I'm missing something.
2025-04-24T23:54:55.8680985Z 10/12 Test #10: ArborX_Test_SpecializedTraversals ........***Failed 0.01 sec
2025-04-24T23:54:55.8681171Z Kokkos::OpenMP::initialize WARNING: OMP_PROC_BIND environment variable not set
2025-04-24T23:54:55.8681411Z In general, for best performance with OpenMP 4.0 or better set OMP_PROC_BIND=spread and OMP_PLACES=threads
2025-04-24T23:54:55.8681531Z For best performance with OpenMP 3.1 set OMP_PROC_BIND=true
2025-04-24T23:54:55.8681613Z For unit testing set OMP_PROC_BIND=false
2025-04-24T23:54:55.8681673Z Running 10 test cases...
2025-04-24T23:54:55.8683253Z /builddir/build/BUILD/ArborX-2.0-build/ArborX-2.0/test/tstNeighborList.cpp(177): error: in "find_neighbor_list_compare_filtered_tree_traversal<Kokkos__Device<Kokkos__OpenMP_ Kokkos__HostSpace>>": check Test::buildHalfNeighborListAndExpandToFull(exec_space, points, radius) == Test::compute_reference<MemorySpace>(exec_space, points, radius) has failed
2025-04-24T23:54:55.8683433Z - mismatch at position 0: [( 2 7 8 24 38 41 46 53 60 63 64 91 ) == ( 2 7 8 24 38 41 53 60 63 64 91 )] is false
2025-04-24T23:54:55.8683550Z - mismatch at position 3: [( 8 46 53 88 ) == ( 8 53 )] is false
2025-04-24T23:54:55.8683718Z - mismatch at position 6: [( 14 20 35 36 42 48 50 68 84 94 96 ) == ( 14 20 35 36 42 48 50 84 94 96 )] is false
2025-04-24T23:54:55.8683940Z - mismatch at position 8: [( 0 2 3 7 24 40 41 46 53 63 64 66 88 91 ) == ( 0 2 3 7 24 40 41 53 63 64 66 91 )] is false
2025-04-24T23:54:55.8684084Z - mismatch at position 14: [( 6 35 36 37 78 80 98 ) == ( 6 35 36 37 78 98 )] is false
2025-04-24T23:54:55.8684238Z - mismatch at position 17: [( 5 25 26 27 38 41 51 52 60 ) == ( 5 25 26 27 38 41 51 60 )] is false
2025-04-24T23:54:55.8684373Z - mismatch at position 22: [( 31 33 55 62 67 74 ) == ( 33 55 67 )] is false
2025-04-24T23:54:55.8684528Z - mismatch at position 25: [( 5 17 26 27 38 41 51 52 60 ) == ( 5 17 26 27 38 41 51 60 )] is false
2025-04-24T23:54:55.8684769Z - mismatch at position 26: [( 5 17 25 27 38 41 51 52 60 73 ) == ( 5 17 25 27 38 41 51 60 73 )] is false
2025-04-24T23:54:55.8684904Z - mismatch at position 27: [( 5 17 25 26 51 52 ) == ( 5 17 25 26 51 )] is false
2025-04-24T23:54:55.8685049Z - mismatch at position 31: [( 22 32 33 55 61 62 67 74 ) == ( 33 55 61 67 )] is false
2025-04-24T23:54:55.8685191Z - mismatch at position 32: [( 31 33 46 55 61 67 74 83 90 ) == ( 33 55 61 67 )] is false
2025-04-24T23:54:55.8685300Z - mismatch at position 34: [( 61 83 90 ) == ( 61 )] is false
2025-04-24T23:54:55.8685458Z - mismatch at position 36: [( 6 14 35 48 50 68 80 89 98 ) == ( 6 14 35 48 50 89 98 )] is false
2025-04-24T23:54:55.8685611Z - mismatch at position 40: [( 8 24 47 53 66 88 91 94 97 ) == ( 8 24 47 53 66 91 94 97 )] is false
2025-04-24T23:54:55.8685780Z - mismatch at position 41: [( 0 8 17 24 25 26 52 60 64 66 73 91 ) == ( 0 8 17 24 25 26 60 64 66 73 91 )] is false
2025-04-24T23:54:55.8686091Z - mismatch at position 42: [( 2 6 7 16 35 48 53 63 72 78 80 89 94 ) == ( 2 6 7 16 35 48 53 63 72 78 89 94 )] is false
2025-04-24T23:54:55.8686226Z - mismatch at position 46: [( 0 3 8 32 53 63 89 ) == ( 53 63 89 )] is false
2025-04-24T23:54:55.8686446Z - mismatch at position 48: [( 6 35 36 42 50 68 80 89 ) == ( 6 35 36 42 50 89 )] is false
2025-04-24T23:54:55.8686579Z - mismatch at position 50: [( 6 36 48 68 80 ) == ( 6 36 48 )] is false
2025-04-24T23:54:55.8686702Z - mismatch at position 52: [( 17 25 26 27 41 60 ) == ( 60 )] is false
2025-04-24T23:54:55.8686918Z - mismatch at position 53: [( 0 2 3 7 8 35 40 42 46 63 88 89 91 94 ) == ( 0 2 3 7 8 35 40 42 46 63 89 91 94 )] is false
2025-04-24T23:54:55.8687080Z - mismatch at position 55: [( 22 31 32 33 61 67 74 78 99 ) == ( 22 31 32 33 61 67 78 99 )] is false
2025-04-24T23:54:55.8687239Z - mismatch at position 61: [( 31 32 33 34 55 62 67 74 83 90 99 ) == ( 31 32 33 34 55 67 99 )] is false
2025-04-24T23:54:55.8687390Z - mismatch at position 62: [( 22 31 61 67 74 ) == ( 67 )] is false
2025-04-24T23:54:55.8687539Z - mismatch at position 64: [( 0 8 24 41 66 73 75 91 ) == ( 0 8 24 41 66 73 91 )] is false
2025-04-24T23:54:55.8687689Z - mismatch at position 66: [( 8 24 40 41 64 75 91 97 ) == ( 8 24 40 41 64 91 97 )] is false
2025-04-24T23:54:55.8687837Z - mismatch at position 67: [( 22 31 32 33 55 61 62 74 ) == ( 22 31 32 33 55 61 62 )] is false
2025-04-24T23:54:55.8687967Z - mismatch at position 68: [( 6 36 48 50 80 89 98 ) == ( 89 98 )] is false
2025-04-24T23:54:55.8688104Z - mismatch at position 74: [( 22 31 32 55 61 62 67 83 90 ) == ( )] is false
2025-04-24T23:54:55.8688198Z - mismatch at position 75: [( 64 66 ) == ( )] is false
2025-04-24T23:54:55.8688332Z - mismatch at position 80: [( 14 36 42 48 50 68 89 98 ) == ( 89 98 )] is false
2025-04-24T23:54:55.8688451Z - mismatch at position 83: [( 32 34 61 74 90 ) == ( )] is false
2025-04-24T23:54:55.8688568Z - mismatch at position 88: [( 3 8 40 53 91 ) == ( 91 )] is false
2025-04-24T23:54:55.8688832Z - mismatch at position 90: [( 32 34 61 74 83 ) == ( )] is false
2025-04-24T23:54:55.8689034Z *** 1 failure is detected in the test module "Master Test Suite"
@junghans Is it possible to run the CI based on the current branch, so that if I push here, it runs the change and not 2.0? Also, is there a way to speed it up? It seems to take 2+ hours.
@aprokop It makes a tarball out of the current checkout; the tarball is just always named ArborX-2.0.tar.gz.
I think we could make it faster by not building the MPI versions, since the failure only happens in the serial build.
But maybe the easiest would be to just try building it with the same flags and CMake options:
2025-04-24T23:54:54.8665106Z + CFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer '
2025-04-24T23:54:54.8667973Z + export CFLAGS
2025-04-24T23:54:54.8669993Z + CXXFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer '
2025-04-24T23:54:54.8671711Z + export CXXFLAGS
2025-04-24T23:54:54.8683068Z + LDFLAGS='-Wl,-z,relro -Wl,--as-needed -Wl,-z,pack-relative-relocs -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-hardened-ld-errors -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 -specs=/usr/lib/rpm/redhat/redhat-package-notes '
2025-04-24T23:54:54.8684300Z + export LDFLAGS
2025-04-24T23:54:54.8684489Z + LT_SYS_LIBRARY_PATH=/usr/lib64:
2025-04-24T23:54:54.8684706Z + export LT_SYS_LIBRARY_PATH
2025-04-24T23:54:54.8684880Z + CC=gcc
2025-04-24T23:54:54.8685017Z + export CC
2025-04-24T23:54:54.8685151Z + CXX=g++
2025-04-24T23:54:54.8685280Z + export CXX
2025-04-24T23:54:54.8688017Z + /usr/bin/cmake -S . -B aarch64-redhat-linux-gnu-serial -DCMAKE_C_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_CXX_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_Fortran_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON -DCMAKE_INSTALL_DO_STRIP:BOOL=OFF -DCMAKE_INSTALL_PREFIX:PATH=/usr -DCMAKE_INSTALL_FULL_SBINDIR:PATH=/usr/bin -DCMAKE_INSTALL_SBINDIR:PATH=bin -DINCLUDE_INSTALL_DIR:PATH=/usr/include -DLIB_INSTALL_DIR:PATH=/usr/lib64 -DSYSCONF_INSTALL_DIR:PATH=/etc -DSHARE_INSTALL_PREFIX:PATH=/usr/share -DLIB_SUFFIX=64 -DBUILD_SHARED_LIBS:BOOL=ON -DARBORX_ENABLE_TESTS=ON -DARBORX_ENABLE_EXAMPLES=OFF -DARBORX_ENABLE_BENCHMARKS=OFF -DARBORX_ENABLE_MPI=OFF -DCMAKE_INSTALL_DATADIR=/usr/share -DCMAKE_INSTALL_INCLUDEDIR=/usr/include
So it seems that the following change makes it pass:
--- a/src/spatial/detail/ArborX_ExpandHalfToFull.hpp
+++ b/src/spatial/detail/ArborX_ExpandHalfToFull.hpp
@@ -50,19 +50,13 @@ void expandHalfToFull(ExecutionSpace const &space, Offsets &offsets,
"ArborX::Experimental::HalfToFull::counts");
Kokkos::parallel_for(
"ArborX::Experimental::HalfToFull::rewrite",
- Kokkos::TeamPolicy(space, n, Kokkos::AUTO, 1),
- KOKKOS_LAMBDA(
- typename Kokkos::TeamPolicy<ExecutionSpace>::member_type const
- &member) {
- auto const i = member.league_rank();
- auto const first = offsets_orig(i);
- auto const last = offsets_orig(i + 1);
- Kokkos::parallel_for(
- Kokkos::TeamVectorRange(member, last - first), [&](int j) {
- int const k = indices_orig(first + j);
- indices(Kokkos::atomic_fetch_inc(&counts(i))) = k;
- indices(Kokkos::atomic_fetch_inc(&counts(k))) = i;
- });
+ Kokkos::RangePolicy(space, 0, n), KOKKOS_LAMBDA(int i) {
+ for (int j = offsets_orig(i); j < offsets_orig(i + 1); ++j)
+ {
+ int const k = indices_orig(j);
+ indices(Kokkos::atomic_fetch_inc(&counts(i))) = k;
+ indices(Kokkos::atomic_fetch_inc(&counts(k))) = i;
+ }
});
Kokkos::Profiling::popRegion();
}
I don't understand why. Both codes seem valid to me. It seems to only affect aarch64. Mac uses aarch64, but the native Mac toolchain does not support OpenMP, so I never ran it there, and it passes in Serial.
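For reference, here is a minimal serial sketch in plain C++ of what the rewrite kernel in the diff appears to do. It is not ArborX code: `Csr` and `expandHalfToFullSketch` are hypothetical names, `std::atomic::fetch_add` stands in for `Kokkos::atomic_fetch_inc`, and it assumes that `counts(i)` starts at the beginning of row `i` in the expanded offsets, which the diff context does not show. Under that assumption, each fetch-and-increment reserves a unique slot, so the order within a row may vary between runs but the set of neighbors in each row should not. The extra entries in the failing log (for example `46` at position 0) would therefore not be explained by ordering alone.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical CSR container standing in for the Kokkos views in the diff.
struct Csr
{
  std::vector<int> offsets; // size n + 1
  std::vector<int> indices; // size offsets.back()
};

// Expands a half neighbor list (each pair stored once) into a full one
// (each pair stored in both rows). Serial sketch, not the ArborX kernel.
Csr expandHalfToFullSketch(Csr const &half)
{
  assert(!half.offsets.empty());
  int const n = static_cast<int>(half.offsets.size()) - 1;

  // Count pass: a stored pair (i, k) adds one entry to row i and one to row k.
  std::vector<int> row_size(n, 0);
  for (int i = 0; i < n; ++i)
    for (int j = half.offsets[i]; j < half.offsets[i + 1]; ++j)
    {
      ++row_size[i];
      ++row_size[half.indices[j]];
    }

  // Exclusive prefix sum of the row sizes gives the expanded offsets.
  Csr full;
  full.offsets.assign(n + 1, 0);
  for (int i = 0; i < n; ++i)
    full.offsets[i + 1] = full.offsets[i] + row_size[i];
  full.indices.assign(full.offsets[n], -1);

  // Assumption: counts[i] is the next free slot in row i.
  std::vector<std::atomic<int>> counts(n);
  for (int i = 0; i < n; ++i)
    counts[i].store(full.offsets[i]);

  // Rewrite pass, same loop body as the RangePolicy version in the diff.
  for (int i = 0; i < n; ++i)
    for (int j = half.offsets[i]; j < half.offsets[i + 1]; ++j)
    {
      int const k = half.indices[j];
      full.indices[counts[i].fetch_add(1)] = k;
      full.indices[counts[k].fetch_add(1)] = i;
    }
  return full;
}
```

With three mutually neighboring points stored as the half list `offsets = {0, 2, 3, 3}`, `indices = {1, 2, 2}`, this produces `offsets = {0, 2, 4, 6}` and every row containing the other two points.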
I am not sure either, but maybe @dalg24 knows....
Either way, I patched that in and rebuilt: https://koji.fedoraproject.org/koji/taskinfo?taskID=131982908
Hmm, the latest patch failed in a different place:
/builddir/build/BUILD/ArborX-2.0-build/ArborX-2.0/test/tstDBSCAN.cpp(185):
error: in "DBSCAN/dbscan<Kokkos__Device<Kokkos__OpenMP_ Kokkos__HostSpace>>":
check verifyDBSCAN( space, hidden_points, r - (Coordinate)0.1, 2, dbscan(space, hidden_points, r - (Coordinate)0.1, 2, params)) has failed
So, it seems, the failures are intermittent. I really need to be able to run things in a loop to properly debug this.
The CI2 is failing in 48min.
@aprokop let me know if you have a patch set I should test on Fedora again.
@aprokop Any update on this? Anything I can help with?
@junghans I think I'm essentially stuck here. None of it makes any sense to me. I will try to add more printouts and see if I can track it some more. I wonder if it is some optimizations again, and the issue would disappear with "-O0".
Either the failure is intermittent (which it could be), or it is similar to #1186.
@dalg24 any ideas?
@aprokop I gave this a fresh start, similar to what we did in Silo.
I looked at the code again, and still no wiser. Need to be able to access arm64 without cross-compiling.
On an Apple-silicon Mac with Docker, you could use the following Dockerfile:
FROM registry.fedoraproject.org/fedora:latest
RUN dnf install -y fedpkg wget
RUN wget https://github.com/junghans/ArborX/archive/refs/heads/rpmbuild.zip
RUN unzip rpmbuild.zip
RUN mv ArborX-rpmbuild ArborX
RUN tar -cvzf ArborX-9999.tar.gz ArborX/
RUN mkdir ArborX.rpm
RUN mv ArborX-9999.tar.gz ArborX.rpm
WORKDIR ArborX.rpm
RUN wget https://github.com/junghans/ArborX/raw/refs/heads/rpmbuild/.github/workflows/ArborX.spec
RUN dnf -y builddep ArborX.spec
RUN fedpkg --verbose --debug local
Thank you. I need to figure out how to bypass the proxies on my work mac, as it's running into
2.223 Failed to download metadata (metalink: "https://mirrors.fedoraproject.org/metalink?repo=fedora-43&arch=aarch64")
for repository "fedora": Cannot prepare internal mirrorlist:
Curl error (60): SSL peer certificate or SSH remote key was not OK for
https://mirrors.fedoraproject.org/metalink?repo=fedora-43&arch=aarch64
[SSL certificate problem: self-signed certificate in certificate chain]
You could add a line like
ENV https_proxy myproxy:8080
I think I wrongly attributed the problem to proxy. It is the certificate replacement issue, where our deep inspection thingie replaces certificates. I slightly changed the recipe by this tweak:
RUN dnf install -y fedpkg wget
RUN wget --no-check-certificate <snip>
Running into an issue on the last step:
0.685 self.load_rpmdefines()
0.685 ~~~~~~~~~~~~~~~~~~~~^^
0.685 File "/usr/lib/python3.14/site-packages/fedpkg/__init__.py", line 156, in load_rpmdefines
0.685 if not self._load_rpmdefines_branch(self.branch_merge, extra_rpmdefines):
0.685 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0.685 File "/usr/lib/python3.14/site-packages/fedpkg/__init__.py", line 132, in _load_rpmdefines_branch
0.685 self._distval = self._findrawhidebranch()
0.685 ~~~~~~~~~~~~~~~~~~~~~~~^^
0.685 File "/usr/lib/python3.14/site-packages/fedpkg/__init__.py", line 268, in _findrawhidebranch
0.685 for ref in self.repo.refs:
0.685 ^^^^^^^^^
0.685 File "/usr/lib/python3.14/site-packages/pyrpkg/__init__.py", line 878, in repo
0.685 self.load_repo()
0.685 ~~~~~~~~~~~~~~^^
0.685 File "/usr/lib/python3.14/site-packages/pyrpkg/__init__.py", line 888, in load_repo
0.685 raise rpkgError('%s is not a valid repo' % self.path)
0.685 pyrpkg.errors.rpkgError: /ArborX.rpm is not a valid repo
------
1 warning found (use docker --debug to expand):
- WorkdirRelativePath: Relative workdir "ArborX.rpm" can have unexpected results if the base image changes (line 11)
Dockerfile.mac:14
--------------------
12 | RUN wget --no-check-certificate https://github.com/junghans/ArborX/raw/refs/heads/rpmbuild/.github/workflows/ArborX.spec
13 | RUN dnf -y builddep ArborX.spec
14 | >>> RUN fedpkg --verbose --debug local
15 |
--------------------
ERROR: failed to build: failed to solve: process "/bin/sh -c fedpkg --verbose --debug local" did not complete successfully: exit code: 1
OK, I'm not sure how that happened, but I was able to reproduce it on one of my machines. Try with these 3 lines at the end:
....
RUN dnf -y builddep ArborX.spec
RUN git init -b rawhide
RUN fedpkg --verbose --debug local
May still be a certificate problem
------
> [14/14] RUN GIT_SSL_NO_VERIFY=true fedpkg --verbose --debug local:
0.388 Creating repo object from /ArborX.rpm
0.390 Could not determine the remote name: Cmd('git') failed due to: exit code(1)
0.390 cmdline: git config --get branch.rawhide.remote
0.390 Falling back to default remote name 'origin'
0.392 Failed to get repository name from Git url or pushurl
0.395 Failed to get ns from Git url or pushurl
0.398 Initiating a koji session to https://koji.fedoraproject.org/kojihub
0.465 Unable to query Koji to find rawhide target. Continue offline.
0.465 Could not execute local: Unable to find rawhide target
0.466 Traceback (most recent call last):
How about
...
RUN dnf -y builddep ArborX.spec
RUN rpmbuild --define "_sourcedir $(pwd)" -ba ArborX.spec
That works. I'm trying to build it. I'm getting warnings like this
/root/rpmbuild/BUILD/ArborX-9999-build/ArborX/src/kokkos_ext/ArborX_KokkosExtMinMaxReduce.hpp:26:1: note: parameter passing for argument of type ‘std::pair<double, double>’ when C++17 is enabled changed to match C++14 in GCC 10.1
Is it possible to go into the build directory and try to compile manually? If I try make instead of rpmbuild, I'm getting
g++: fatal error: environment variable ‘RPM_ARCH’ not defined
Something like:
export RPM_ARCH=$(uname -m)
export RPM_PACKAGE_RELEASE=1
export RPM_PACKAGE_VERSION=9999
export RPM_PACKAGE_NAME=ArborX
Getting some errors when building inside the container
{standard input}: Assembler messages:
{standard input}:21745563: Warning: end of file not at end of a line; newline inserted
{standard input}:21746825: Error: unknown pseudo-op: `.ul'
But the two tests that fail in this PR were built (ArborX_Test_SpecializedTraversals and ArborX_Test_Clustering) and pass. Not sure if they need some specific OMP_NUM_THREADS or configuration, or whether they fail intermittently.