Regenerate modulefiles on update (fixes #1601)
This is as yet completely untested. But the idea is:
- Split the module generation out into a separate script in a new bin directory. I think admins may want to run this at times as well
- Update the script to only generate .rpmnew files if the files differ from the currently installed ones
- Run that script whenever the /opt/intel/oneapi directory is updated by another rpm
- Only run the %postun script when the package is completely removed, not on updates
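For reference, the "only generate .rpmnew files if the files are different" step could look roughly like the sketch below. The helper name install_if_changed and the demo files are made up for illustration and are not taken from the actual script. It relies only on cmp -s, which exits 0 when two files are identical.

```shell
# Hypothetical helper: install a freshly generated modulefile without
# overwriting an installed copy that differs from it.
install_if_changed() {
    local new="$1" dest="$2"
    if [ ! -e "$dest" ]; then
        # Nothing installed yet: put the generated file in place.
        mv "$new" "$dest"
    elif cmp -s "$new" "$dest"; then
        # Identical content: drop the generated copy, no .rpmnew needed.
        rm -f "$new"
    else
        # Content differs: keep the installed file, park the new one next to it.
        mv "$new" "${dest}.rpmnew"
    fi
}

# Demo in a scratch directory (paths are illustrative only).
demo=$(mktemp -d)
printf 'v1\n' >"$demo/gen"
install_if_changed "$demo/gen" "$demo/mod" # first install
printf 'v1\n' >"$demo/gen"
install_if_changed "$demo/gen" "$demo/mod" # unchanged, nothing written
printf 'v2\n' >"$demo/gen"
install_if_changed "$demo/gen" "$demo/mod" # changed, creates mod.rpmnew
ls "$demo"
```

Checking for the destination first also avoids calling cmp on a path that does not exist.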
Does the CI generate rpms that can be tested?
Yes, for each OS there should be an RPM attached to the GitHub Actions run. However, the RPMs are only kept for 24 hours; previously we reached space limits when keeping them for longer.
Thanks for your PR. I will need at least one week before being able to look closer at this PR.
I would like to run new shell scripts through shellcheck. We have a Makefile at https://github.com/openhpc/ohpc/blob/3.x/tests/ci/Makefile which does that for us. Could you add the new shell script to the shellcheck, whitespace and shfmt sections there? If you prefer not to make these changes, I can also do them later.
There is a similar script in the Intel MPI compatibility package. I guess we should make the same changes there, right?
Am I right that shfmt wants TAB characters for indentation?
We just use the defaults that shfmt defines. The main goal is to be consistent. I never looked at the details.
Just running make -C tests/ci/ shfmt-lint should fix it.
A friendly reminder that this PR had no activity for 30 days.
Any update on the PR? Still cannot upgrade oneAPI smoothly.
I'll just note that in a previous CI run this was the output:
Running scriptlet: intel-oneapi-toolkit-release-ohpc-2024.0-310.ohpc. 10/10
Generating new oneAPI modulefiles
/opt/intel/oneapi/modulefiles-setup.sh: line 119: cd: /opt/intel/oneapi/compiler/2024.0/modulefiles/../opt/oclfpga/modulefiles: No such file or directory
Creating OpenHPC-style modulefiles for local oneAPI compiler installation(s).
--> Installing modulefile for version=2023.2.1
/opt/ohpc/pub/bin/ohpc-update-modules-intel: line 102: ver: command not found
cmp: /opt/ohpc/pub/moduledeps/gnu/mkl/: Is a directory
--> Installing modulefile for version=2024.0.0
/opt/ohpc/pub/bin/ohpc-update-modules-intel: line 102: ver: command not found
cmp: /opt/ohpc/pub/moduledeps/gnu/mkl/: Is a directory
--> Installing modulefile for version=2025.1.1
/opt/ohpc/pub/bin/ohpc-update-modules-intel: line 102: ver: command not found
cmp: /opt/ohpc/pub/moduledeps/gnu/mkl/: Is a directory
/var/tmp/rpm-tmp.il2Cj4: line 1: /opt/ohpc/pub/bin/ohpc-update-modules-mpi: No such file or directory
warning: %transfiletriggerin(intel-mpi-devel-ohpc-2024.0-9999.ci.ohpc.2.x86_64) scriptlet failed, exit status 127
Error in <unknown> scriptlet in rpm package intel-oneapi-toolkit-release-ohpc
Verifying : intel-mpi-devel-ohpc-2024.0-9999.ci.ohpc.2.x86_64 1/10
Verifying : intel-psxe-mpi-devel-ohpc-2024.0-9999.ci.ohpc.2.x8 2/10
Verifying : intel-psxe-compilers-devel-ohpc-2024.0-9999.ci.ohp 3/10
Verifying : ohpc-buildroot-3.2-9999.ci.ohpc.2.noarch 4/10
Verifying : intel-compilers-devel-ohpc-2024.0-9999.ci.ohpc.2.x 5/10
Verifying : intel-compilers-devel-ohpc-2024.0-310.ohpc.4.1.x86 6/10
Verifying : intel-oneapi-toolkit-release-ohpc-2024.0-9999.ci.o 7/10
Verifying : intel-oneapi-toolkit-release-ohpc-2024.0-310.ohpc. 8/10
Verifying : ohpc-filesystem-3.2-9999.ci.ohpc.2.noarch 9/10
Error: Transaction failed
Verifying : ohpc-filesystem-3.2-330.ohpc.1.1.noarch 10/10
Upgraded:
intel-compilers-devel-ohpc-2024.0-9999.ci.ohpc.2.x86_64
intel-oneapi-toolkit-release-ohpc-2024.0-9999.ci.ohpc.2.x86_64
ohpc-filesystem-3.2-9999.ci.ohpc.2.noarch
Installed:
intel-mpi-devel-ohpc-2024.0-9999.ci.ohpc.2.x86_64
ohpc-buildroot-3.2-9999.ci.ohpc.2.noarch
Failed:
intel-psxe-compilers-devel-ohpc-2024.0-9999.ci.ohpc.2.x86_64
intel-psxe-mpi-devel-ohpc-2024.0-9999.ci.ohpc.2.x86_64
+ true
This indicates a problem in the package, but apparently it was ignored. Is this intentional?
I think this is ready for review now
Yes and no. When it comes to testing things in CI with the Intel compiler, we are not there yet. The testing still has a couple of places where the compiler family is hardcoded. If you look at https://github.com/openhpc/ohpc/blob/3.x/tests/ci/spec_to_test_mapping.py#L235 (for example), there is still a lot of gnu14 in there. This script also needs to handle the Intel compiler better. Basically, we need to replace the hardcoded compiler with the compiler we actually want to test with. Maybe something like https://github.com/openhpc/ohpc/blob/3.x/tests/ci/spec_to_test_mapping.py#L327
Then there is also this line in https://github.com/openhpc/ohpc/blob/3.x/tests/ci/setup_slurm_and_run_tests.sh#L35:
# Install rebuilt packages (if any)
# shellcheck disable=SC2046 # (we want the words to be split)
"${PKG[@]}" install $(find /home/"${USER}"/rpmbuild/RPMS/ -name "*rpm") || true
The idea is, as the comment says, to install the rebuilt packages (if any). If we are running without any rebuilt RPMs, we want to skip installing the packages, hence the || true. In your case, that is not a good idea. We probably want to check whether there are any RPMs and only run the install command if there is at least one. If we do install an RPM, then we should catch a failure like the one you are seeing.
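A rough sketch of that check, under some assumptions: the function name install_rebuilt_rpms is hypothetical, the pattern is narrowed to '*.rpm', and the package manager command is passed in as arguments (in the CI script that would be the contents of PKG). The demo uses echo and false as stand-ins for a real package manager.

```shell
# Hypothetical helper, not part of the existing CI script.
# Usage: install_rebuilt_rpms RPM_DIR PKG_COMMAND [ARGS...]
install_rebuilt_rpms() {
    local rpm_dir="$1"
    shift
    # Nothing rebuilt: skip the install step and report success.
    if [ -z "$(find "$rpm_dir" -name '*.rpm' 2>/dev/null | head -n 1)" ]; then
        echo "No rebuilt RPMs in ${rpm_dir}, skipping install"
        return 0
    fi
    # Install all found RPMs. There is no '|| true' here, so a failing
    # install makes find, and therefore this function, return non-zero.
    find "$rpm_dir" -name '*.rpm' -exec "$@" install {} +
}

# Demo with stand-in commands instead of a real package manager.
demo_rpms=$(mktemp -d)
empty_status=0
install_rebuilt_rpms "$demo_rpms" echo || empty_status=$?
touch "$demo_rpms/example.rpm"
ok_status=0
install_rebuilt_rpms "$demo_rpms" echo || ok_status=$?
fail_status=0
install_rebuilt_rpms "$demo_rpms" false || fail_status=$?
echo "empty=${empty_status} ok=${ok_status} fail=${fail_status}"
```

With this shape, an empty directory still passes, while a scriptlet or transaction failure during the install would fail the CI step instead of being swallowed.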
So the current behaviour is not intentional but historical. It is based on how this script evolved and the script needs to be adapted to better handle possible situations.
You need to add %{OHPC_BIN}/ohpc-update-modules-mpi to the %files section. You added it to the psxe files section, which is not really used any more. I am not really familiar with that part, but I do not think we use the psxe parts any more.
good catch - I got thrown off by the ordering of sections.
Yes, it is confusing.
I think we can remove all the psxe subpackages, as we do not mention them anywhere. I will add this to today's TSC agenda to see if anyone thinks we still need them.
@opoplawski If you are motivated please remove all the sections concerning psxe from the compatibility RPMs. Let's just drop it. The TSC also agreed that it is not needed any more.
If you do not want to do it, I can do it later.
I'd like to leave this cleaner and am very strapped for time, so I'd prefer to leave it to you if that's okay. Thank you for your work on this project, I find it very helpful.
Sure, no problem. I will take another look at this in the next few days and test it some more. But so far it looks ready. Thanks for helping out. I will remove the *psxe* packages in another PR.
Running the mpi script I still see a couple of errors:
# /opt/ohpc/pub/bin/ohpc-update-modules-mpi
Generating new oneAPI modulefiles
/opt/intel/oneapi/modulefiles-setup.sh: line 119: cd: /opt/intel/oneapi/compiler/2024.0/modulefiles/../opt/oclfpga/modulefiles: No such file or directory
Creating OpenHPC-style modulefiles for local oneAPI MPI installation(s).
--> Installing modulefile for version=2021.11
Lmod has detected the following error: The following module(s) are unknown: "mpi/"2021.11""
Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
$ module --ignore_cache load "mpi/"2021.11""
Also make sure that all modulefiles written in TCL start with the string #%Module
/opt/ohpc/pub/moduledeps/intel/impi/2021.11 /opt/ohpc/pub/moduledeps/intel/impi/2021.11.rpmnew differ: byte 889, line 28
/opt/ohpc/pub/moduledeps/gnu/impi/2021.11 /opt/ohpc/pub/moduledeps/gnu/impi/2021.11.rpmnew differ: byte 714, line 21
/opt/ohpc/pub/moduledeps/gnu14/impi/2021.11 /opt/ohpc/pub/moduledeps/gnu14/impi/2021.11.rpmnew differ: byte 714, line 21
cp: cannot stat '/opt/ohpc/pub/moduledeps/gnu/impi/.version.rpmnew': No such file or directory
cmp: /opt/ohpc/pub/moduledeps/gnu14/impi/.version.rpmnew: No such file or directory
md5sum: /opt/ohpc/pub/moduledeps/gnu14/impi/.version.rpmnew: No such file or directory
The unknown module error seems to be caused by the quotes you added (probably to make ShellCheck happy). You could disable that check for that line.
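To illustrate both points, here is a small sketch. It assumes the module line is written through a here-document, which may not match how the script actually does it. The variable names and the count_words helper are made up, and SC2086 (the unquoted-variable warning) is used only as an example of a check to disable.

```shell
ver="2021.11"

# Quote characters inside a here-document are copied literally,
# so this line ends up naming the module mpi/"2021.11".
with_quotes=$(
    cat <<EOF
module load mpi/"${ver}"
EOF
)

# Without the extra quotes the generated line carries the plain version.
without_quotes=$(
    cat <<EOF
module load mpi/${ver}
EOF
)

echo "$with_quotes"
echo "$without_quotes"

# Where ShellCheck asks for quotes that would change the behaviour,
# a directive on the line above silences it for that one command.
count_words() { echo "$#"; }
versions="2021.10 2021.11"
# shellcheck disable=SC2086 # (we want the words to be split)
n=$(count_words $versions)
echo "versions found: $n"
```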
Not sure about the other messages. Any ideas how to handle those?
I added a couple of small fixes on top. Thanks for all your help. Merging this now.