Regenerate modulefiles on update (fixes #1601)

Open opoplawski opened this issue 1 year ago • 20 comments

This is as yet completely untested. But the idea is:

  • Split out the module generation into a separate script in a new bin directory. I think admins may want to run this at times as well
  • Update the script to generate .rpmnew files only if the files differ from the currently installed ones
  • Run that script whenever the /opt/intel/oneapi directory is updated by another rpm
  • Only run the %postun script when the package is completely removed, not on updates
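
As a rough illustration of the second and fourth points (a sketch, not the code from this PR), a helper could compare a freshly generated modulefile with the installed one and keep a .rpmnew copy only when they differ. The names install_generated and is_final_removal are hypothetical. The second helper relies on the RPM convention that $1 in %postun is the number of instances of the package remaining after the transaction, so 0 means complete removal and 1 or more means an update.

```shell
# Hypothetical helper: place a newly generated modulefile.
#   $1 = freshly generated file, $2 = installed destination
install_generated() {
	local new="$1" dest="$2"
	if [ ! -e "${dest}" ]; then
		# Nothing installed yet: the generated file becomes the modulefile.
		mv "${new}" "${dest}"
	elif cmp -s "${new}" "${dest}"; then
		# Identical content: no .rpmnew is needed.
		rm -f "${new}"
	else
		# Content differs: keep the installed file, save the new one alongside.
		mv "${new}" "${dest}.rpmnew"
	fi
}

# Hypothetical helper for a %postun scriptlet, called as: is_final_removal "$1"
# Succeeds only when no instance of the package remains (complete removal).
is_final_removal() {
	[ "${1:-0}" -eq 0 ]
}
```

In a scriptlet this would be used as `is_final_removal "$1" && <cleanup>`, so that cleanup is skipped during updates.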

Does the CI generate rpms that can be tested?

opoplawski avatar May 25 '24 17:05 opoplawski

Test Results

27 files ±0, 27 suites ±0, 41s ⏱️ ±0s
53 tests ±0: 49 ✅ ±0, 4 💤 ±0, 0 ❌ ±0
99 runs ±0: 93 ✅ ±0, 6 💤 ±0, 0 ❌ ±0

Results for commit ee1a6d47. ± Comparison against base commit 6ec75539.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar May 25 '24 18:05 github-actions[bot]

Does the CI generate rpms that can be tested?

Yes, for each OS there should be an RPM attached to the GitHub Actions run. The RPMs are only kept for 24 hours, however; previously we reached space limits when keeping them for longer.

Thanks for your PR. I will need at least one week before being able to look closer at this PR.

adrianreber avatar May 25 '24 19:05 adrianreber

I would like to run new shell scripts through shellcheck. We have a Makefile, https://github.com/openhpc/ohpc/blob/3.x/tests/ci/Makefile, which does that for us. Could you add the new shell script to the shellcheck, whitespace and shfmt sections there? If you prefer not to make these changes, I can also do them later.

There is a similar script in the intel MPI compatibility package. I guess we should do the same changes there, right?

adrianreber avatar Jun 03 '24 10:06 adrianreber

Am I right that shfmt wants TAB characters for indentation?

opoplawski avatar Jul 03 '24 17:07 opoplawski

Am I right that shfmt wants TAB characters for indentation?

We just use the defaults that shfmt defines. The main goal is to be consistent. I never looked at the details.

Just running make -C tests/ci/ shfmt-lint should fix it.
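
For reference, shfmt indents with tabs unless its -i option says otherwise, so the defaults do mean TAB characters. A minimal sketch of checking a single script by hand, assuming shfmt is on PATH; the function name check_shell_format is hypothetical and this is not necessarily what the Makefile target runs:

```shell
# Hypothetical helper: report whether a script matches shfmt's default style.
check_shell_format() {
	local script="$1"
	if ! command -v shfmt >/dev/null 2>&1; then
		echo "shfmt not installed, skipping format check" >&2
		return 0
	fi
	# -d prints a diff and exits non-zero when the formatting differs.
	shfmt -d "${script}"
}
```

Running `shfmt -w` on a file instead of `shfmt -d` rewrites it in place.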

adrianreber avatar Jul 04 '24 07:07 adrianreber

A friendly reminder that this PR had no activity for 30 days.

github-actions[bot] avatar Aug 09 '24 00:08 github-actions[bot]

Any update on the PR? Still cannot upgrade oneAPI smoothly.

aflyhorse avatar Nov 02 '24 04:11 aflyhorse

A friendly reminder that this PR had no activity for 30 days.

github-actions[bot] avatar Dec 08 '24 00:12 github-actions[bot]

(This is just a message to prevent expiration. Please ignore it.)

aflyhorse avatar Dec 10 '24 05:12 aflyhorse

I'll just note that in a previous CI run this was the output:

  Running scriptlet: intel-oneapi-toolkit-release-ohpc-2024.0-310.ohpc.   10/10 
Generating new oneAPI modulefiles
/opt/intel/oneapi/modulefiles-setup.sh: line 119: cd: /opt/intel/oneapi/compiler/2024.0/modulefiles/../opt/oclfpga/modulefiles: No such file or directory
Creating OpenHPC-style modulefiles for local oneAPI compiler installation(s).
--> Installing modulefile for version=2023.2.1
/opt/ohpc/pub/bin/ohpc-update-modules-intel: line 102: ver: command not found
cmp: /opt/ohpc/pub/moduledeps/gnu/mkl/: Is a directory
--> Installing modulefile for version=2024.0.0
/opt/ohpc/pub/bin/ohpc-update-modules-intel: line 102: ver: command not found
cmp: /opt/ohpc/pub/moduledeps/gnu/mkl/: Is a directory
--> Installing modulefile for version=2025.1.1
/opt/ohpc/pub/bin/ohpc-update-modules-intel: line 102: ver: command not found
cmp: /opt/ohpc/pub/moduledeps/gnu/mkl/: Is a directory

/var/tmp/rpm-tmp.il2Cj4: line 1: /opt/ohpc/pub/bin/ohpc-update-modules-mpi: No such file or directory
warning: %transfiletriggerin(intel-mpi-devel-ohpc-2024.0-9999.ci.ohpc.2.x86_64) scriptlet failed, exit status 127

Error in <unknown> scriptlet in rpm package intel-oneapi-toolkit-release-ohpc
  Verifying        : intel-mpi-devel-ohpc-2024.0-9999.ci.ohpc.2.x86_64     1/10 
  Verifying        : intel-psxe-mpi-devel-ohpc-2024.0-9999.ci.ohpc.2.x8    2/10 
  Verifying        : intel-psxe-compilers-devel-ohpc-2024.0-9999.ci.ohp    3/10 
  Verifying        : ohpc-buildroot-3.2-9999.ci.ohpc.2.noarch              4/10 
  Verifying        : intel-compilers-devel-ohpc-2024.0-9999.ci.ohpc.2.x    5/10 
  Verifying        : intel-compilers-devel-ohpc-2024.0-310.ohpc.4.1.x86    6/10 
  Verifying        : intel-oneapi-toolkit-release-ohpc-2024.0-9999.ci.o    7/10 
  Verifying        : intel-oneapi-toolkit-release-ohpc-2024.0-310.ohpc.    8/10 
  Verifying        : ohpc-filesystem-3.2-9999.ci.ohpc.2.noarch             9/10 
Error: Transaction failed
  Verifying        : ohpc-filesystem-3.2-330.ohpc.1.1.noarch              10/10 

Upgraded:
  intel-compilers-devel-ohpc-2024.0-9999.ci.ohpc.2.x86_64                       
  intel-oneapi-toolkit-release-ohpc-2024.0-9999.ci.ohpc.2.x86_64                
  ohpc-filesystem-3.2-9999.ci.ohpc.2.noarch                                     
Installed:
  intel-mpi-devel-ohpc-2024.0-9999.ci.ohpc.2.x86_64                             
  ohpc-buildroot-3.2-9999.ci.ohpc.2.noarch                                      
Failed:
  intel-psxe-compilers-devel-ohpc-2024.0-9999.ci.ohpc.2.x86_64                  
  intel-psxe-mpi-devel-ohpc-2024.0-9999.ci.ohpc.2.x86_64                        

+ true

This indicates a problem in the package, but apparently it was ignored. Is this intentional?

opoplawski avatar May 12 '25 22:05 opoplawski

I think this is ready for review now

opoplawski avatar May 12 '25 22:05 opoplawski

This indicates a problem in the package, but apparently it was ignored. Is this intentional?

Yes and no. When it comes to testing things in CI with the Intel compiler we are not yet there. The testing still has a couple of places where the compiler family is hardcoded. If you look at https://github.com/openhpc/ohpc/blob/3.x/tests/ci/spec_to_test_mapping.py#L235 (for example) there is still a lot of gnu14 in there. This script also needs to handle the Intel compiler better. Basically we need to replace the hardcoded compiler with the compiler we actually want to test with. Maybe something like https://github.com/openhpc/ohpc/blob/3.x/tests/ci/spec_to_test_mapping.py#L327

Then there is also this line in https://github.com/openhpc/ohpc/blob/3.x/tests/ci/setup_slurm_and_run_tests.sh#L35

# Install rebuilt packages (if any)
# shellcheck disable=SC2046 # (we want the words to be split)
"${PKG[@]}" install $(find /home/"${USER}"/rpmbuild/RPMS/ -name "*rpm") || true

The idea is, as the comment says, to install the rebuilt packages (if any). If we are running without any rebuilt RPMs we want to skip installing the packages, thus || true. In your case, that is not a good idea, because it also hides real installation failures. We probably want to check if there are RPMs and only run the install command if there is an RPM. If we install an RPM then we should catch a failure like the one you are seeing.
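
A minimal sketch of that check, assuming bash 4.4 or newer for mapfile -d; the function name install_rebuilt_rpms and its argument order are hypothetical, and this is not the fix that was eventually merged:

```shell
# Hypothetical helper: install rebuilt RPMs only if some exist.
#   $1     = directory to search for rebuilt RPMs
#   $2...  = package manager command (for example the contents of PKG)
install_rebuilt_rpms() {
	local rpm_dir="$1"
	shift
	local -a rpms=()
	if [ -d "${rpm_dir}" ]; then
		# Collect the file names NUL-separated so unusual names survive intact.
		mapfile -d '' -t rpms < <(find "${rpm_dir}" -name '*.rpm' -print0)
	fi
	if [ "${#rpms[@]}" -eq 0 ]; then
		echo "No rebuilt RPMs found, skipping install"
		return 0
	fi
	# No "|| true" here: a failed install is reported to the caller.
	"$@" install "${rpms[@]}"
}
```

It could be called as `install_rebuilt_rpms /home/"${USER}"/rpmbuild/RPMS "${PKG[@]}"`, which succeeds quietly when nothing was rebuilt and returns the package manager's exit status otherwise.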

So the current behaviour is not intentional but historical. It is based on how this script evolved and the script needs to be adapted to better handle possible situations.

adrianreber avatar May 13 '25 07:05 adrianreber

You need to add %{OHPC_BIN}/ohpc-update-modules-mpi to the %files section. You added it to the psxe file section. This is not really used any more. Not really familiar with that part but I do not think we use the psxe parts any more.

adrianreber avatar May 13 '25 15:05 adrianreber

You need to add %{OHPC_BIN}/ohpc-update-modules-mpi to the %files section. You added it to the psxe file section. This is not really used any more. Not really familiar with that part but I do not think we use the psxe parts any more.

good catch - I got thrown off by the ordering of sections.

opoplawski avatar May 13 '25 15:05 opoplawski

good catch - I got thrown off by the ordering of sections.

Yes, it is confusing.

adrianreber avatar May 13 '25 15:05 adrianreber

I think we can remove all the psxe subpackages as we do not mention them anywhere. I will add this to today's TSC agenda to see if anyone thinks we still need them.

adrianreber avatar May 14 '25 09:05 adrianreber

@opoplawski If you are motivated please remove all the sections concerning psxe from the compatibility RPMs. Let's just drop it. The TSC also agreed that it is not needed any more.

If you do not want to do it, I can do it later.

adrianreber avatar May 15 '25 07:05 adrianreber

I'd like to leave this cleaner and am very strapped for time, so I'd prefer to leave it to you if that's okay. Thank you for your work on this project, I find it very helpful.

opoplawski avatar May 15 '25 15:05 opoplawski

I'd like to leave this cleaner and am very strapped for time, so I'd prefer to leave it to you if that's okay. Thank you for your work on this project, I find it very helpful.

Sure, no problem. I will take another look at this over the next few days and test it some more. But so far it looks ready. Thanks for helping out. I will remove the *psxe* packages in another PR.

adrianreber avatar May 15 '25 15:05 adrianreber

Running the mpi script I still see a couple of errors:

# /opt/ohpc/pub/bin/ohpc-update-modules-mpi
Generating new oneAPI modulefiles
/opt/intel/oneapi/modulefiles-setup.sh: line 119: cd: /opt/intel/oneapi/compiler/2024.0/modulefiles/../opt/oclfpga/modulefiles: No such file or directory
Creating OpenHPC-style modulefiles for local oneAPI MPI installation(s).
--> Installing modulefile for version=2021.11
Lmod has detected the following error: The following module(s) are unknown: "mpi/"2021.11""

Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
  $ module --ignore_cache load "mpi/"2021.11""

Also make sure that all modulefiles written in TCL start with the string #%Module



/opt/ohpc/pub/moduledeps/intel/impi/2021.11 /opt/ohpc/pub/moduledeps/intel/impi/2021.11.rpmnew differ: byte 889, line 28
/opt/ohpc/pub/moduledeps/gnu/impi/2021.11 /opt/ohpc/pub/moduledeps/gnu/impi/2021.11.rpmnew differ: byte 714, line 21
/opt/ohpc/pub/moduledeps/gnu14/impi/2021.11 /opt/ohpc/pub/moduledeps/gnu14/impi/2021.11.rpmnew differ: byte 714, line 21
cp: cannot stat '/opt/ohpc/pub/moduledeps/gnu/impi/.version.rpmnew': No such file or directory
cmp: /opt/ohpc/pub/moduledeps/gnu14/impi/.version.rpmnew: No such file or directory
md5sum: /opt/ohpc/pub/moduledeps/gnu14/impi/.version.rpmnew: No such file or directory

The unknown module error seems to be because of the quotes you added (probably to make ShellCheck happy). You could disable that check for that line.
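
The script line in question is not shown in this thread, so the following is only an assumed illustration of how literal quote characters can end up inside a module name like the one in the Lmod error above, together with the per-line ShellCheck directive. SC2086 is ShellCheck's warning about unquoted expansions.

```shell
ver='2021.11'

# Quotes around the expansion are shell syntax; the shell removes them.
plain="mpi/${ver}"

# Escaped quotes are literal characters and become part of the name.
literal="mpi/\"${ver}\""

printf '%s\n' "${plain}" "${literal}"

# If an expansion has to stay unquoted, the warning can be silenced for one line:
# shellcheck disable=SC2086 # (hypothetical case: unquoted on purpose)
echo ${ver}
```

The second printed line contains the quote characters themselves, which matches the shape of the name Lmod reported as unknown.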

Not sure about the other messages. Any ideas how to handle those?

adrianreber avatar May 16 '25 13:05 adrianreber

I added a couple of small fixes on top. Thanks for all your help. Merging this now.

adrianreber avatar Jun 24 '25 11:06 adrianreber