software-layer
EESSI bash initialization to module file
This is a follow up PR to the issue: https://gitlab.com/eessi/support/-/issues/83
Instance eessi-bot-mc-aws is configured to build for:
- architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
- repositories: eessi.io-2023.06-compat, eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software
Instance boegel-bot-deucalion is configured to build for:
- architectures: aarch64/a64fx
- repositories: eessi.io-2023.06-software
Instance eessi-bot-mc-azure is configured to build for:
- architectures: x86_64/amd/zen4
- repositories: eessi.io-2023.06-compat, eessi-hpc.org-2023.06-compat, eessi-hpc.org-2023.06-software, eessi.io-2023.06-software
I'm not sure if it is really a concern, but we add an Lmod family() related to the EESSI version as well so that we have a way to manage "compatible" modules (for example module files related to dev.eessi.io)
This also doesn't cover the cURL issue currently fixed in our initialisation script:
rhel_libcurl_file="/etc/pki/tls/certs/ca-bundle.crt"
if [ -f $rhel_libcurl_file ]; then
show_msg "Found libcurl CAs file at RHEL location, setting CURL_CA_BUNDLE"
export CURL_CA_BUNDLE=$rhel_libcurl_file
fi
You can use isFile() to give you the same logic
The one other thing missing is the current redirection we do for Zen4:
https://github.com/EESSI/software-layer/blob/1fc0cb75a076a37184b9abddba2093118bdcfca6/init/eessi_environment_variables#L51-L59
A quick test, including my suggested changes, shows that this is missing EESSI_CPU_FAMILY and EPREFIX. These seem to be the only ones of consequence.
Here's the one that works for me (but still misses EPREFIX and EESSI_CPU_FAMILY and the override for Zen4) and includes the cert bundle check and PS1 change:
help([[
Description
===========
The European Environment for Scientific Software Installations (EESSI, pronounced as easy) is a collaboration between different European partners in HPC community. The goal of this project is to build a common stack of scientific software installations for HPC systems and beyond, including laptops, personal workstations and cloud infrastructure.
More information
================
- URL: https://www.eessi.io/docs/
]])
whatis("Description: The European Environment for Scientific Software Installations (EESSI, pronounced as easy) is a collaboration between different European partners in HPC community. The goal of this project is to build a common stack of scientific software installations for HPC systems and beyond, including laptops, personal workstations and cloud infrastructure.")
whatis("URL: https://www.eessi.io/docs/")
local eessi_version = myModuleVersion()
local eessi_repo = "/cvmfs/software.eessi.io"
local eessi_prefix = pathJoin(eessi_repo, "versions", eessi_version)
local eessi_os_type = "linux"
pushenv("EESSI_VERSION", eessi_version)
pushenv("EESSI_CVMFS_REPO", eessi_repo)
pushenv("EESSI_OS_TYPE", eessi_os_type)
function archdetect_cpu()
local script = pathJoin(eessi_prefix, 'init', 'lmod_eessi_archdetect_wrapper.sh')
if not os.getenv("EESSI_ARCHDETECT_OPTIONS") then
if convertToCanonical(LmodVersion()) < convertToCanonical("8.6") then
LmodError("Loading this modulefile requires using Lmod version >= 8.6, but you can export EESSI_ARCHDETECT_OPTIONS to the available cpu architecture in the form of: x86_64/intel/haswell or aarch64/neoverse_v1")
end
source_sh("bash", script)
end
for archdetect_filter_cpu in string.gmatch(os.getenv("EESSI_ARCHDETECT_OPTIONS"), "([^" .. ":" .. "]+)") do
if isDir(pathJoin(eessi_prefix, "software", eessi_os_type, archdetect_filter_cpu, "software")) then
return archdetect_filter_cpu
end
end
LmodError("Software directory check for the detected architecture failed")
end
local archdetect = archdetect_cpu()
local eessi_cpu_family = archdetect:match("([^/]+)")
local eessi_software_subdir = os.getenv("EESSI_SOFTWARE_SUBDIR_OVERRIDE") or archdetect
local eessi_eprefix = pathJoin(eessi_prefix, "compat", eessi_os_type, eessi_cpu_family)
local eessi_software_path = pathJoin(eessi_prefix, "software", eessi_os_type, eessi_software_subdir)
local eessi_module_path = pathJoin(eessi_software_path, "modules", "all")
local eessi_site_module_path = string.gsub(eessi_module_path, "versions", "host_injections")
pushenv("EESSI_SITE_MODULEPATH", eessi_site_module_path)
pushenv("EESSI_SOFTWARE_SUBDIR", eessi_software_subdir)
pushenv("EESSI_PREFIX", eessi_prefix)
pushenv("EESSI_EPREFIX", eessi_eprefix)
prepend_path("PATH", pathJoin(eessi_eprefix, "bin"))
prepend_path("PATH", pathJoin(eessi_eprefix, "usr/bin"))
pushenv("EESSI_SOFTWARE_PATH", eessi_software_path)
pushenv("EESSI_MODULEPATH", eessi_module_path)
prepend_path("MODULEPATH", eessi_module_path)
prepend_path("MODULEPATH", eessi_site_module_path)
pushenv("LMOD_CONFIG_DIR", pathJoin(eessi_software_path, ".lmod"))
pushenv("LMOD_PACKAGE_PATH", pathJoin(eessi_software_path, ".lmod"))
-- update the prompt (unless overridden)
if not os.getenv("EESSI_RETAIN_PROMPT") then
pushenv("PS1", "{EESSI " .. eessi_version .. "} " .. (os.getenv("PS1") or ""))
end
-- check for RHEL certificate location
local rhel_certificates = "/etc/pki/tls/certs/ca-bundle.crt"
if isFile(rhel_certificates) then
pushenv("CURL_CA_BUNDLE", rhel_certificates)
end
if mode() == "load" then
LmodMessage("EESSI/" .. eessi_version .. " loaded successfully")
end
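To test the selection logic outside Lmod, the loop in archdetect_cpu() can be mirrored in the shell. This is only a sketch: pick_cpu_target is a hypothetical helper name, not part of the software layer, and it assumes the same directory layout the module file checks (prefix/software/os_type/cpu_target/software).

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the loop in archdetect_cpu():
# print the first entry of a colon-separated candidate list whose
# software directory exists below the given prefix.
# Arguments: <eessi_prefix> <os_type> <colon-separated candidates>
pick_cpu_target() {
    local prefix="$1" os_type="$2" remaining="$3" candidate
    while [ -n "$remaining" ]; do
        # Split off the first colon-separated field
        candidate="${remaining%%:*}"
        if [ "$candidate" = "$remaining" ]; then
            remaining=""
        else
            remaining="${remaining#*:}"
        fi
        # Skip empty fields, as the Lua pattern "([^:]+)" does
        if [ -n "$candidate" ] && [ -d "${prefix}/software/${os_type}/${candidate}/software" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "Software directory check for the detected architecture failed" >&2
    return 1
}

# Usage sketch (paths taken from the module file above):
#   pick_cpu_target /cvmfs/software.eessi.io/versions/2023.06 linux "$EESSI_ARCHDETECT_OPTIONS"
```

As in the module file, the first candidate with an existing software directory wins, and an empty or entirely invalid list is reported as an error.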
Setting only PS1 would create difficulties for prompts that don't use the PS1 variable (Starship, Oh My Posh, etc.). An easy solution could be the following:
 if not os.getenv("EESSI_RETAIN_PROMPT") then
-    pushenv("PS1", "{EESSI " .. eessi_version .. "} " .. (os.getenv("PS1") or ""))
+    local eessi_prompt = "{EESSI " .. eessi_version .. "}"
+    pushenv("PS1", eessi_prompt .. (os.getenv("PS1") or ""))
+    pushenv("EESSI_PROMPT", eessi_prompt)
 end
But os.getenv("PS1") still returns nil, so the original prompt will be gone. I have no idea why Lua cannot read the variable, since the bash script has no issues with it.
The redirection for Zen4 is handled in the commit above.
We can only set PS1 if it is already in the environment:
if os.getenv("PS1") and not os.getenv("EESSI_RETAIN_PROMPT") then
os.getenv() returns nil if a variable is not set.
That's the thing I don't get. In my test scenario, PS1 was set and was also readable by the bash script, but os.getenv("PS1") returned nil.
I believe the problem is that the PS1 variable isn't exported, so it's not visible to the module. Or at least the default Ubuntu PS1=${debian_chroot:+($debian_chroot)}\u@\h:\w\$ seems to be an issue. If I provide a custom PS1, the module updates it as expected.
EDIT: The issue is that PS1 is not exported by default. If I put export PS1=$PS1 in front of module load, it works as expected.
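The behaviour described in the EDIT can be reproduced without Lmod. The sketch below assumes that the Lua code of the module file runs in a child process of the shell and therefore only sees exported variables. DEMO_PROMPT is a hypothetical stand-in for PS1, so the demo does not depend on running in an interactive shell.

```bash
#!/usr/bin/env bash
# A variable that is set but not exported is invisible to child processes.
unset DEMO_PROMPT
DEMO_PROMPT='\u@\h:\w\$ '

# The child process (standing in for the process that evaluates the module
# file) cannot see the variable yet.
before=$(bash -c 'printf "%s" "${DEMO_PROMPT-unset}"')

# After export, the same child process can read the value.
export DEMO_PROMPT
after=$(bash -c 'printf "%s" "${DEMO_PROMPT-unset}"')

printf 'before export: %s\n' "$before"
printf 'after export:  %s\n' "$after"
```

If this assumption holds, guarding the prompt update with os.getenv("PS1") is the safe choice: the module then leaves the prompt alone whenever PS1 has not been exported.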
Is it possible that your jobs already fail at the test "Test for archdetect_cpu functionality with only one valid path", and not at the test "Test for archdetect_cpu functionality with invalid path" as indicated?
/home/runner/work/_temp/c11c9c6f-66bb-4412-991a-68d31eb8184c.sh: line 4: module: command not found
This causes the if-statement to fail, but the job task succeeds.
I can't find the task that sources the Lmod init script.
@TopRichard, @MaKaNu is right: you don't have an initialised Lmod when you run the tests. You'll need to add another step that does that, see https://github.com/EESSI/software-layer/blob/b14cef843a13e51c2871fd9a4d8e4d7e74cb349a/init/bash from @MaKaNu's PR
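One way to keep a missing Lmod from going unnoticed is to fail the test step explicitly when the module command is not defined. The following is a minimal sketch; require_command is a hypothetical helper, and how Lmod gets initialised in the workflow is left open here.

```bash
#!/usr/bin/env bash
# Hypothetical guard: fail loudly if a command, shell function or builtin
# is not available, instead of letting a later if-statement swallow the error.
require_command() {
    # `type` succeeds for executables, shell functions and builtins alike
    if ! type "$1" > /dev/null 2>&1; then
        echo "ERROR: '$1' is not available in this shell" >&2
        return 1
    fi
}

# Usage sketch at the top of a test step, after sourcing the Lmod init script:
#   require_command module || exit 1
```

With such a guard in place, "module: command not found" would fail the job task rather than only the if-statement.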
Could it be that:
- name: Check out software-layer repository
  uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- name: Mount EESSI CernVM-FS pilot repository
  uses: cvmfs-contrib/github-action-cvmfs@55899ca74cf78ab874bdf47f5a804e47c198743c # v4.0
  with:
    cvmfs_config_package: https://github.com/EESSI/filesystem-layer/releases/download/latest/cvmfs-config-eessi_latest_all.deb
    cvmfs_http_proxy: DIRECT
    cvmfs_repositories: software.eessi.io
loads the release version of EESSI, so your changes to the init system are not available?
Yes, exactly. The module EESSI/2023.06 is not available yet; this should work out after merging the PR.
Looks like the test is still failing. Maybe show the result file to verify if the desired string is included?
Scratch that. The module file to be loaded will only be added by this PR. Hence it will never be able to load it from CVMFS before the module file has been ingested.
bot: build repo:eessi.io-2023.06-software arch:x86_64/generic
Updates by the bot instance eessi-bot-mc-aws
- received bot command build repo:eessi.io-2023.06-software arch:x86_64/generic from TopRichard
  - expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/generic
- handling command build repository:eessi.io-2023.06-software architecture:x86_64/generic resulted in:
  - submitted job 17041, for details & status see https://github.com/EESSI/software-layer/pull/667#issuecomment-2310154569
Updates by the bot instance eessi-bot-mc-azure
- account TopRichard has NO permission to send commands to the bot
Updates by the bot instance boegel-bot-deucalion
- account TopRichard has NO permission to send commands to the bot
New job on instance eessi-bot-mc-aws for architecture x86_64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_667/17041
| date | job status | comment |
|---|---|---|
| Aug 26 12:59:06 UTC 2024 | submitted | job id 17041 awaits release by job manager |
| Aug 26 12:59:09 UTC 2024 | released | job awaits launch by Slurm scheduler |
| Aug 26 13:05:20 UTC 2024 | running | job 17041 is running |
| Aug 26 13:24:07 UTC 2024 | finished | :grin: SUCCESS |
| Aug 26 13:24:07 UTC 2024 | test result | :grin: SUCCESS |
@trz42 the module file is included in the tarball.
You can't load it from CVMFS, but you can
module use init/modules
and perform the tests
Agreed. This sounds like the correct way to test the module file. If it succeeds we can ingest it. If it fails (now or a future changed version), we must not ingest it.
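A test along these lines could capture the output of the module load and check it for the expected message, printing the captured output on failure so the result file can be inspected. This is a sketch under assumptions: expect_in_output is a hypothetical helper, and the module name and message are the ones that appear earlier in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the expected string occurs in the
# captured output; otherwise show the output and return a non-zero status.
expect_in_output() {
    local expected="$1" output_file="$2"
    if grep -qF -- "$expected" "$output_file"; then
        echo "ok: found '$expected'"
    else
        echo "FAILED: '$expected' not found; captured output was:" >&2
        cat "$output_file" >&2
        return 1
    fi
}

# Usage sketch (needs an initialised Lmod and a software-layer checkout):
#   module use init/modules
#   module load EESSI/2023.06 > out.txt 2>&1
#   expect_in_output "EESSI/2023.06 loaded successfully" out.txt || exit 1
```

Because the helper returns a non-zero status, a missing message fails the step instead of passing silently.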
Ok, I think this is pretty much there. One thing I wonder is whether we should be deploying this to a higher-level directory. The module file is version-specific already, so there is no real requirement to deploy it under the versions directory; it could be deployed under
/cvmfs/software.eessi.io/init
and that would make it much easier to have multiple EESSI versions available at once.
bot: build repo:eessi.io-2023.06-software arch:x86_64/generic
Updates by the bot instance eessi-bot-mc-aws
- received bot command build repo:eessi.io-2023.06-software arch:x86_64/generic from TopRichard
  - expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/generic
- handling command build repository:eessi.io-2023.06-software architecture:x86_64/generic resulted in:
  - submitted job 17103, for details & status see https://github.com/EESSI/software-layer/pull/667#issuecomment-2312461672
Updates by the bot instance boegel-bot-deucalion
- account TopRichard has NO permission to send commands to the bot