spack-manager
Question: how to set up environment to run batch script
Once I've built naluX on Summit, it's unclear to me how to make sure a batch job has the correct environment to run that executable.
Normally, I'd load the same modules as were used for the build.
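(For context, that preamble would be something like the following, with illustrative module names rather than the exact ones from the build:)
module load gcc cuda spectrum-mpi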
I did try
quick-activate $SPACK_MANAGER/environments/jhubuild
and then submitted a job. The job failed with an error indicating it could not find CUDA.
@psakievich @tasmith4 @jrood-nrel
@jhux2 I believe this should answer your question: https://psakievich.github.io/spack-manager/general/FAQ.html#how-do-i-use-the-executables-i-built-in-my-development-environment
This is what I do on Summit for the exawind-driver, for example:
export CUDA_LAUNCH_BLOCKING=1
export SPACK_MANAGER=${PROJWORK}/cfd116/jrood/spack-manager-summit
source ${SPACK_MANAGER}/start.sh && spack-start
spack env activate -d ${SPACK_MANAGER}/environments/exawind-summit
spack load exawind
which exawind
We should be getting CUDA_LAUNCH_BLOCKING in the environment when we do spack load exawind. Is that not the case, @jrood-nrel?
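If it helps to check, spack can print the commands that spack load would run, which shows exactly which variables it sets (assuming a reasonably recent spack):
spack load --sh exawind | grep CUDA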
So if I understand, I should do
quick-activate $SPACK_MANAGER/environments/jhubuild
where jhubuild is the "environment" that I built naluX under. But
spack load naluX
returns:
==> Error: Spec 'naluX' matches no installed packages.
I feel that I'm missing something fundamental here.
[EDIT] Btw, spack load exawind works, but the SHAs of exawind and naluX are different.
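(One way to list the installed specs with their hashes for comparison; the -l flag adds the hash prefix:)
spack find -l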
Ah yeah, that should be the case @psakievich. Guess it's a habit.
https://github.com/psakievich/spack-manager/blob/023fd1469078d8cc9396e3ccd373826cbbd5522f/repos/exawind/packages/nalu-wind/package.py#L43-L48
@jhux2 the package name is nalu-wind (naluX is just the executable it installs), so:
spack load nalu-wind
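Mirroring the exawind example above, a quick sanity check that the load worked (assuming the jhubuild environment is already activated):
spack load nalu-wind
which naluX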
@jrood-nrel Thanks. I've launched a couple of test jobs to see what effect spack load nalu-wind has.
My jobs failed with the same error as before:
FATAL ERROR: dlopen libcudart.so: libcudart.so: cannot open shared object file: No such file or directory
FATAL ERROR: dlopen libcudart.so: libcudart.so: cannot open shared object file: No such file or directory
[h26n01:464770] Error: common_pami.c:1056 - ompi_common_pami_init() Unable to create PAMI client (rc=1)
[h26n01:464771] Error: common_pami.c:1056 - ompi_common_pami_init() Unable to create PAMI client (rc=1)
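(A quick way to check whether the loaded environment actually exposes a CUDA lib directory for that dlopen — a minimal sketch, assuming bash:)
spack load nalu-wind
echo $LD_LIBRARY_PATH | tr ':' '\n' | grep -i cuda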
After issuing spack load nalu-wind, should there be any change in what modules are loaded? Or is that handled by spack setting all the right paths, etc.?
@jhux2 spack should be handling all the right paths. So to confirm, your script looks something like this?
# source $SPACK_MANAGER/start.sh has already occurred in bashrc
quick-activate $SPACK_MANAGER/environments/jhubuild
spack load nalu-wind
srun [args] naluX -i [args]
@psakievich Here's what I have in my batch script:
export SPACK_MANAGER=~/exawind/sources/spack-manager
source $SPACK_MANAGER/start.sh
quick-activate $SPACK_MANAGER/environments/jhubuild
spack load nalu-wind
jsrun ....
This is a script that I've used for a long time. (I did move the naluX executable to another location, but I assume that should be safe to do.)
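(On the moved executable: spack-built binaries lean on RPATHs plus the environment from spack load, so a minimal check that the relocated copy still resolves its CUDA libraries might look like this — /path/to/moved/naluX is a placeholder:)
# placeholder path; substitute wherever the binary was moved
ldd /path/to/moved/naluX | grep -i cuda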
Where in the spack-manager tree can I find configure/build logs for Trilinos? I'd like to look over those logs to see if anything jumps out.
spack cd -b trilinos
will take you there, and the spack-* files will show you logs for everything that happened.
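For example (file names may vary a bit by spack version, but these are the usual ones):
spack cd -b trilinos
ls spack-*   # typically spack-build-out.txt, spack-build-env.txt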
@jhux2 Where are you at on this? Do you still need help?
@psakievich Thanks for checking in. I haven't returned to this yet. The motivation was to see if building with spack-manager would help work around a Nalu-Wind runtime failure. It turns out there's a bug that affects both solver paths in the NGP code, so how nalu-wind gets built is moot.