
Question: how to set up environment to run batch script

Open • jhux2 opened this issue 2 years ago • 14 comments

Once I've built naluX on Summit, it's unclear to me how to make sure a batch job has the correct environment to run that executable.

Normally, I'd load the same modules as were used for the build.

I tried running quick-activate $SPACK_MANAGER/environments/jhubuild and then submitting a job. The job failed with an error indicating that it could not find CUDA.

jhux2 • May 25 '22 20:05

@psakievich @tasmith4 @jrood-nrel

jhux2 • May 25 '22 20:05

@jhux2 I believe this should answer your question: https://psakievich.github.io/spack-manager/general/FAQ.html#how-do-i-use-the-executables-i-built-in-my-development-environment

psakievich • May 25 '22 20:05

This is what I do on Summit for the exawind-driver, for example:

export CUDA_LAUNCH_BLOCKING=1
export SPACK_MANAGER=${PROJWORK}/cfd116/jrood/spack-manager-summit
source ${SPACK_MANAGER}/start.sh && spack-start
spack env activate -d ${SPACK_MANAGER}/environments/exawind-summit
spack load exawind
which exawind

jrood-nrel • May 25 '22 20:05
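A batch script built on the recipe above could stop before launching if the executable is not on PATH, instead of failing later inside the job. This is a minimal sketch; `require_exe` is a hypothetical helper, not part of spack-manager or spack.

```shell
# Hypothetical helper: abort early if an executable is not on PATH.
require_exe() {
  local exe="$1"
  local path
  if ! path=$(command -v "$exe"); then
    echo "ERROR: '$exe' not found on PATH; was the spack environment activated and the package loaded?" >&2
    return 1
  fi
  echo "Using $exe from: $path"
}

# Example usage in a batch script, after `spack load exawind`:
# require_exe exawind || exit 1
```

Printing the resolved path also leaves a record in the job output of which build was actually used.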

We should already be getting CUDA_LAUNCH_BLOCKING set in the environment when we run spack load exawind. Is that not the case, @jrood-nrel?

psakievich • May 25 '22 20:05
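If it is unclear whether `spack load exawind` already exports CUDA_LAUNCH_BLOCKING, a batch script could set it only as a fallback, so a value provided by the package is not overridden. A minimal sketch, assuming the fallback value of 1 from the recipe above is what you want.

```shell
# Keep a value that `spack load exawind` may already have exported;
# otherwise fall back to 1, as in the recipe above.
# (CUDA_LAUNCH_BLOCKING=1 makes CUDA kernel launches synchronous, which helps
# with debugging but can slow the run.)
: "${CUDA_LAUNCH_BLOCKING:=1}"
export CUDA_LAUNCH_BLOCKING
echo "CUDA_LAUNCH_BLOCKING=${CUDA_LAUNCH_BLOCKING}"
```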

So if I understand correctly, I should run

quick-activate $SPACK_MANAGER/environments/jhubuild

where jhubuild is the "environment" that I built naluX under.

But

spack load naluX

returns

==> Error: Spec 'naluX' matches no installed packages.

I feel that I'm missing something fundamental here.

[EDIT]

Btw, spack load exawind works, but the SHAs of exawind and naluX are different.

jhux2 • May 25 '22 20:05

Ah yeah, that should be the case, @psakievich. I guess it's a habit.

https://github.com/psakievich/spack-manager/blob/023fd1469078d8cc9396e3ccd373826cbbd5522f/repos/exawind/packages/nalu-wind/package.py#L43-L48

jrood-nrel • May 25 '22 20:05

@jhux2 use spack load nalu-wind (the package name, not the executable name naluX).

jrood-nrel • May 25 '22 20:05
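In a batch script, a failed `spack load` can be made to stop the job immediately rather than surfacing later as a missing-executable or missing-library error at launch. A minimal sketch; `load_pkg` is a hypothetical helper, and `spack` is assumed to be available as a shell function once `start.sh` has been sourced.

```shell
# Hypothetical helper: load a spack package by *package* name and stop the
# job script if the load fails (e.g. a typo, or an executable name like naluX).
load_pkg() {
  local pkg="$1"
  if ! spack load "$pkg"; then
    echo "ERROR: 'spack load $pkg' failed; pass the package name (e.g. nalu-wind)," \
         "not the executable name (e.g. naluX)." >&2
    return 1
  fi
}

# Usage in the batch script, after activating the environment:
# load_pkg nalu-wind || exit 1
```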

@jrood-nrel Thanks. I've launched a couple of test jobs to see what effect spack load nalu-wind has.

jhux2 • May 25 '22 20:05

My jobs failed with the same error as before:

FATAL ERROR: dlopen libcudart.so: libcudart.so: cannot open shared object file: No such file or directory
FATAL ERROR: dlopen libcudart.so: libcudart.so: cannot open shared object file: No such file or directory
[h26n01:464770] Error: common_pami.c:1056 - ompi_common_pami_init() Unable to create PAMI client (rc=1)
[h26n01:464771] Error: common_pami.c:1056 - ompi_common_pami_init() Unable to create PAMI client (rc=1)

After issuing spack load nalu-wind, should there be any change in what modules are loaded? Or is that handled by spack setting all the right paths, etc.?

jhux2 • May 25 '22 22:05
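The "dlopen libcudart.so ... cannot open shared object file" errors mean the CUDA runtime library could not be located when the job ran. One quick check to run just before jsrun is whether any directory on LD_LIBRARY_PATH contains it. A minimal sketch; `find_libcudart` is hypothetical, and an empty result is not conclusive, because the dynamic loader can also find libraries through RPATH/RUNPATH and the system linker cache.

```shell
# Hypothetical diagnostic: print each directory on LD_LIBRARY_PATH that
# contains a libcudart.so* file; return non-zero if none does.
find_libcudart() {
  local dir found=0
  local -a dirs
  IFS=':' read -r -a dirs <<< "$LD_LIBRARY_PATH"
  for dir in "${dirs[@]}"; do
    [ -n "$dir" ] || continue            # skip empty entries (e.g. "::")
    if compgen -G "$dir/libcudart.so*" > /dev/null; then
      echo "$dir"
      found=1
    fi
  done
  [ "$found" -eq 1 ]
}

# Usage in the batch script, before jsrun:
# find_libcudart || echo "WARNING: libcudart.so not found on LD_LIBRARY_PATH" >&2
```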

@jhux2 spack should be handling all the right paths. So, to confirm, does your script look something like this?

# source $SPACK_MANAGER/start.sh has already occurred in .bashrc
quick-activate $SPACK_MANAGER/environments/jhubuild
spack load nalu-wind
srun [args] naluX -i [args] 

psakievich • May 25 '22 22:05

@psakievich Here's what I have in my batch script:

  export SPACK_MANAGER=~/exawind/sources/spack-manager
  source $SPACK_MANAGER/start.sh
  quick-activate $SPACK_MANAGER/environments/jhubuild
  spack load nalu-wind

  jsrun ....

This is a script that I've used for a long time. (I did move the naluX executable to another location, but I assume that should be safe to do.)

Where in the spack-manager tree can I find configure/build logs for Trilinos? I'd like to look over those logs to see if anything jumps out.

jhux2 • May 25 '22 22:05

spack cd -b trilinos will take you to the Trilinos build directory, and the spack-* files there will show you the logs for everything that happened.

psakievich • May 25 '22 23:05
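Assuming the log files in the directory that `spack cd -b trilinos` lands in carry a `spack-` prefix (spack-build-out.txt is one example), a small loop can list them and flag the ones that mention errors. A minimal sketch; `scan_spack_logs` is hypothetical, and a case-insensitive match on "error" will also hit harmless lines such as compiler flags or test names.

```shell
# Hypothetical helper: list spack-* log files in a directory (default: the
# current one) and mark those containing the word "error" (case-insensitive).
scan_spack_logs() {
  local dir="${1:-.}" f
  for f in "$dir"/spack-*; do
    [ -f "$f" ] || continue   # skip directories and an unmatched glob
    if grep -qi 'error' "$f"; then
      echo "$f: contains 'error'"
    else
      echo "$f"
    fi
  done
}

# Usage:
# spack cd -b trilinos && scan_spack_logs
```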

@jhux2 where are you at on this? do you still need help?

psakievich • Jun 01 '22 19:06

@psakievich Thanks for checking in. I haven't returned to this yet. The motivation was to see if building with spack-manager would help work around a Nalu-Wind runtime failure. It turns out there's a bug that affects both solver paths in the NGP code, so how nalu-wind gets built is moot.

jhux2 • Jun 01 '22 20:06