Staggered heavy-quark residual fails to regulate HISQ CG convergence properly with an odd checkerboard source
When the inversion source has support on only even sites, convergence seems to be OK with a tiny heavy-quark residual. But when a source has support on only odd sites, the residual decreases at a glacial rate while the inverter runs out of restarts, leaving a heavy-quark residual of order one or two hundred. The propagator solution at moderate distance from the source seems to be very sensitive to the input stopping conditions, so I suspect it is not properly converged.
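For readers unfamiliar with the quantity being tracked here, the sketch below illustrates one common notion of a per-site ("heavy-quark") relative residual. It is a hedged illustration only; the function name and the max-over-sites aggregation are assumptions of mine and are not taken from QUDA's or MILC's actual implementation.

// Hypothetical sketch of a per-site relative residual of the kind discussed in
// this issue. Not QUDA's or MILC's code; the names and the max-over-sites
// aggregation are illustrative assumptions.
#include <algorithm>
#include <cstddef>
#include <vector>

// r_site_norm[i] and x_site_norm[i] hold the norm of the residual and of the
// solution on lattice site i.
double heavy_quark_residual(const std::vector<double> &r_site_norm,
                            const std::vector<double> &x_site_norm)
{
  double hq = 0.0;
  for (std::size_t i = 0; i < r_site_norm.size(); ++i) {
    // Sites where the solution norm is tiny inflate this ratio, which is how a
    // heavy-quark residual of order one or two hundred can coexist with a
    // small global L2 residual.
    if (x_site_norm[i] > 0.0) hq = std::max(hq, r_site_norm[i] / x_site_norm[i]);
  }
  return hq;
}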
Thanks for the info, Carleton. Do you have a reference MILC input file I can use to reproduce this? Also, what ensemble(s) have you been seeing this on?
Thank you for helping with this, Evan. I should say first that I am building QUDA with Jim Simone's branch, "not_a_feature_rather_a_hope/staggered_correlator_gk", but he has been merging develop into it. In retrospect, I found the same problem in output logs from Perlmutter with QUDA/develop, so the problem should also be reproducible with the develop branch. It will take a little time to create the reproducer. In the meantime, it would be worth looking at the code to see if there is some obvious difference in the heavy-quark residual treatment between an exclusively even-site and an exclusively odd-site source.
Thanks Carleton, the reproducer may be necessary so I can understand the full workflow. The CG code "doesn't know" about even/odd; it's just handed an operator. The stencil code knows the bare minimum necessary, most relevantly in the prepare/reconstruct bits. You can double-check this yourself, though from a quick skim everything looks fine (i.e., "symmetric" between even and odd); the code to check is the DiracImprovedStaggeredPC operator: https://github.com/lattice/quda/blob/develop/lib/dirac_improved_staggered.cpp
Remind me, does MILC use the odd Schur op directly, or does it prepare an even source, use the even op, and reconstruct?
Thanks, Evan. Do you have access to Frontier? Or should I port the reproducer to Summit?
MILC calls QUDA specifying the "parity" of the solve. For a strictly odd parity source (rhs), its even-odd block decomposition has the form
[ 0 ]
[ b_o ]
and, when QUDA is called, specifying odd parity, the solution should have the form
[ 0 ]
[ B b_o ]
where B = 1/(D^2 + 4m^2).
The MILC code "reconstructs" by multiplying by M^\dagger:
[ -D B b_o ]
[ 2m B b_o ]
For a strictly even-parity source with a call to QUDA specifying even parity, we just interchange even and odd here. There is nothing asymmetric in the MILC treatment outside QUDA.
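To make the reconstruction above explicit, here it is in matrix form. This is only a restatement of the description just given (the block labels D_{eo}, D_{oe} are mine, and the signs assume the usual anti-Hermitian staggered D), not the literal MILC code:

\[
  M = \begin{pmatrix} 2m & D_{eo} \\ D_{oe} & 2m \end{pmatrix}, \qquad
  b = \begin{pmatrix} 0 \\ b_o \end{pmatrix}, \qquad
  x_{\mathrm{pc}} = \begin{pmatrix} 0 \\ B\, b_o \end{pmatrix}, \qquad
  B = \frac{1}{D^2 + 4m^2},
\]
\[
  x = M^\dagger x_{\mathrm{pc}}
    = \begin{pmatrix} 2m & -D_{eo} \\ -D_{oe} & 2m \end{pmatrix}
      \begin{pmatrix} 0 \\ B\, b_o \end{pmatrix}
    = \begin{pmatrix} -D_{eo}\, B\, b_o \\ 2m\, B\, b_o \end{pmatrix},
\]

which matches the components quoted above.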
Thanks Carleton. I'm in the middle of moving and I'm not quite sure where my keyfob is right now; can you send me your submit script and input file via Slack or e-mail? I have configurations of various sizes (64^3, 96^3, 144^3, 192^3) on our internal cluster; hopefully one of those will suffice to reproduce the issue.
Also, thank you for describing the measurement. Between your description and my code investigations, I can't spot any inherent issue/asymmetry, but extra investigations will clearly be in order.
One question: is the host source in MILC single parity, or the length of the full volume? It looks like qudaInvert is assuming it is a full-volume source (contiguous even and odd), and as such there is an offset into just the odd part. This has been the assumption in the code for a very, very long time; I checked, and it's been this way for 10 years!!
If in reality the source in MILC is single parity (just odd), this could give garbage, and if by some off chance the memory it points to is zero, it's reasonable that the solve would be very slow (as well as meaningless).
For reference:
- The offset calculation routine is called here: https://github.com/lattice/quda/blob/develop/lib/milc_interface.cpp#L1203 (note the hard-coded false)
- And the offset routine itself is called here: https://github.com/lattice/quda/blob/develop/lib/milc_interface.cpp#L1031 ; note that, since the second argument is false, it'll give an offset halfway into the lattice (volume * 6 / 2, or full volume times Nc == 3 times 2 for complex divided by 2 for half of the lattice).
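To make the arithmetic in the second reference concrete, here is a small standalone sketch (hypothetical code of mine, not taken from milc_interface.cpp) of the offset that a full-volume layout implies, and why reading at that offset from a single-parity buffer would land out of bounds.

// Hypothetical illustration of the offset described above; numbers follow the
// 144^3 x 288 lattice discussed in this thread.
#include <cstddef>
#include <cstdio>

int main()
{
  const std::size_t volume = 144ULL * 144 * 144 * 288;  // full lattice sites
  const int Nc = 3;                                     // colors
  const int reals_per_site = 2 * Nc;                    // complex -> 2 reals, so 6 per site

  // A full-volume field laid out as [even sites | odd sites] has its odd part
  // starting at volume * 6 / 2 reals, exactly the expression quoted above.
  const std::size_t odd_offset = volume * reals_per_site / 2;

  // A single-parity (odd-only) host buffer holds only volume/2 sites, i.e.
  // volume * 6 / 2 reals in total, so reading at odd_offset runs past its end:
  // that gives garbage, or zeros if that memory happens to be zeroed.
  const std::size_t single_parity_length = volume / 2 * reals_per_site;

  std::printf("odd-parity offset     = %zu reals\n", odd_offset);
  std::printf("single-parity length  = %zu reals\n", single_parity_length);
  return 0;
}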
All of the MILC color vector fields are full.
Hi Evan,
I sent a bunch of files in my last message, but I just discovered that our MIMEdefang probably deleted the shell script attachments. So here they are with modified extensions.
Best,
Carleton
#! /bin/bash
source env.sh
QUDA_INSTALL=${INSTALLROOT}/quda
LIBQUDA="-Wl,-rpath ${QUDA_INSTALL}/lib -L${QUDA_INSTALL}/lib -lquda -D__gfx90a --amdgpu-target=gfx90a -Wl,-rpath=${ROCM_PATH}/hiprand/lib -L${ROCM_PATH}/hiprand/lib -Wl,-rpath=${ROCM_PATH}/rocfft/lib -L${ROCM_PATH}/rocfft/lib -lhiprand -lrocfft -Wl,-rpath=${ROCM_PATH}/hipblas/lib -L${ROCM_PATH}/hipblas/lib -lhipblas -Wl,-rpath=${ROCM_PATH}/rocblas/lib -L${ROCM_PATH}/rocblas/lib -lrocblas -Wl,-rpath=${ROCM_PATH}/hip/lib"
############ Make ks_spectrum_hisq ##################
cd milc_qcd/ks_spectrum
cp ../Makefile .
make clean
MY_CC=hipcc
MY_CXX=hipcc
ARCH=""
COMPILER="gnu"
OFFLOAD="HIP"
OPT="-O3 -Ofast -g"
PATH_TO_NVHPCSDK=""
CUDA_HOME=""
QUDA_HOME=${QUDA_INSTALL}
QUDA_VERBOSITY=VERBOSE
WANTQUDA=true
WANT_FN_CG_GPU=true
WANT_FL_GPU=true
WANT_GF_GPU=true
WANT_FF_GPU=true
WANT_KS_CONT_GPU=true
WANT_SHIFT_GPU=true
WANT_SPIN_TASTE_GPU=true
WANT_GAUGEFIX_OVR_GPU=true
WANT_MIXED_PRECISION_GPU=1
PRECISION=2
MPP=true
OMP=true
WANTQIO=true
WANTQMP=true
QIOPAR=/ccs/home/detar/frontier/quda/install/quda
QMPPAR=/ccs/home/detar/frontier/quda/install/quda
LIBQUDA=${LIBQUDA}
CGEOM="-DFIX_NODE_GEOM -DFIX_IONODE_GEOM"
KSCGMULTI="-DKS_MULTICG=HYBRID -DMULTISOURCE -DMULTIGRID"
CTIME="-DNERSC_TIME -DCGTIME -DFFTIME -DFLTIME -DGFTIME -DREMAP -DPRTIME -DIOTIME"
make -j 1 ks_spectrum_hisq
cd ..
############ Make su3_rhmd_hisq ##################
cd ks_imp_rhmc
cp ../Makefile .
make clean
MY_CC=hipcc
MY_CXX=hipcc
ARCH=""
COMPILER="gnu"
OFFLOAD="HIP"
OPT="-O3 -Ofast"
PATH_TO_NVHPCSDK=""
CUDA_HOME=""
QUDA_HOME=${QUDA_INSTALL}
WANTQUDA=true
WANT_FN_CG_GPU=true
WANT_FL_GPU=true
WANT_GF_GPU=true
WANT_FF_GPU=true
WANT_GAUGEFIX_OVR_GPU=true
WANT_MIXED_PRECISION_GPU=2
PRECISION=1
MPP=true
OMP=true
WANTQIO=true
WANTQMP=true
QIOPAR=/ccs/home/detar/frontier/quda/install/quda
QMPPAR=/ccs/home/detar/frontier/quda/install/quda
LIBQUDA=${LIBQUDA}
CGEOM="-DFIX_NODE_GEOM -DFIX_IONODE_GEOM"
KSCGMULTI="-DKS_MULTICG=HYBRID -DMULTISOURCE -DMULTIGRID"
CTIME="-DNERSC_TIME -DCGTIME -DFFTIME -DFLTIME -DGFTIME -DREMAP -DPRTIME -DIOTIME"
make -j 1 su3_rhmd_hisq
cd ../..
#! /bin/bash
BRANCH=not_a_feature_rather_a_hope/staggered_correlator_gk #BRANCH=develop
source env.sh
pushd quda
QUDA_HOME=$(pwd)
if [ -d quda ]
then
  cd quda
  git pull
  git checkout develop
  git checkout ${BRANCH}
else
  git clone --branch ${BRANCH} https://github.com/lattice/quda
  cd quda
  git checkout ${BRANCH}
fi
cd ..
mkdir -p build && cd build
cmake \
  -DCMAKE_BUILD_TYPE=RELEASE \
  -DCMAKE_CXX_COMPILER=CC \
  -DCMAKE_CXX_FLAGS="--offload-arch=gfx90a" \
  -DCMAKE_C_COMPILER=cc \
  -DCMAKE_C_FLAGS="--offload-arch=gfx90a" \
  -DCMAKE_C_STANDARD=99 \
  -DCMAKE_EXE_LINKER_FLAGS="--offload-arch=gfx90a" \
  -DCMAKE_SHARED_LINKER_FLAGS="--offload-arch=gfx90a" \
  -DCMAKE_HIP_FLAGS="--offload-arch=gfx90a" \
  -DCMAKE_INSTALL_PREFIX=${INSTALLROOT}/quda \
  -DQUDA_BUILD_SHAREDLIB=ON \
  -DQUDA_CONTRACT=ON \
  -DQUDA_COVDEV=ON \
  -DQUDA_DIRAC_DEFAULT_OFF=ON \
  -DQUDA_DIRAC_STAGGERED=ON \
  -DQUDA_DOWNLOAD_USQCD=ON \
  -DQUDA_GPU_ARCH=gfx90a \
  -DQUDA_QIO=ON \
  -DQUDA_QMP=ON \
  -DQUDA_TARGET_TYPE=HIP \
  -DROCM_PATH=${ROCM_PATH} \
  ${QUDA_HOME}/quda
make -j16 install
cd ..
#! /bin/bash
#SBATCH -t 10:00
#SBATCH -N 96
#SBATCH -n 768
#SBATCH --cpus-per-task=6
#SBATCH --ntasks-per-node=8
####### -J (command line in spawnjob.py)
#SBATCH -A phy157-ecphisq
#SBATCH -V
####### -C nvme
#SBATCH -S 2
# Submission command must define the environment variable RUNCMDFILE:
#   sbatch -N ${NODES} -t ${walltime} -J ${jobname} ${slurm_script}
nodes=96
umask 0022
source env.sh
# Run production jobs out of $SCRATCH
SCRATCH_HOME=/gpfs/alpine/proj-shared/phy157/phy157hisq/detar/allHISQ/frontier
MYSCRATCH=${SCRATCH_HOME}/l144288f211b700m000569m01555m1827
mkdir -p ${MYSCRATCH}
cd ${MYSCRATCH}
# QUDA flags
export QUDA_ENABLE_GDR=1
export QUDA_ENABLE_P2P=1
export QUDA_MILC_HISQ_RECONSTRUCT=13
export QUDA_MILC_HISQ_RECONSTRUCT_SLOPPY=9
export QUDA_RESOURCE_PATH=`pwd`
# location of QUDA tunecache file
# MPICH flags
export MPICH_GPU_SUPPORT_ENABLED=1   # Allow GDR
export MPICH_COLL_SYNC=MPI_Bcast     # Sometimes broadcast is not synchronizing....
export MPICH_RDMA_ENABLED_CUDA=1
export MPICH_OFI_NIC_POLICY=BLOCK
export MPICH_SMP_SINGLE_COPY_MODE=CMA
# Balint 6-CPU masks for Frontier
export OMP_NUM_THREADS=6
export OMP_PROC_BIND=spread
MASK_0="0x003f000000000000"
MASK_1="0x3f00000000000000"
MASK_2="0x00000000003f0000"
MASK_3="0x000000003f000000"
MASK_4="0x000000000000007e"
MASK_5="0x0000000000007e00"
MASK_6="0x0000003f00000000"
MASK_7="0x00003f0000000000"
MEMBIND="--mem-bind=map_mem:3,3,1,1,0,0,2,2"
CPU_MASK="--cpu-bind=mask_cpu:${MASK_0},${MASK_1},${MASK_2},${MASK_3},${MASK_4},${MASK_5},${MASK_6},${MASK_7}"
srun -n 768 -N 96 ks_spectrum_hisq --distribution=*:block ${CPU_MASK} -qmp-geom 2 4 4 24 -qmp-alloc-map 3 2 1 0 -qmp-logic-map 3 2 1 0 intest outtest
#== Geometry ==
prompt 0
nx 144
ny 144
nz 144
nt 288
node_geometry 2 4 4 24
ionode_geometry 2 4 4 24
iseed 402129
job_id 1311221
#== Gauge ==
reload_parallel /lustre/orion/proj-shared/phy157/phy157_hisq/detar/allHISQ/l144288f211b700m000569m01555m1827/lat/v5/l144288f211b700m000569m01555m1827a.402
u0 1
no_gauge_fix
forget
staple_weight 0.05
ape_iter 20
coordinate_origin 0 0 0 0
time_bc antiperiodic
#== Eigen ==
max_number_of_eigenpairs 0
#== PBP Masses ==
number_of_pbp_masses 0
#== Base Sources ==
number_of_base_sources 1
#== source 0: RandomColorWallSource ==
random_color_wall
field_type KS
subset corner
t0 129
ncolor 1
momentum 0 0 0
source_label RW
forget_source
#== Modified Sources ==
number_of_modified_sources 0
#== KSsolveSets ==
number_of_sets 1
#== KSsolveSet ==
set_type single
inv_type CGZ
max_cg_iterations 4000
max_cg_restarts 10
check yes
momentum_twist 0 0 0
precision 2
source 0
number_of_propagators 2
#== propagator 0: KSsolveElement ==
mass 0.000569
naik_term_epsilon 0.
error_for_propagator 1e-7
rel_error_for_propagator 0.0
fresh_ksprop
forget_ksprop
#== propagator 1: KSsolveElement ==
mass 0.843
naik_term_epsilon -0.3578
error_for_propagator 0
rel_error_for_propagator 2e-4
fresh_ksprop
forget_ksprop
#== Quarks ==
number_of_quarks 2
#== quark 0: QuarkIdentitySink ==
propagator 0
identity
op_label d
forget_ksprop
#== quark 1: QuarkIdentitySink ==
propagator 1
identity
op_label d
forget_ksprop
number_of_mesons 1
#== MesonSpectrum ==
pair 1 0
spectrum_request meson
forget_corr
r_offset 0 0 0 129
number_of_correlators 1
correlator P5-P5 p000-fine 1 / 124416.0 G5-G5 0 0 0 EO EO EO
#== Baryons ==
number_of_baryons 0
# From Peter for Grid
module swap PrgEnv-cray PrgEnv-amd
module load craype-accel-amd-gfx90a
#module load cray-mpich/8.1.23
module load cmake
#module load amd/5.3.0
module load cray-hdf5
module load cray-fftw
module load gmp
module load emacs
module unload cray-libsci
module list
# These must be set before running
export TOPDIR_HIP=~/frontier/quda
export SRCROOT=${TOPDIR_HIP}
export BUILDROOT=${TOPDIR_HIP}
export INSTALLROOT=${TOPDIR_HIP}/install
export TARGET_GPU=gfx90a
#GTL_ROOT=$PE_MPICH_GTL_DIR_amd_gfx90a
GTL_ROOT=/opt/cray/pe/mpich/8.1.25/gtl/lib
MPI_CFLAGS="-I${MPICH_DIR}/include -g"
MPI_LDFLAGS="-g -Wl,-rpath=${MPICH_DIR}/lib -L${MPICH_DIR}/lib -lmpi -L${GTL_ROOT} -Wl,-rpath=${GTL_ROOT} -lmpi_gtl_hsa"
export PK_BUILD_TYPE="Release"
export PATH=${ROCM_PATH}/bin:${ROCM_PATH}/llvm/bin:${PATH}
QIOLIB=${INSTALLROOT}/qio/lib
QMPLIB=${INSTALLROOT}/qmp/lib
export LD_LIBRARY_PATH=${INSTALLROOT}/quda/lib:${QMPLIB}:${QIOLIB}:${ROCM_PATH}/llvm/lib64:${ROCM_PATH}/llvm/lib:${MPICH_DIR}/lib:${GTL_ROOT}:${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH=/opt/cray/pe/gcc/mpfr/3.1.4/lib:${LD_LIBRARY_PATH}
end
Thank you, Carleton. I'm sorry that I haven't had a chance to test this yet, but I'll be able to on Monday; the requisite scripts are essentially ready to go.
Just an update: as a quick test I checked whether I could reproduce the behavior on a smaller lattice (64^3x96), and I was unsuccessful. Now I'm going to try a configuration from an ensemble with the same global volume, beta, quark masses, etc. I'll keep you updated.
I may have found the issue, will post back soon.
I've reproduced the behavior on a 144^3 configuration, for both an odd and even source. It seems like the logic for heavy quark residual reliable updates is breaking down in ways that it wasn't on smaller configurations (i.e., 64^3, 96^3). We have a call on Wednesday and we'll figure out a solution.
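For context, "reliable updates" refers to periodically recomputing the true residual r = b - A x instead of trusting the recursively accumulated one, and possibly restarting when the convergence target (here, the heavy-quark residual) has stalled. The skeleton below is a generic illustration with hypothetical names, not QUDA's actual solver code.

// Generic reliable-update/restart skeleton for a CG-like solver; illustrative
// only, with made-up names, and not taken from QUDA's sources.
struct SolverState {
  double r2;      // recursively accumulated residual norm squared
  double r2_max;  // largest r2 seen since the last reliable update
  double hq_res;  // heavy-quark residual, recomputed at reliable updates
};

// Trigger a reliable update (recompute r = b - A x) once the accumulated
// residual has dropped by a factor delta relative to its recent maximum.
bool need_reliable_update(const SolverState &s, double delta)
{
  return s.r2 < delta * delta * s.r2_max;
}

// One possible restart policy: restart when the freshly computed heavy-quark
// residual has not improved on the best value so far and restarts remain.
// The report above is that logic of this general kind misbehaves on the
// larger (144^3) configurations.
bool should_restart(double hq_res_now, double hq_res_best, int restarts, int max_restarts)
{
  return hq_res_now >= hq_res_best && restarts < max_restarts;
}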
Hi Evan,
Any progress?
Thanks,
Carleton
I've put together a fix for the heavy-quark convergence issue that at least works in the cases where I've been able to trigger the issue. Can you please test it for your case, @detar? The code is in the branch hotfix/heavy-quark-restart. Once you've confirmed it works, I'll get the ball rolling on a formal PR into develop.