`staggered_dslash_test` asqtad verify fails with recon-9/13, partitioning enabled, and computing the fat/long gauge links

Open weinbe2 opened this issue 2 years ago • 1 comment

Minimal CMake build configuration needed to reproduce this for now:

cmake ../quda/ -DQUDA_DIRAC_DEFAULT_OFF=ON -DQUDA_DIRAC_STAGGERED=ON -DQUDA_GPU_ARCH=sm_80 -DQUDA_MPI=ON -DQUDA_FAST_COMPILE_DSLASH=ON -DQUDA_FAST_COMPILE_REDUCE=ON

Representative command:

mpirun -np 1 ./staggered_dslash_test --verbosity verbose --dim 16 16 16 16 --niter 100 --dslash-type asqtad --partition 4 --prec double --compute-fat-long true --recon 9

Note that this is not specific to partitioning (or not partitioning) in the t direction, so temporal boundary conditions aren't the (only?) issue.
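To check that observation across directions, it may help to generate the same command for every partitioning and both failing reconstruction types. The following is only a sketch: the helper names are hypothetical, the flags are copied from the representative command above, and it assumes `--partition` is a 4-bit mask with bit 0 = x through bit 3 = t (so `4` would be z only). It only builds command lines and does not run them.

```python
import shlex

# Flags copied from the representative command in this issue.
BASE_ARGS = [
    "./staggered_dslash_test",
    "--verbosity", "verbose",
    "--dim", "16", "16", "16", "16",
    "--niter", "100",
    "--dslash-type", "asqtad",
    "--prec", "double",
    "--compute-fat-long", "true",
]


def partitioned_dims(mask):
    """Decode a partition mask, ASSUMING bit 0 = x, 1 = y, 2 = z, 3 = t."""
    return [name for bit, name in enumerate("xyzt") if mask & (1 << bit)]


def sweep_commands(recons=(9, 13), masks=range(1, 16), nranks=1):
    """Build one mpirun command line per (recon, partition mask) pair."""
    commands = []
    for recon in recons:
        for mask in masks:
            args = ["mpirun", "-np", str(nranks)] + BASE_ARGS
            args += ["--partition", str(mask), "--recon", str(recon)]
            commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    # Print the commands so they can be pasted into a shell or job script.
    for cmd in sweep_commands():
        print(cmd)
```

Calling `sweep_commands(recons=(9,), masks=(4,))` yields a single command equivalent to the one above, up to flag order.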

This was missed because, for various reasons stemming from headaches I should've solved a long time ago, the recon-13 and recon-9 tests are skipped in `staggered_dslash_ctest`. `--compute-fat-long true` is indeed included in the ctest commands. The likely solution is to begin homogenizing the logic for loading gauge fields and verifying staggered dslash calls between the dslash test and the invert test, where there doesn't seem to be an issue.

weinbe2, Jul 19 '23 13:07

Incremental progress is being made in https://github.com/lattice/quda/tree/hotfix/stag-dslash-test-recon-partition-failure ; no clear resolution yet.

weinbe2, Jul 19 '23 18:07