Peter Boyle
Yes, HIP is believed to work but is not yet efficient on AMD GPUs. You might try Benchmark_dwf_fp32 with the --dslash-unroll flag, which is new as of a few days ago.
Multi-GPU support and the "nvlink" equivalent are untested. Using --enable-shm=none and MPI between GPUs is probably safer.
BTW, I *have* benchmarked AMD MI50 and MI100, but want to revisit with the new explicit Nc=3 kernel. I have also compiled under HIP on Summit for Nvidia, and got...
Thanks. I haven't tried WilsonClover on GPU, to be honest, so I'm not absolutely sure it works on Nvidia either. Re. the plaquette: this does work on CUDA, so...
HIP is definitely in the "experimental" category for now, but getting everything to work would be good. Glad to see you are running on ROCm 3.9, which is recent and up to date.
I should have asked what hardware specifically you are running on, rather than where it is physically located.
Can you tell me the performance you get with benchmarks/Benchmark_dwf_fp32 --grid 16.16.16.16 and with benchmarks/Benchmark_dwf_fp32 --grid 16.16.16.16 --dslash-unroll? Thanks.
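A minimal sketch for collecting both requested runs in one go, so the default and --dslash-unroll results can be compared side by side. It assumes Python 3.7+, a single-process run launched from the Grid build directory (the relative path is an assumption; an MPI launcher may be needed on your system), and it deliberately does not parse Grid's output. The helper names build_commands and run_all are hypothetical.

```python
import subprocess
import sys

# Assumption: run from the Grid build directory, single process, no MPI launcher.
BENCHMARK = "benchmarks/Benchmark_dwf_fp32"
GRID = "16.16.16.16"


def build_commands(binary=BENCHMARK, grid=GRID):
    """Hypothetical helper: return the two command lines (default, then --dslash-unroll)."""
    base = [binary, "--grid", grid]
    return [base, base + ["--dslash-unroll"]]


def run_all(commands):
    """Hypothetical helper: run each command in turn and echo its raw output."""
    for cmd in commands:
        print("==>", " ".join(cmd), flush=True)
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            # Binary not present at the assumed path; skip rather than crash.
            print(f"(not found: {cmd[0]}; run from the Grid build directory)")
            continue
        sys.stdout.write(result.stdout)
        if result.returncode != 0:
            sys.stderr.write(result.stderr)
            print(f"(exited with status {result.returncode})")


if __name__ == "__main__":
    run_all(build_commands())
```

Reading the performance figures from the two printed logs is left to the user, since the exact output format is not assumed here.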
Thanks. My hypothesis that --dslash-unroll might fix the performance issues is not correct, then. Glad to hear it re. Clover: it's a HIP / CUDA difference, and not...
Could you either (A) run it under a debugger (gdb), trap the fault and ask it for a backtrace with "bt", or (B) go to Grid/qcd/action/fermion/implementation/WilsonCloverFermionImplementation.h and 1) uncomment...
Though on the AMD node I had access to, the ROCm debugger didn't work for me.