Saikat Banerjee
Thanks, I have set `OMP_NUM_THREADS=1` (updated the main text). All the cores are being utilized. Here's a screenshot: [screenshot]

Thanks for the compilation note on `CFLAGS`.
I just compiled BLIS without threading and tried with MPI again. It doesn't improve the benchmark.
```
./configure -p /opt/amd/amd-blis-2.2-4 --enable-cblas --disable-threading zen2
```
Update: I found that the ~92 GFlops obtained with BLIS + OpenMPI is approximately the same as the value obtained with single-core OpenMP.
@devinamatthews > did you link OpenBLAS and MKL the exact same way as BLIS? You could even link generically to libblas.so and set it as a symlink to either library...
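
For anyone following along, the symlink idea quoted above could look roughly like the sketch below. Everything in it is a placeholder: the scratch directory and the empty `libblis.so` / `libopenblas.so` files are hypothetical stand-ins, not the real install locations. In a real setup the link would point at the actual shared library, and the benchmark would be linked once against the generic `libblas.so` name.

```shell
# Hypothetical layout: empty stand-in files take the place of the real libraries.
workdir=$(mktemp -d)
touch "$workdir/libblis.so" "$workdir/libopenblas.so"

# Point the generic name at the first library.
ln -sf "$workdir/libblis.so" "$workdir/libblas.so"
readlink "$workdir/libblas.so"

# Swap to the second library; nothing that links against libblas.so needs to change.
ln -sf "$workdir/libopenblas.so" "$workdir/libblas.so"
readlink "$workdir/libblas.so"
```

The point of this design is that the link line stays identical across runs, so any remaining performance difference is less likely to come from how each library was linked.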
I am sorry for the late update, @devinamatthews. The Makefiles for HPL are attached. Each of them is run separately. I know it's redundant, but just for the sake of...
@devinamatthews could you manage to run the HPL benchmark?
> @banskt I am stumped and I do not have time to actually try to run and play around with it. It would be helpful if you could produce ONE...