OpenBLAS
OpenBLAS on Graviton2 (NeoverseN1) markedly slower than ARM libarmpl
Link to benchmark results copied from #3251: https://developer.arm.com/tools-and-software/server-and-hpc/compile/arm-compiler-for-linux/resources/tutorials/benchmarks. The current implementation in OpenBLAS is a mix of generic ARMV8 and ThunderX2T99 (initial PR #2465). Graviton2 is/was in our Travis CI setup, but that is currently stranded on the discontinued travis-ci.org.
@martin-frbg Would you be amenable to another CI vendor (e.g. Cirrus CI) to enable CI on Graviton2? It doesn't look like the drone.io build has run successfully in a few months.
Neoverse builds are on Travis, which is working again since about the time drone.io integration failed. (I still have to rely on xianyi for CI and similar "administrative" issues, and he only pops up at irregular intervals). I have been running a few benchmarks on the side in the past couple of days to improve GEMM P/Q parameters but I do not plan to try my hand at dedicated kernels right now.
@martin-frbg Regarding the slower OpenBLAS performance: were you using the CI script (e.g. .drone.yaml) to build OpenBLAS? I see NUM_THREADS=32 being used in those scripts, which would limit OpenBLAS scaling to only 32 threads even on a 64-core host. In order to make use of all 64 cores on a Graviton2 16xl, you need to either compile with NUM_THREADS=64 or do a native build without the NUM_THREADS argument so that the make system picks up the host core count correctly.
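To make the thread-cap point above concrete, here is a minimal Python sketch. It assumes a simplified model in which the default thread count is the smaller of the compile-time NUM_THREADS and the host core count. The helper names `effective_threads` and `make_command` are hypothetical, introduced only for illustration; the only thing taken from the comment is the `NUM_THREADS` make argument.

```python
import os


def effective_threads(build_num_threads, host_cores):
    """Simplified model (an assumption, not OpenBLAS source code):
    a library built with NUM_THREADS=N scales to at most N threads,
    and by default to no more threads than the host has cores."""
    return min(build_num_threads, host_cores)


def make_command(num_threads=None):
    """Build the make invocation described in the comment.
    With num_threads=None this is a native build, which leaves
    core-count detection to the make system."""
    cmd = ["make"]
    if num_threads is not None:
        cmd.append(f"NUM_THREADS={num_threads}")
    return cmd


if __name__ == "__main__":
    host_cores = os.cpu_count() or 1
    # A CI-style build with NUM_THREADS=32 on a 64-core host is capped at 32.
    print("CI build on 64 cores:", effective_threads(32, 64), "threads")
    print("This host has", host_cores, "logical cores")
    print("Explicit build:", " ".join(make_command(64)))
    print("Native build:  ", " ".join(make_command()))
```

Under this model, a binary built by the CI script would leave half of a 64-core Graviton2 idle, while either of the two make invocations printed above avoids the cap.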
The notion of "slower performance" comes from a tangentially related earlier issue - https://github.com/xianyi/OpenBLAS/issues/3251#issuecomment-849940830 quoting a marketing page for ArmPL with a comparison to some unspecified version of OpenBLAS. I had only copied it from that ticket when I closed it, in order to get back to it later. As far as I can tell, the Graviton2 instance provided by travis.com is limited to 4 cores in any case, but so far my quick benchmarks do not appear to be obviously affected by what else may be running on the same node.
@martin-frbg, if access to a 64-core system would help, please let me know.
> Neoverse builds are on Travis, which is working again since about the time drone.io integration failed
@martin-frbg, I'm not able to access the Travis CI build. Is it down again?
Not sure what you mean by "not able to access": do you not get the build logs for previous commits, or does it fail to run in your projects? It was working for me 7 hours ago and I see no indication that it has failed since.
I tried to access it from the home page and got a 404 error. https://www.travis-ci.com/xianyi/OpenBLAS#:~:text=404,builder%2C%20try%20again!
Strange - I get that now as well, but I still see PR jobs running on Travis and can access their logs. (On the other hand I am much less optimistic now about getting sane benchmark results from them)
@martin-frbg, the Travis CI link on the homepage is still broken. How are you accessing it to check the PR jobs and logs?
@snadampal on the pull request page - the Travis badge in the README only tracks completed commits to the develop branch anyway (and no idea why it is unreliable lately - probably something on their end)
What is the theoretical peak GFLOPS of Graviton2?
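A rough way to estimate this is cores × clock × SIMD pipes × lanes per pipe × 2 (an FMA counts as two floating-point operations). The sketch below applies that formula. The hardware figures in it are assumptions that do not come from this thread: 64 Neoverse N1 cores at 2.5 GHz, each with two 128-bit NEON pipes capable of FMA. Check them against vendor documentation before relying on the result.

```python
def peak_gflops(cores, ghz, simd_pipes, lanes_per_pipe, flops_per_fma=2):
    """Theoretical peak in GFLOPS: every core issues one FMA per lane,
    per SIMD pipe, per cycle. Sustained GEMM rates will be lower."""
    return cores * ghz * simd_pipes * lanes_per_pipe * flops_per_fma


# Assumed Graviton2 parameters (not taken from this thread).
CORES = 64
CLOCK_GHZ = 2.5
NEON_PIPES = 2      # 128-bit pipes per core
FP64_LANES = 2      # 128 bits / 64-bit doubles
FP32_LANES = 4      # 128 bits / 32-bit floats

fp64_peak = peak_gflops(CORES, CLOCK_GHZ, NEON_PIPES, FP64_LANES)
fp32_peak = peak_gflops(CORES, CLOCK_GHZ, NEON_PIPES, FP32_LANES)

if __name__ == "__main__":
    print(f"FP64 peak: {fp64_peak:.0f} GFLOPS ({fp64_peak / CORES:.0f} per core)")
    print(f"FP32 peak: {fp32_peak:.0f} GFLOPS ({fp32_peak / CORES:.0f} per core)")
```

Under these assumptions the estimate comes to about 1280 GFLOPS in double precision and 2560 GFLOPS in single precision for the full 64-core part. A 4-core CI instance, like the Travis one mentioned earlier, would be limited to a sixteenth of that.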