OpenBLAS on Graviton2 (NeoverseN1) markedly slower than ARM libarmpl

Open martin-frbg opened this issue 4 years ago • 12 comments

Link to benchmark results, copied from #3251: https://developer.arm.com/tools-and-software/server-and-hpc/compile/arm-compiler-for-linux/resources/tutorials/benchmarks
The current implementation in OpenBLAS is a mix of generic ARMV8 and ThunderX2T99 (initial PR #2465).
Graviton2 is/was in our Travis CI setup, but that is currently stranded on the discontinued travis-ci.org.

martin-frbg avatar Jul 03 '21 15:07 martin-frbg

@martin-frbg Would you be amenable to another CI vendor (e.g. Cirrus CI) to enable CI on Graviton2? It doesn't look like the drone.io build has run successfully in a few months.

AGSaidi avatar Oct 12 '21 18:10 AGSaidi

Neoverse builds are on Travis, which is working again since about the time drone.io integration failed. (I still have to rely on xianyi for CI and similar "administrative" issues, and he only pops up at irregular intervals). I have been running a few benchmarks on the side in the past couple of days to improve GEMM P/Q parameters but I do not plan to try my hand at dedicated kernels right now.

martin-frbg avatar Oct 12 '21 18:10 martin-frbg

@martin-frbg Regarding the slower OpenBLAS performance: were you using the CI script (e.g. .drone.yaml) for building OpenBLAS? I see NUM_THREADS=32 being used in those scripts; this would limit OpenBLAS scaling to only 32 threads even on a 64-core host. To make use of all 64 cores on a Graviton2 16xl, you need to either compile it with NUM_THREADS=64 or do a native build without the NUM_THREADS argument, so that the make system picks up the host core count correctly.

snadampal avatar Oct 12 '21 19:10 snadampal
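
The point about NUM_THREADS in the comment above can be illustrated with a small sketch. This is a simplified model and not OpenBLAS code: it assumes the build-time NUM_THREADS value acts as a hard upper bound on the threads used at run time, and that an optional runtime request (for example through the OPENBLAS_NUM_THREADS environment variable) can only lower that number. The function name and its parameters are hypothetical and exist only for this illustration.

```python
def effective_thread_count(host_cores, build_max_threads, env_threads=None):
    """Estimate how many threads a BLAS build could use (simplified model).

    host_cores        -- cores detected on the machine (assumption: used as
                         the default thread request)
    build_max_threads -- the NUM_THREADS value the library was compiled with
    env_threads       -- optional runtime request, e.g. from an environment
                         variable; None means "use the host core count"
    """
    wanted = env_threads if env_threads is not None else host_cores
    # The compile-time limit caps whatever was requested; never go below 1.
    return max(1, min(wanted, build_max_threads))


if __name__ == "__main__":
    # CI-style build with NUM_THREADS=32 on a 64-core host
    print(effective_thread_count(host_cores=64, build_max_threads=32))
    # Build with NUM_THREADS=64 on the same host
    print(effective_thread_count(host_cores=64, build_max_threads=64))
```

Under these assumptions the first call is capped at 32 threads, while the second can use all 64, which is the difference the comment describes between the CI build settings and a build with NUM_THREADS=64.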

The notion of "slower performance" comes from a tangentially related earlier issue - https://github.com/xianyi/OpenBLAS/issues/3251#issuecomment-849940830 quoting a marketing page for ArmPL with a comparison to some unspecified version of OpenBLAS. I had only copied it from that ticket when I closed it, in order to get back to it later. As far as I can tell, the Graviton2 instance provided by travis.com is limited to 4 cores in any case, but so far my quick benchmarks do not appear to be obviously affected by what else may be running on the same node.

martin-frbg avatar Oct 12 '21 20:10 martin-frbg

@martin-frbg, if access to a 64-core system would help, please let me know.

AGSaidi avatar Oct 12 '21 20:10 AGSaidi

"Neoverse builds are on Travis, which is working again since about the time drone.io integration failed"

@martin-frbg, I'm not able to access the Travis CI build. Is it down again?

snadampal avatar Oct 14 '21 22:10 snadampal

Not sure what you mean by "not able to access": do you not get the build logs for previous commits, or does it fail to run in your projects? It was working for me 7 hours ago and I see no indication that it has failed since.

martin-frbg avatar Oct 15 '21 05:10 martin-frbg

I tried to access it from the home page and got a 404 error: https://www.travis-ci.com/xianyi/OpenBLAS#:~:text=404,builder%2C%20try%20again!

snadampal avatar Oct 15 '21 12:10 snadampal

Strange - I get that now as well, but I still see PR jobs running on Travis and can access their logs. (On the other hand, I am much less optimistic now about getting sane benchmark results from them.)

martin-frbg avatar Oct 15 '21 13:10 martin-frbg

@martin-frbg, the Travis CI link on the homepage is still broken. How are you accessing it to check the PR jobs and logs?

snadampal avatar Nov 02 '21 04:11 snadampal

@snadampal on the pull request page - the Travis badge in the README only tracks completed commits to the develop branch anyway (and no idea why it is unreliable lately - probably something on their end)

martin-frbg avatar Nov 02 '21 10:11 martin-frbg

What is the theoretical peak GFLOPS of Graviton2?

ProgrammerWLY avatar Dec 05 '21 08:12 ProgrammerWLY