Up-to-date benchmarks
Hi, I've been looking at the project and the published benchmarks with great interest.
Considering that OpenBLAS and MKL continue to evolve, I'm wondering if another set of benchmarks using the latest versions has already been performed (for instance, MKL is now at version 2021.4, with different implementations).
Another point: there are various hacks that can be used to get better behavior when running on AMD CPUs. I'm not saying they should be used, but for the sake of comparing all BLAS implementations, I'm wondering whether the published benchmarks should also include the results obtained when the library actually believes it is running on an Intel CPU.
Best, Eloi
The "hack MKL to make it think it is running on Intel HW" for AMD benchmarking has been discussed before (@fgvanzee can you remember where?). We ultimately decided only to benchmark libraries using published, supported modes of operation.
As for the versions of the libraries tested, we test the most recent version of all libraries at the time that the benchmark is performed. Of course BLIS continues to evolve as well, so we re-do benchmarks periodically. If one of them is particularly out of date, please let us know and we can think about re-running it.
Thanks for your feedback, Devin. I understand the decision made regarding benchmark contents.
I believe it would be interesting to benchmark again (or report results that have not yet been published) on recent x86_64 architectures (broadwell, skylake, zen2, and zen3), comparing current BLIS, MKL 2021.4 (https://software.intel.com/content/www/us/en/develop/articles/oneapi-math-kernel-library-release-notes.html) and OpenBLAS-0.3.7 (https://www.openblas.net/Changelog.txt).
@egaudry Thanks for your interest in the performance results. Based on internal GitHub stats, they seem to be one of the things that draws many people to our project website. :slightly_smiling_face:
Unfortunately, re-running benchmarks can be somewhat labor intensive. We have to update system software, then update all of the BLAS software, including BLIS, then get the environment set up (i.e., disabling CPU throttling), then test to make sure everything is set up properly to produce good data (running hours-long jobs and then finding out that the data is bad afterward is always a bummer), and then process the data into graphs, then process the graphs for display on GitHub. (We've automated about as much of this process as we can.)
Lately we have been hard at work on new features for BLIS -- things that we think the community will be excited about -- but we absolutely will update those performance numbers in the future when the time is right. In the meantime, thanks for your patience and interest in BLIS.
@fgvanzee fully understood, keep up the good work :).
I'm just adding a comment regarding the latest release notes for MKL (oneAPI 2022.1): https://www.intel.com/content/www/us/en/developer/articles/system-requirements/oneapi-math-kernel-library-system-requirements.html
They state that they support Intel chips only, with no mention (at all) of others. I might be wrong (please let me know), but I believe it is the first time Intel has used an explicit list such as this one.
I'm adding this comment here as I believe multi-vendor support is a key benefit of BLIS.
I hadn't seen this issue originally. I should have posted full results here before for SKX, in particular, despite the response when I reported on DGEMM originally. Anyhow, there are results which are fair to OpenBLAS (apart from Graviton?) at https://git.sr.ht/~fx/blis/tree/performance/docs/Performance.md. Some of them probably bear repeating, though.
I don't keep up with MKL, and haven't tried the version du jour, but I don't understand the widespread obsession with it, even on AMD hardware and in the face of measurements. Anyway, for what it's worth, just checking the symbols from the latest dpkg, it has the same "_zen" ones as the 2021.1 version.
BLIS fails on "multi-vendor" across current HPC targets because the POWER9 implementation is broken, and dynamic architecture dispatch was lacking last I looked. (OpenBLAS and Eigen also have optimized support for s390x, which is the other main GNU/Linux distribution architecture, albeit not for mainstream HPC. I wasn't convinced by the results I got on s390x with the BLIS reference kernel and analytical block sizes.)
Thanks Dave.
The obsession might be linked to the fact that it has been a go-to solution (for different reasons) for years, which in turn means that one tends to use it as the reference point before even really considering alternatives.
It might also be linked to the fact that the workarounds that allowed good performance on AMD with MKL are gradually disappearing. I'm not saying that workarounds should have been used in the first place; however, they were documented for a few years and, I believe, used.