[AOCL-BLAS] Add version 5.1.0 as an initial attempt
I am trying to build AOCL-BLAS (a customized version of BLIS from AMD).
~~Update: After some trial and error on my local machine, it seems that the `_64` suffix is finally attached to the function names in the generated output for the Linux build. However, the Windows build fails with the patches for the `_64` suffix for reasons that are not obvious to me.~~
~~Update 2: Ideally I would wish to have someone who understands libblastrampoline take a look at the way the `_64` suffix is handled, as I cannot tell for sure whether my hack would work with libblastrampoline.~~
Update: After some local tests, I believe that the patches for the `_64` suffix are unnecessary and a misunderstanding of what the libblastrampoline README says. I followed what was done with blis_jll.jl earlier and suspect the `_64_` suffix appended there was also unnecessary.
You can build this for every target, and maybe we should just so the package is downloadable everywhere (not sure about riscv but aarch64 for sure)
> You can build this for every target, and maybe we should just so the package is downloadable everywhere (not sure about riscv but aarch64 for sure)
I believe the JLL package should still be downloadable/installable on all the platforms, but it just will return false from is_available if you try to actually use it on a platform that doesn't have any artifacts built for it. We could use that in a higher level package, like an AOCLBLAS.jl package that would setup the auto-forwarding in LBT, etc.
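To make the idea concrete, such a higher-level package could be a thin wrapper that forwards the symbols through LBT. A minimal sketch, where the names `AOCL_BLAS_jll` and `blis_path` are assumptions for illustration, not the actual product names:

```julia
# Hypothetical AOCLBLAS.jl: point libblastrampoline at the AOCL-BLAS
# library so that LinearAlgebra dispatches to it.
module AOCLBLAS

using LinearAlgebra
using AOCL_BLAS_jll  # assumed JLL name

function __init__()
    # is_available() is false on platforms without built artifacts,
    # so loading the wrapper is harmless everywhere.
    if AOCL_BLAS_jll.is_available()
        BLAS.lbt_forward(AOCL_BLAS_jll.blis_path; clear=true)
    end
end

end # module
```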
I think it would be better to build a folder structure of: A/AOCL/AOCL_BLAS, because we eventually will also get the LAPACK libraries, and others, so then they would all live under A/AOCL.
> I think it would be better to build a folder structure of: A/AOCL/AOCL_BLAS, because we eventually will also get the LAPACK libraries, and others, so then they would all live under A/AOCL.
That makes sense. I also wish to get AOCL_LAPACK to work, but that will require AOCL_BLAS as a dependency, so I started with the BLAS first. Let me rename the outer folder to AOCL.
@giordano Somehow buildkite is not rebuilding the artifacts? I guess I am otherwise done with the recipe unless new issues are caught.
https://github.com/JuliaPackaging/Yggdrasil/pull/11896#discussion_r2288784912. See https://github.com/JuliaPackaging/Yggdrasil/blob/78cffa5f0f85ce86a7cc72512162df210646e1b7/CONTRIBUTING.md#understanding-build-cache-on-yggdrasil for the explanation. Basically if you edit only the common.jl file, you also need to edit the build_tarballs.jl (e.g. by increasing the build trigger) to actually rebuild.
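For concreteness, the kind of no-op edit that bumps the build trigger looks like this; a sketch of the top of a `build_tarballs.jl`, where the exact comment wording is just a common Yggdrasil convention:

```julia
using BinaryBuilder, Pkg

# Bump this number to force a rebuild when only common.jl changes,
# since Yggdrasil's build cache keys off build_tarballs.jl:
# Build trigger: 2
include("common.jl")
```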
> You can build this for every target, and maybe we should just so the package is downloadable everywhere (not sure about riscv but aarch64 for sure)
I specified the "amdzen" option to build the AMD customized version of BLIS. From what I understood, the other configuration options are just directly coming from the original BLIS. Hence, I am leaving these non-AMD options to blis_jll, assuming that nobody would bother to get the AMD-Zen version to run on Macs.
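Concretely, the relevant part of the build script selects the `amdzen` configuration family, which bundles the Zen sub-configurations; a sketch, where the exact flags may differ from the final recipe:

```julia
# BinaryBuilder build script fragment (runs as bash in the sandbox).
script = raw"""
cd ${WORKSPACE}/srcdir/blis*
# "amdzen" is the BLIS configuration target covering the AMD Zen
# kernels; the other configuration targets remain blis_jll's domain.
./configure --prefix=${prefix} --enable-cblas --enable-threading=openmp amdzen
make -j${nproc}
make install
"""
```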
@giordano Thank you! I wasn't aware that it's already explained there.
Would you also think it's fine to set preferred_gcc_version=v"14" as it is now? This v"14" requirement seems to be very rarely used. But, if I didn't misunderstand it, I thought the Zen 5 configuration (part of the amdzen specification currently used) would require GCC 14? (The AMD user guide only asks for GCC 12.2, but I see in the configure folder they have ifelse branching that seems to require GCC 14 for Zen 5.) Does it have any implication on the value for julia_compat?
> I specified the "amdzen" option to build the AMD customized version of BLIS. From what I understood, the other configuration options are just directly coming from the original BLIS. Hence, I am leaving these non-AMD options to blis_jll, assuming that nobody would bother to get the AMD-Zen version to run on Macs.
In my opinion, it is perfectly fine to not build the other platforms here, and as I said in my previous comment, that won't cause any problems for users.
> Would you also think it's fine to set preferred_gcc_version=v"14" as it is now? This v"14" requirement seems to be very rarely used. But, if I didn't misunderstand it, I thought the Zen 5 configuration (part of the amdzen specification currently used) would require GCC 14? (The AMD user guide only asks for GCC 12.2, but I see in the configure folder they have ifelse branching that seems to require GCC 14 for Zen 5.) Does it have any implication on the value for julia_compat?
The preferred_gcc_version won't affect the julia_compat at all, but it will cause it to link against a newer libstdc++, which would limit compatibility with older operating systems. It's hard to say exactly how much that would be a problem, though. While we generally prefer to compile with the oldest possible compiler version to improve that compatibility, sometimes it can be fine to use newer ones.
Looking at the logic here: https://github.com/amd/blis/blob/16f852a065e76e824d77bc39e2baa82ac19ed419/config/zen5/make_defs.mk#L80, there is some fallback logic to still try and compile the Zen5 kernels on older compilers; it is just that GCC 14 adds native support for znver5. I'm not sure how much performance for the Zen4/5 kernels we would be missing if we dropped back to, say, GCC 11 (which supports znver3), but since they are probably assembly kernels, I don't think we would lose very much?
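For reference, the knob in question is just a keyword argument to `build_tarballs`; a sketch using the standard BinaryBuilder recipe arguments:

```julia
# preferred_gcc_version selects the toolchain used in the sandbox; it
# does not constrain julia_compat, but a newer GCC means linking
# against a newer libstdc++, which limits older-OS compatibility.
build_tarballs(ARGS, name, version, sources, script, platforms,
               products, dependencies;
               preferred_gcc_version=v"12", julia_compat="1.6")
```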
@imciner2 Thank you for the explanations! I just switched to preferred_gcc_version=v"12", which is still quite new but hopefully "old enough".
The remaining issue would mostly be about getting the Windows version to work. After patching the `_64` suffix, the linking process fails for Windows. I am not familiar with how that should be fixed and hence will leave it here, hoping someone has insight into what to do with it.
> Update: After some trial and error on my local machine, it seems that the `_64` suffix is finally attached to the function names in the generated output for the Linux build. However, the Windows build fails with the patches for the `_64` suffix for reasons that are not obvious to me.
So, I think we should be putting the trailing underscore on all of these for the Linux builds. Technically that forms part of the Fortran calling convention (Linux compilers generally append an `_` at the end of Fortran symbols, and Windows compilers do not). That is slightly complicated because we probably shouldn't need it on Windows.
I'll have to dive into their build system a bit more tomorrow to make sense of what they are doing, because they have some logic already in there for doing this (at least in the CMake build system), but it doesn't fully work.
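One quick way to check what the build system actually produced is to dlopen the resulting library and probe the candidate symbol spellings; a sketch, where the library path is a placeholder:

```julia
using Libdl

handle = dlopen("/path/to/libblis.so")  # placeholder path
# Probe the plausible spellings: plain, Fortran trailing underscore,
# and the ILP64-suffixed variants.
for sym in ("dgemm", "dgemm_", "dgemm_64", "dgemm_64_")
    found = dlsym(handle, sym; throw_error=false) !== nothing
    println(rpad(sym, 10), found ? "exported" : "missing")
end
```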
@giordano @imciner2 I think this recipe is ready for final review (at least from my end things are done).

- There is no need to add a `_64_` suffix in BLAS at all. That was a misunderstanding.
- I have tried the Linux ILP64 version locally on my own machines. The Julia tests for BLAS succeeded for `Float32`/`Float64`, but there are a few errors for `ComplexF32`/`ComplexF64`. I think that's just how it is. The patched `get_num_threads` and `set_num_threads` both work.

So, from my side, things seem to be working fine now.
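The local checks mentioned above amount to something like the following, once LBT has been pointed at the ILP64 build; a sketch of the smoke test, not the actual Julia test suite:

```julia
using LinearAlgebra

# Threading hooks go through the patched get/set_num_threads.
BLAS.set_num_threads(2)
println(BLAS.get_num_threads())

# Float64 sanity check through the forwarded dgemm, comparing the
# BLAS result against an extended-precision reference.
A = rand(64, 64); B = rand(64, 64)
println(A * B ≈ Float64.(big.(A) * big.(B)))
```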
@giordano @imciner2 The recipe should be ready now. Could you please take a look for the merge? Thank you!
@imciner2 Sorry to bother you again. But is there any chance that this PR can be merged soon?
Any chance for the PR to get merged so that things can move on with the next AOCL components?
I believe @imciner2 wanted to have a chance to have another look, maybe didn't have the time for it yet (but a gentle bump is good).
@giordano Thank you for the reply. Just wanted to make sure this PR wasn't forgotten, as people suddenly became quiet (even though I thought things were already working).
Yea, @junyuan-chen I need to think a bit more about the library naming and symbol naming. I am actually starting to think we shouldn't be renaming the actual 32-bit library (we don't do that for OpenBLAS actually, so I think it is just a vestigial thing that BLIS has kept), and I want to do some experiments to see if we can correctly load both the 64-bit and 32-bit symbols into LBT correctly.
> Yea, @junyuan-chen I need to think a bit more about the library naming and symbol naming. I am actually starting to think we shouldn't be renaming the actual 32-bit library (we don't do that for OpenBLAS actually, so I think it is just a vestigial thing that BLIS has kept), and I want to do some experiments to see if we can correctly load both the 64-bit and 32-bit symbols into LBT correctly.
Sounds good. In case it's relevant: AOCL-LAPACK may call BLAS through a special internal API (introduced by AMD, not part of BLIS) when it is built against AOCL-BLAS, instead of through the standard BLAS symbols. So libblastrampoline would mainly matter for use cases outside of AOCL-LAPACK, while the internal symbols from AOCL-BLAS would still need to work when using AOCL-LAPACK. I imagine this could make things more complicated if we want to keep both the 64-bit and 32-bit symbols together in one package.
What do we need to do to get this merged?
I think we want to go with https://github.com/amd/AOCL_jll.jl instead.
@ViralBShah Just a short while after this PR, both AOCL_jll.jl and AOCL.jl became available directly from AMD for Linux with the ILP64 interface, covering the BLAS and LAPACK functions. They are not registered with the Julia General registry, though. Considering that someone from AMD is now aware of the need for such packages for Julia, I guess this PR would serve as a reference point without being merged.
I think it is perhaps better to see this one through and have a source build, since the other one is repackaging precompiled binaries. The big thing that would need to be confirmed is that the binaries built from source have the same performance.
@junyuan-chen you can also take a look at https://github.com/aelobdog/AOCL_BIY
This repo is a fork of the amd one, with a GitHub actions thing set up to build the binaries (you can get source build instructions from the workflow script)
The main advantage here is that:
- we can configure it to build blas, lapack and utils (dep for lapack) into a single binary (one for lp64, one for ilp64)
- it comes with a CMake preset called "ga", which means it must be the same config as the one used to build the release binaries that AOCL_jll repackages.
Of course we'd have to run a benchmark to check performance!