Patch to find rocm libs on Fedora
The function find_rocm_libs looks for a directory $ROCM_PATH/lib, but Fedora installs librocblas.so etc. in /usr/lib64. The patch makes find_rocm_libs look first for $ROCM_PATH/lib and, if that is not found, fall back to $ROCM_PATH itself. (If neither is found, the function returns an empty string, as it would without the patch.)
This way, by setting
export ROCM_PATH=/usr/lib64
on Fedora, all the ROCM libraries are found and show up in the AMDGPU.versioninfo() table.
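The lookup order described above can be sketched in shell. This only illustrates the fallback logic and is not the Julia code from the patch; the function name rocm_lib_dir is hypothetical.

```shell
# Hypothetical helper mirroring the fallback order described above:
# prefer "$root/lib", then "$root" itself, otherwise print an empty line.
rocm_lib_dir() {
    root="${1%/}"   # drop one trailing slash, e.g. /usr/lib64/ -> /usr/lib64
    if [ -z "$root" ]; then
        printf '\n'
    elif [ -d "$root/lib" ]; then
        printf '%s\n' "$root/lib"
    elif [ -d "$root" ]; then
        printf '%s\n' "$root"
    else
        printf '\n'
    fi
}

# Print the directory that would be searched for the current ROCM_PATH
# (an empty line if ROCM_PATH is unset or points nowhere).
rocm_lib_dir "${ROCM_PATH:-}"
```

With ROCM_PATH=/usr/lib64 and no /usr/lib64/lib directory, the second branch applies, which is the case the patch adds.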
I can't get this to work in a Fedora container (Apptainer) with the following definition:
Bootstrap: docker
From: fedora:42
%environment
export JULIA_HOST_DIR=/opt/julia
export ROCM_PATH=/usr/lib64/
%post
dnf -y install rocminfo rccl rocblas rocfft rocsparse rocsolver rocrand roctracer miopen rocm-hip git wget
dnf clean all
# Set variables
JULIA_HOST_DIR=/opt/julia/
DL_URL=https://julialang-s3.julialang.org/bin/linux/x64/1.11/julia-1.11.5-linux-x86_64.tar.gz
TAR_FILE=/var/tmp/julia.tgz
# Julia install
mkdir -p "$JULIA_HOST_DIR"
wget -O "$TAR_FILE" "$DL_URL"
tar -xzf "$TAR_FILE" -C "$JULIA_HOST_DIR"
# Clean up
rm "$TAR_FILE"
%labels
Description "Fedora Julia plus ROCm container"
%runscript
"$JULIA_HOST_DIR/julia-1.11.5/bin/julia" "$@"
I get this
(test) pkg> st
Status `~/Coding/test/Project.toml`
[21141c5a] AMDGPU v1.3.3 `https://github.com/billmclean/AMDGPU.jl.git#fedora_libs`
julia> ENV["ROCM_PATH"]
"/usr/lib64"
julia> using AMDGPU
┌ Warning: HIP library is unavailable, HIP integration will be disabled.
└ @ AMDGPU ~/.julia/packages/AMDGPU/fPa9C/src/AMDGPU.jl:197
┌ Warning: rocBLAS is unavailable, functionality will be disabled.
└ @ AMDGPU ~/.julia/packages/AMDGPU/fPa9C/src/AMDGPU.jl:208
┌ Warning: rocSPARSE is unavailable, functionality will be disabled.
└ @ AMDGPU ~/.julia/packages/AMDGPU/fPa9C/src/AMDGPU.jl:208
┌ Warning: rocSOLVER is unavailable, functionality will be disabled.
└ @ AMDGPU ~/.julia/packages/AMDGPU/fPa9C/src/AMDGPU.jl:208
┌ Warning: rocRAND is unavailable, functionality will be disabled.
└ @ AMDGPU ~/.julia/packages/AMDGPU/fPa9C/src/AMDGPU.jl:208
┌ Warning: rocFFT is unavailable, functionality will be disabled.
└ @ AMDGPU ~/.julia/packages/AMDGPU/fPa9C/src/AMDGPU.jl:208
┌ Warning: MIOpen is unavailable, functionality will be disabled.
└ @ AMDGPU ~/.julia/packages/AMDGPU/fPa9C/src/AMDGPU.jl:208
julia> AMDGPU.functional()
false
But using
julia> begin
AMDGPU.ROCmDiscovery.libhsaruntime = "/usr/lib64/libhsa-runtime64.so.1"
AMDGPU.ROCmDiscovery.lld_path = "/usr/lib64/rocm/llvm/bin/ld.lld"
AMDGPU.ROCmDiscovery.lld_artifact = false
AMDGPU.ROCmDiscovery.libhip = "/usr/lib64/libamdhip64.so.6"
AMDGPU.ROCmDiscovery.libdevice_libs = "/usr/lib64/rocm/llvm/lib/clang/18/amdgcn/bitcode"
AMDGPU.ROCmDiscovery.librocblas = "/usr/lib64/librocblas.so.4"
AMDGPU.ROCmDiscovery.librocsparse = "/usr/lib64/librocsparse.so.1"
AMDGPU.ROCmDiscovery.librocsolver = "/usr/lib64/librocsolver.so.0"
AMDGPU.ROCmDiscovery.librocrand = "/usr/lib64/librocrand.so.1"
AMDGPU.ROCmDiscovery.librocfft = "/usr/lib64/librocfft.so.0"
AMDGPU.ROCmDiscovery.libMIOpen_path = "/usr/lib64/libMIOpen.so.1"
end
"/usr/lib64/libMIOpen.so.1"
julia> AMDGPU.functional()
true
However, it still has bugs:
julia> A = ROCArray(ones(3,3))
3×3 ROCArray{Float64, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
1.0 1.0 1.0
1.0 1.0 1.0
1.0 1.0 1.0
julia> A * A
3×3 ROCArray{Float64, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
3.0 3.0 3.0
3.0 3.0 3.0
3.0 3.0 3.0
julia> sin.(A)
error: Opaque pointers are only supported in -opaque-pointers mode (Producer: 'LLVM18.0.0git' Reader: 'LLVM 16.0.6jl')
@tjjarvinen In your second example, where AMDGPU is functional, can you enable artifacts for the device libs only (the AMDGPU.ROCmDiscovery.libdevice_libs = "/usr/lib64/rocm/llvm/lib/clang/18/amdgcn/bitcode" setting) to see whether this solves the LLVM version mismatch issue?
If by enabling artifacts you mean setting AMDGPU.ROCmDiscovery.lld_artifact = true, it has no effect on the result. Fedora's ROCm is v6.4.1, which might explain the issues, since Ubuntu with that version is also buggy.
julia> using AMDGPU
┌ Warning: HIP library is unavailable, HIP integration will be disabled.
└ @ AMDGPU ~/.julia/packages/AMDGPU/fPa9C/src/AMDGPU.jl:197
┌ Warning: rocBLAS is unavailable, functionality will be disabled.
└ @ AMDGPU ~/.julia/packages/AMDGPU/fPa9C/src/AMDGPU.jl:208
┌ Warning: rocSPARSE is unavailable, functionality will be disabled.
└ @ AMDGPU ~/.julia/packages/AMDGPU/fPa9C/src/AMDGPU.jl:208
┌ Warning: rocSOLVER is unavailable, functionality will be disabled.
└ @ AMDGPU ~/.julia/packages/AMDGPU/fPa9C/src/AMDGPU.jl:208
┌ Warning: rocRAND is unavailable, functionality will be disabled.
└ @ AMDGPU ~/.julia/packages/AMDGPU/fPa9C/src/AMDGPU.jl:208
┌ Warning: rocFFT is unavailable, functionality will be disabled.
└ @ AMDGPU ~/.julia/packages/AMDGPU/fPa9C/src/AMDGPU.jl:208
┌ Warning: MIOpen is unavailable, functionality will be disabled.
└ @ AMDGPU ~/.julia/packages/AMDGPU/fPa9C/src/AMDGPU.jl:208
julia> begin
AMDGPU.ROCmDiscovery.libhsaruntime = "/usr/lib64/libhsa-runtime64.so.1"
AMDGPU.ROCmDiscovery.lld_path = "/usr/lib64/rocm/llvm/bin/ld.lld"
AMDGPU.ROCmDiscovery.lld_artifact = true
AMDGPU.ROCmDiscovery.libhip = "/usr/lib64/libamdhip64.so.6"
AMDGPU.ROCmDiscovery.libdevice_libs = "/usr/lib64/rocm/llvm/lib/clang/18/amdgcn/bitcode"
AMDGPU.ROCmDiscovery.librocblas = "/usr/lib64/librocblas.so.4"
AMDGPU.ROCmDiscovery.librocsparse = "/usr/lib64/librocsparse.so.1"
AMDGPU.ROCmDiscovery.librocsolver = "/usr/lib64/librocsolver.so.0"
AMDGPU.ROCmDiscovery.librocrand = "/usr/lib64/librocrand.so.1"
AMDGPU.ROCmDiscovery.librocfft = "/usr/lib64/librocfft.so.0"
AMDGPU.ROCmDiscovery.libMIOpen_path = "/usr/lib64/libMIOpen.so.1"
end
"/usr/lib64/libMIOpen.so.1"
julia> AMDGPU.functional()
true
julia> A = ROCArray(ones(3,3))
3×3 ROCArray{Float64, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
1.0 1.0 1.0
1.0 1.0 1.0
1.0 1.0 1.0
julia> sin.(A)
error: Opaque pointers are only supported in -opaque-pointers mode (Producer: 'LLVM18.0.0git' Reader: 'LLVM 16.0.6jl')
What if you omit these two lines from your manual discovery:
AMDGPU.ROCmDiscovery.lld_artifact = true
AMDGPU.ROCmDiscovery.libdevice_libs = "/usr/lib64/rocm/llvm/lib/clang/18/amdgcn/bitcode"
and then report the output of AMDGPU.versioninfo()?
That works!
julia> using AMDGPU
┌ Warning: HIP library is unavailable, HIP integration will be disabled.
└ @ AMDGPU ~/.julia/packages/AMDGPU/fPa9C/src/AMDGPU.jl:197
┌ Warning: rocBLAS is unavailable, functionality will be disabled.
└ @ AMDGPU ~/.julia/packages/AMDGPU/fPa9C/src/AMDGPU.jl:208
┌ Warning: rocSPARSE is unavailable, functionality will be disabled.
└ @ AMDGPU ~/.julia/packages/AMDGPU/fPa9C/src/AMDGPU.jl:208
┌ Warning: rocSOLVER is unavailable, functionality will be disabled.
└ @ AMDGPU ~/.julia/packages/AMDGPU/fPa9C/src/AMDGPU.jl:208
┌ Warning: rocRAND is unavailable, functionality will be disabled.
└ @ AMDGPU ~/.julia/packages/AMDGPU/fPa9C/src/AMDGPU.jl:208
┌ Warning: rocFFT is unavailable, functionality will be disabled.
└ @ AMDGPU ~/.julia/packages/AMDGPU/fPa9C/src/AMDGPU.jl:208
┌ Warning: MIOpen is unavailable, functionality will be disabled.
└ @ AMDGPU ~/.julia/packages/AMDGPU/fPa9C/src/AMDGPU.jl:208
julia> begin
AMDGPU.ROCmDiscovery.libhsaruntime = "/usr/lib64/libhsa-runtime64.so.1"
AMDGPU.ROCmDiscovery.lld_path = "/usr/lib64/rocm/llvm/bin/ld.lld"
AMDGPU.ROCmDiscovery.libhip = "/usr/lib64/libamdhip64.so.6"
AMDGPU.ROCmDiscovery.librocblas = "/usr/lib64/librocblas.so.4"
AMDGPU.ROCmDiscovery.librocsparse = "/usr/lib64/librocsparse.so.1"
AMDGPU.ROCmDiscovery.librocsolver = "/usr/lib64/librocsolver.so.0"
AMDGPU.ROCmDiscovery.librocrand = "/usr/lib64/librocrand.so.1"
AMDGPU.ROCmDiscovery.librocfft = "/usr/lib64/librocfft.so.0"
AMDGPU.ROCmDiscovery.libMIOpen_path = "/usr/lib64/libMIOpen.so.1"
end
"/usr/lib64/libMIOpen.so.1"
julia> AMDGPU.versioninfo()
[ Info: AMDGPU versioninfo
┌───────────┬──────────────────┬───────────┬───────────────────────────────────────────────────────────────────────
│ Available │ Name │ Version │ Path ⋯
├───────────┼──────────────────┼───────────┼───────────────────────────────────────────────────────────────────────
│ + │ LLD │ - │ /usr/lib64/rocm/llvm/bin/ld.lld ⋯
│ + │ Device Libraries │ - │ /home/teemu/.julia/artifacts/5ad5ecb46e3c334821f54c1feecc6c152b7b6a4 ⋯
│ + │ HIP │ 6.3.42133 │ /usr/lib64/libamdhip64.so.6 ⋯
│ + │ rocBLAS │ 4.3.0 │ /usr/lib64/librocblas.so.4 ⋯
│ + │ rocSOLVER │ 3.27.0 │ /usr/lib64/librocsolver.so.0 ⋯
│ + │ rocSPARSE │ 3.3.0 │ /usr/lib64/librocsparse.so.1 ⋯
│ + │ rocRAND │ 2.10.5 │ /usr/lib64/librocrand.so.1 ⋯
│ + │ rocFFT │ 1.0.31 │ /usr/lib64/librocfft.so.0 ⋯
│ + │ MIOpen │ 3.3.0 │ /usr/lib64/libMIOpen.so.1 ⋯
└───────────┴──────────────────┴───────────┴───────────────────────────────────────────────────────────────────────
1 column omitted
[ Info: AMDGPU devices
┌────┬───────────────────────┬──────────┬───────────┬────────────┬───────────────┐
│ Id │ Name │ GCN arch │ Wavefront │ Memory │ Shared Memory │
├────┼───────────────────────┼──────────┼───────────┼────────────┼───────────────┤
│ 1 │ AMD Radeon RX 6800 XT │ gfx1030 │ 32 │ 15.984 GiB │ 64.000 KiB │
│ 2 │ AMD Radeon Graphics │ gfx1036 │ 32 │ 30.236 GiB │ 64.000 KiB │
└────┴───────────────────────┴──────────┴───────────┴────────────┴───────────────┘
julia> AMDGPU.functional()
true
julia> A = ROCArray(ones(3,3))
3×3 ROCArray{Float64, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
1.0 1.0 1.0
1.0 1.0 1.0
1.0 1.0 1.0
julia> sin.(A)
3×3 ROCArray{Float64, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
0.841471 0.841471 0.841471
0.841471 0.841471 0.841471
0.841471 0.841471 0.841471
Thanks for this.
You should install the development versions of the Fedora ROCm packages. For instance, rocblas-devel creates a symlink /usr/lib64/librocblas.so that points to /usr/lib64/librocblas.so.4 (which in turn points to /usr/lib64/librocblas.so.4.3).
This way, on my system I just need
export HSA_OVERRIDE_GFX_VERSION=11.0.0
export ROCM_PATH=/usr/lib64
to get
[ Info: AMDGPU versioninfo
┌───────────┬──────────────────┬───────────┬───────────────────────────────────────────────────────────────────────
│ Available │ Name │ Version │ Path ⋯
├───────────┼──────────────────┼───────────┼───────────────────────────────────────────────────────────────────────
│ + │ LLD │ - │ /home/bill/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/libex ⋯
│ + │ Device Libraries │ - │ /home/bill/.julia/artifacts/5ad5ecb46e3c334821f54c1feecc6c152b7b6a45 ⋯
│ + │ HIP │ 6.3.42133 │ /usr/lib64/libamdhip64.so ⋯
│ + │ rocBLAS │ 4.3.0 │ /usr/lib64/librocblas.so ⋯
│ + │ rocSOLVER │ 3.27.0 │ /usr/lib64/librocsolver.so ⋯
│ + │ rocSPARSE │ 3.3.0 │ /usr/lib64/librocsparse.so ⋯
│ + │ rocRAND │ 2.10.5 │ /usr/lib64/librocrand.so ⋯
│ + │ rocFFT │ 1.0.31 │ /usr/lib64/librocfft.so ⋯
│ + │ MIOpen │ 3.3.0 │ /usr/lib64/libMIOpen.so ⋯
└───────────┴──────────────────┴───────────┴───────────────────────────────────────────────────────────────────────
1 column omitted
[ Info: AMDGPU devices
┌────┬───────────────────────┬──────────┬───────────┬────────────┬───────────────┐
│ Id │ Name │ GCN arch │ Wavefront │ Memory │ Shared Memory │
├────┼───────────────────────┼──────────┼───────────┼────────────┼───────────────┤
│ 1 │ AMD Radeon RX 7700 XT │ gfx1100 │ 32 │ 11.984 GiB │ 64.000 KiB │
│ 2 │ AMD Radeon Graphics │ gfx1100 │ 32 │ 15.252 GiB │ 64.000 KiB │
└────┴───────────────────────┴──────────┴───────────┴────────────┴───────────────┘
Is there an environment variable I could set to get Fedora's LLD?
Yes, adding the dev packages made it work. Would it be a good idea to add some documentation saying that you need to install the dev packages?
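A quick way to see whether the unversioned symlinks provided by the -devel packages are present is sketched below. The function name missing_dev_symlinks is hypothetical, and the library list is simply the set of names that appear in this thread.

```shell
# Hypothetical helper: list libraries whose unversioned .so name is absent.
# Usage: missing_dev_symlinks DIR NAME...
missing_dev_symlinks() {
    dir="$1"
    shift
    for name in "$@"; do
        # -e follows symlinks, so a dangling link is reported as missing too
        if [ ! -e "$dir/lib$name.so" ]; then
            printf '%s\n' "lib$name.so"
        fi
    done
}

# Prints nothing when every unversioned symlink exists in /usr/lib64.
missing_dev_symlinks /usr/lib64 amdhip64 rocblas rocsparse rocsolver rocrand rocfft MIOpen
```

Any name printed here corresponds to a library that is only reachable through its versioned file name, such as librocblas.so.4.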
Here is the container file that worked:
Bootstrap: docker
From: fedora:42
%environment
export JULIA_HOST_DIR=/opt/julia
export ROCM_PATH=/usr/lib64/
%post
dnf -y install rocminfo rccl rocblas rocfft rocsparse rocsolver rocrand roctracer miopen rocm-hip git wget
dnf -y install rccl-devel rocblas-devel rocfft-devel rocsparse-devel rocsolver-devel rocrand-devel roctracer-devel miopen-devel rocm-hip-devel
dnf clean all
# Set variables
JULIA_HOST_DIR=/opt/julia/
DL_URL=https://julialang-s3.julialang.org/bin/linux/x64/1.11/julia-1.11.5-linux-x86_64.tar.gz
TAR_FILE=/var/tmp/julia.tgz
# Julia install
mkdir -p "$JULIA_HOST_DIR"
wget -O "$TAR_FILE" "$DL_URL"
tar -xzf "$TAR_FILE" -C "$JULIA_HOST_DIR"
# Clean up
rm "$TAR_FILE"
%labels
Description "Fedora Julia plus ROCm container"
%runscript
"$JULIA_HOST_DIR/julia-1.11.5/bin/julia" "$@"
I also ran the tests, and they are fine apart from the multi-GPU tests:
[ Info: AMDGPU versioninfo
┌───────────┬──────────────────┬───────────┬───────────────────────────────────────────────────────────────────────
│ Available │ Name │ Version │ Path ⋯
├───────────┼──────────────────┼───────────┼───────────────────────────────────────────────────────────────────────
│ + │ LLD │ - │ /opt/julia/julia-1.11.5/libexec/julia/lld ⋯
│ + │ Device Libraries │ - │ /home/teemu/.julia/artifacts/5ad5ecb46e3c334821f54c1feecc6c152b7b6a4 ⋯
│ + │ HIP │ 6.3.42133 │ /usr/lib64/libamdhip64.so ⋯
│ + │ rocBLAS │ 4.3.0 │ /usr/lib64/librocblas.so ⋯
│ + │ rocSOLVER │ 3.27.0 │ /usr/lib64/librocsolver.so ⋯
│ + │ rocSPARSE │ 3.3.0 │ /usr/lib64/librocsparse.so ⋯
│ + │ rocRAND │ 2.10.5 │ /usr/lib64/librocrand.so ⋯
│ + │ rocFFT │ 1.0.31 │ /usr/lib64/librocfft.so ⋯
│ + │ MIOpen │ 3.3.0 │ /usr/lib64/libMIOpen.so ⋯
└───────────┴──────────────────┴───────────┴───────────────────────────────────────────────────────────────────────
1 column omitted
[ Info: AMDGPU devices
┌────┬───────────────────────┬──────────┬───────────┬────────────┬───────────────┐
│ Id │ Name │ GCN arch │ Wavefront │ Memory │ Shared Memory │
├────┼───────────────────────┼──────────┼───────────┼────────────┼───────────────┤
│ 1 │ AMD Radeon RX 6800 XT │ gfx1030 │ 32 │ 15.984 GiB │ 64.000 KiB │
│ 2 │ AMD Radeon Graphics │ gfx1036 │ 32 │ 30.236 GiB │ 64.000 KiB │
└────┴───────────────────────┴──────────┴───────────┴────────────┴───────────────┘
[ Info: Test suite info
┌─────────┬───────────────────────────────────────────────────────────────┬───────────────────────────────────────────────┐
│ Workers │ Device │ Tests │
├─────────┼───────────────────────────────────────────────────────────────┼───────────────────────────────────────────────┤
│ 4 │ HIPDevice(id=1, name=AMD Radeon RX 6800 XT, gcn_arch=gfx1030) │ core, hip, ext, gpuarrays, kernelabstractions │
└─────────┴───────────────────────────────────────────────────────────────┴───────────────────────────────────────────────┘
[ Tests Completed: 36/36 test items were run.
Test Summary: | Pass Fail Error Broken Total Time
AMDGPU | 13709 3 2 15 13729 11m05.0s
test | 13709 3 2 15 13729
test/core_tests.jl | 614 3 617
core | 614 3 617 1m37.4s
Functional | 2 2 0.1s
HIPDevice | 8 8 0.0s
ISA parsing | 10 10 0.1s
Exception holder | 0 1.7s
Comparison | 3 3 0.0s
Synchronization | 1 1 5.6s
Trapping | 2 2 0.0s
Hardware FP atomics | 1 1 1.0s
Base | 556 2 558 1m15.2s
Specifying buffer type | 4 4 0.1s
Constructor | 2 2 3.9s
ones/zeros | 2 2 0.5s
view | 10 10 1.7s
resize! | 3 3 0.5s
unsafe_wrap | 18 18 9.8s
unsafe_free | 0 0.0s
accumulate | 25 25 8.2s
Atomics | 1 1 0.4s
Sorting | 384 384 40.7s
Reverse kernel | 88 88 2.8s
Multi-GPU | 19 2 21 3.5s
Device switching | 7 7 0.2s
Arrays | 4 1 5 1.3s
Copying | 1 1 0.8s
Kernel | 1 1 2 0.9s
Correctly switching HIP context | 6 6 0.4s
broadcast | 18 18 8.1s
Ref Broadcast | 1 1 0.6s
Broadcast Fix | 2 2 0.8s
Broadcast Ref{<:Type} | 1 1 0.5s
Device | 3 3 0.0s
Stream | 7 7 0.3s
test/device_tests.jl | 489 9 498
test/external_tests.jl | 18 18
test/gpuarrays_tests.jl | 7404 7404
test/hip_core_tests.jl | 4 1 5
hip - core | 4 1 5 11.9s
AMDGPU.@elapsed | 4 4 10.2s
HIP Peer Access | 1 1 0.7s
test/hip_miopen_tests.jl | 1 1
hip - MIOpen | 1 1 0.0s
test/hip_rocblas_tests.jl | 678 678
test/hip_rocfft_tests.jl | 252 252
test/hip_rocrand_tests.jl | 141 141
test/hip_rocsolver_tests.jl | 618 618
test/hip_rocsparse_tests.jl | 1272 1272
test/ka_tests.jl | 2219 6 2225
ERROR: LoadError: Some tests did not pass: 13709 passed, 3 failed, 2 errored, 15 broken.
in expression starting at /home/teemu/.julia/packages/AMDGPU/fPa9C/test/runtests.jl:114
ERROR: Package AMDGPU errored during testing
Yes, you could add a note in docs/src/index.md under Required software. Something like the following:
!!! note "ROCm installation on Fedora"
    Although Fedora is not included in AMD's list of supported Linux distributions, it provides its own ROCm packages.
    sudo dnf install rocminfo rccl-devel rocblas-devel rocfft-devel rocsparse-devel rocsolver-devel rocrand-devel roctracer-devel miopen-devel rocm-hip-devel
    However, the libraries are not installed in the usual location (under /opt/rocm), so for AMDGPU to find them you must set an environment variable.
    export ROCM_PATH=/usr/lib64
I wonder if we should rely on hipconfig more for this. It would be nice not to depend on ROCM_PATH, and the device libs can also end up in unusual places. I talked a bit with @pxl-th about this a while ago.
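As a rough illustration of that idea, a shell sketch might ask hipconfig first and use ROCM_PATH only as a fallback. The function name detect_rocm_path is hypothetical. The sketch assumes hipconfig accepts the --rocmpath option, and what it reports on Fedora has not been checked here.

```shell
# Hypothetical helper: ask hipconfig for the ROCm root, fall back to ROCM_PATH.
# Assumption: hipconfig supports --rocmpath.
detect_rocm_path() {
    if command -v hipconfig >/dev/null 2>&1; then
        hip_root="$(hipconfig --rocmpath 2>/dev/null)"
        if [ -n "$hip_root" ]; then
            printf '%s\n' "$hip_root"
            return 0
        fi
    fi
    if [ -n "${ROCM_PATH:-}" ]; then
        printf '%s\n' "$ROCM_PATH"
        return 0
    fi
    return 1   # neither source produced a path
}

detect_rocm_path || echo "ROCm root not found" >&2
```

This covers only the ROCm root; locating the device libs would need a separate lookup.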
Superseded by #788