rocBLAS
[Bug]: rocblas cannot load TensileLibrary.dat
Describe the bug
Code that uses rocBLAS fails to load TensileLibrary.dat with the error:
rocBLAS error: Cannot read /opt/rocm/rocblas/library/TensileLibrary.dat: Illegal seek
To Reproduce
This occurs when using the RPM packages for rocm 5.2 and newer. Version 5.1 includes this file, but it appears to have been removed in later versions.
Expected behavior
The RPM packages should include TensileLibrary.dat, or provide some way of creating this file automatically.
Environment
| Hardware | description |
|---|---|
| CPU | AMD EPYC 7413 |
| GPU | MI-210 |

| Software | version |
|---|---|
| rocm-core | v5.2 |
| rocblas | v5.2 |
Attached environment.txt
@G-Ragghianti,
In ROCm 5.2, the TensileLibrary.dat file was split into multiple files based on GPU architecture. So, you might find files such as TensileLibrary_gfx906.dat, TensileLibrary_gfx90a.dat, etc. in the /opt/rocm/rocblas/library/ path.
To debug the issue further, would it be possible for you to run the following command and provide the output?
ldd <user_app>
I am curious to see which version of rocBLAS the application uses.
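As a small aid for reading that output, here is a minimal sketch that pulls only the rocBLAS entry out of ldd-style text. The function name `rocblas_from_ldd` is made up for illustration; it assumes the usual `name => path (address)` layout that ldd prints on Linux.

```shell
# rocblas_from_ldd: read ldd-style output on stdin and print the path
# that librocblas resolves to. Hypothetical helper, not part of ROCm.
rocblas_from_ldd() {
    # ldd lines look like:
    #   librocblas.so.0 => /opt/rocm/lib/librocblas.so.0 (0x00007f...)
    awk '$1 ~ /^librocblas\.so/ && $2 == "=>" { print $3 }'
}

# Typical use (resolves symlinks to the versioned file):
#   readlink -f "$(ldd ./tester | rocblas_from_ldd)"
```

Following the symlinks with `readlink -f` shows the fully versioned file name, which makes it easier to tell which ROCm release the library came from.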
Yes, I noticed that TensileLibrary.dat hasn't been provided since version 5.2, but the code in rocBLAS still seems to reference it as a fallback if it fails to read the arch-specific .dat file. I'm pretty sure that we are building and running only on version 5.2, but I've asked my collaborator to post his ldd output. Thanks.
$ ldd test/tester
linux-vdso.so.1 => (0x00007ffc59365000)
libslate.so => /home/kadir/dopamine/t020/slate-dev/lib/libslate.so (0x00007f4f3b9a2000)
libtestsweeper.so => /home/kadir/dopamine/t020/slate-dev/testsweeper/libtestsweeper.so (0x00007f4f3b792000)
libmkl_scalapack_lp64.so.2 => /nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-oneapi-mkl-2022.0.2-csli2oqmbu6esrsvxrnenbde52w7touz/mkl/2022.0.2/lib/intel64/libmkl_scalapack_lp64.so.2 (0x00007f4f3b065000)
libmkl_blacs_intelmpi_lp64.so.2 => /nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-oneapi-mkl-2022.0.2-csli2oqmbu6esrsvxrnenbde52w7touz/mkl/2022.0.2/lib/intel64/libmkl_blacs_intelmpi_lp64.so.2 (0x00007f4f3ca4b000)
libblaspp.so => /home/kadir/dopamine/t020/slate-dev/blaspp/lib/libblaspp.so (0x00007f4f3aded000)
liblapackpp.so => /home/kadir/dopamine/t020/slate-dev/lapackpp/lib/liblapackpp.so (0x00007f4f3aa94000)
libmkl_gf_lp64.so.2 => /nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-oneapi-mkl-2022.0.2-csli2oqmbu6esrsvxrnenbde52w7touz/mkl/2022.0.2/lib/intel64/libmkl_gf_lp64.so.2 (0x00007f4f39bf6000)
libmkl_sequential.so.2 => /nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-oneapi-mkl-2022.0.2-csli2oqmbu6esrsvxrnenbde52w7touz/mkl/2022.0.2/lib/intel64/libmkl_sequential.so.2 (0x00007f4f381dc000)
libmkl_core.so.2 => /nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-oneapi-mkl-2022.0.2-csli2oqmbu6esrsvxrnenbde52w7touz/mkl/2022.0.2/lib/intel64/libmkl_core.so.2 (0x00007f4f33e27000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f4f33c0b000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f4f33a07000)
librocsolver.so.0 => /opt/rocm/lib/librocsolver.so.0 (0x00007f4f0040b000)
librocblas.so.0 => /opt/rocm/lib/librocblas.so.0 (0x00007f4ef2a07000)
libamdhip64.so.5 => /opt/rocm/lib/libamdhip64.so.5 (0x00007f4ef1b34000)
libmpicxx.so.12 => /nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/lib/libmpicxx.so.12 (0x00007f4ef1914000)
libmpifort.so.12 => /nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/lib/libmpifort.so.12 (0x00007f4ef1555000)
libmpi.so.12 => /nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/lib/release/libmpi.so.12 (0x00007f4ef0339000)
librt.so.1 => /lib64/librt.so.1 (0x00007f4ef0131000)
libstdc++.so.6 => /nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/gcc-7.3.0-z3dl67mmil3qe2z5loqk6l5denetg5x7/lib64/libstdc++.so.6 (0x00007f4eefdaf000)
libm.so.6 => /lib64/libm.so.6 (0x00007f4eefaad000)
libgomp.so.1 => /nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/gcc-7.3.0-z3dl67mmil3qe2z5loqk6l5denetg5x7/lib64/libgomp.so.1 (0x00007f4eef87f000)
libgcc_s.so.1 => /nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/gcc-7.3.0-z3dl67mmil3qe2z5loqk6l5denetg5x7/lib64/libgcc_s.so.1 (0x00007f4eef668000)
libc.so.6 => /lib64/libc.so.6 (0x00007f4eef29a000)
/lib64/ld-linux-x86-64.so.2 (0x00007f4f3c876000)
libamd_comgr.so.2 => /opt/rocm/lib/libamd_comgr.so.2 (0x00007f4ee7bc4000)
libhsa-runtime64.so.1 => /opt/rocm/lib/libhsa-runtime64.so.1 (0x00007f4ee776f000)
libnuma.so.1 => /lib64/libnuma.so.1 (0x00007f4ee7563000)
libfabric.so.1 => /nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/libfabric/lib/libfabric.so.1 (0x00007f4ee7321000)
libz.so.1 => /lib64/libz.so.1 (0x00007f4ee710b000)
libtinfo.so.5 => /lib64/libtinfo.so.5 (0x00007f4ee6ee1000)
libelf.so.1 => /lib64/libelf.so.1 (0x00007f4ee6cc9000)
libdrm.so.2 => /opt/amdgpu/lib64/libdrm.so.2 (0x00007f4f3ca14000)
libdrm_amdgpu.so.1 => /opt/amdgpu/lib64/libdrm_amdgpu.so.1 (0x00007f4f3ca07000)
@KadirAkbudak, thanks for the output. Could you please provide the output of the following commands:
ls -al /opt
ls -al /opt/rocm/rocblas/library/
ls -al /etc/alternatives/rocm
rocm_agent_enumerator
@G-Ragghianti, by default rocBLAS generates architecture-specific TensileLibrary files, but users can override this with the merge-architectures build option to generate a single TensileLibrary.dat file.
Here, I am trying to understand why the library failed to load/find the TensileLibrary_gfx90a.dat file.
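To check by hand which metadata file is present for a given GPU, the following sketch tries the architecture-specific file first and then the merged file. The candidate order only mirrors the behaviour described in this thread; it is an assumption for illustration, not rocBLAS's actual lookup code, and `find_tensile_dat` is a made-up name.

```shell
# find_tensile_dat DIR ARCH
# Print the first readable candidate:
#   DIR/TensileLibrary_ARCH.dat, then DIR/TensileLibrary.dat
# Returns non-zero if neither exists. Illustrative only.
find_tensile_dat() {
    local dir=$1 arch=$2 candidate
    for candidate in "$dir/TensileLibrary_${arch}.dat" "$dir/TensileLibrary.dat"; do
        if [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Typical use, with the arch name taken from rocm_agent_enumerator:
#   find_tensile_dat /opt/rocm/lib/rocblas/library gfx90a
```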
$ ls -al /opt
total 16
drwxr-xr-x. 7 root root 128 Jun 28 00:58 .
dr-xr-xr-x. 21 root root 4096 Sep 26 02:07 ..
drwxr-xr-x 7 root root 65 Jul 11 17:06 amdgpu
drwx--x--x 4 root root 28 Jun 8 18:55 containerd
-rwxr-xr-x 1 root root 366 Jan 26 2021 knl_mods.sh
drwxr-xr-x. 4 root root 50 Aug 31 2020 nvidia
drwxr-xr-x. 4 root root 46 May 27 2020 rh
lrwxrwxrwx 1 root root 22 Jul 11 17:04 rocm -> /etc/alternatives/rocm
drwxr-xr-x 36 root root 4096 Jun 28 01:14 rocm-5.2.0
-rwxr-xr-x. 1 root root 276 Aug 31 2020 uncore.sh
$ ls -al /opt/rocm/rocblas/library/
ls: cannot access /opt/rocm/rocblas/library/: No such file or directory
$ ls -al /opt/rocm/rocblas/lib
total 0
drwxr-xr-x 3 root root 40 Jul 11 17:06 .
drwxr-xr-x 4 root root 32 Jul 11 17:06 ..
drwxr-xr-x 2 root root 136 Jul 11 17:06 cmake
lrwxrwxrwx 1 root root 23 Jul 11 17:06 librocblas.so -> ../../lib/librocblas.so
$ ls -al /etc/alternatives/rocm
lrwxrwxrwx 1 root root 15 Jul 11 17:04 /etc/alternatives/rocm -> /opt/rocm-5.2.0
$ rocm_agent_enumerator
gfx000
gfx90a
gfx90a
$ ls -al /opt/rocm/lib/rocblas/library/
total 1358704
drwxr-xr-x 2 root root 4096 Jul 11 17:06 .
drwxr-xr-x 3 root root 21 Jul 11 17:06 ..
-rw-r--r-- 1 root root 22368232 Jun 28 00:47 Kernels.so-000-gfx1010.hsaco
-rw-r--r-- 1 root root 21409768 Jun 28 00:47 Kernels.so-000-gfx1012.hsaco
-rw-r--r-- 1 root root 20856808 Jun 28 00:47 Kernels.so-000-gfx1030.hsaco
-rw-r--r-- 1 root root 21716968 Jun 28 00:47 Kernels.so-000-gfx803.hsaco
-rw-r--r-- 1 root root 22536168 Jun 28 00:47 Kernels.so-000-gfx900.hsaco
-rw-r--r-- 1 root root 20475880 Jun 28 00:47 Kernels.so-000-gfx906-xnack-.hsaco
-rw-r--r-- 1 root root 20463592 Jun 28 00:47 Kernels.so-000-gfx908-xnack-.hsaco
-rw-r--r-- 1 root root 20193256 Jun 28 00:47 Kernels.so-000-gfx90a-xnack-.hsaco
-rw-r--r-- 1 root root 20197352 Jun 28 00:47 Kernels.so-000-gfx90a-xnack+.hsaco
-rw-r--r-- 1 root root 130175968 Jun 28 00:47 TensileLibrary_gfx1030.co
-rw-r--r-- 1 root root 30403016 Jun 28 00:47 TensileLibrary_gfx1030.dat
-rw-r--r-- 1 root root 4321536 Jun 28 00:47 TensileLibrary_gfx803.co
-rw-r--r-- 1 root root 5517022 Jun 28 00:47 TensileLibrary_gfx803.dat
-rw-r--r-- 1 root root 53663704 Jun 28 00:47 TensileLibrary_gfx900.co
-rw-r--r-- 1 root root 23703192 Jun 28 00:47 TensileLibrary_gfx900.dat
-rw-r--r-- 1 root root 113151720 Jun 28 00:47 TensileLibrary_gfx906.co
-rw-r--r-- 1 root root 53902785 Jun 28 00:47 TensileLibrary_gfx906.dat
-rw-r--r-- 1 root root 238989720 Jun 28 00:47 TensileLibrary_gfx908.co
-rw-r--r-- 1 root root 67871494 Jun 28 00:47 TensileLibrary_gfx908.dat
-rw-r--r-- 1 root root 346532104 Jun 28 00:47 TensileLibrary_gfx90a.co
-rw-r--r-- 1 root root 132832830 Jun 28 00:46 TensileLibrary_gfx90a.dat
-rw-r--r-- 1 root root 2796 Jun 28 00:21 TensileManifest.txt
$ ls -al /opt/rocm/lib
total 4890972
drwxr-xr-x 7 root root 8192 Jul 11 17:07 .
drwxr-xr-x 36 root root 4096 Jun 28 01:14 ..
drwxr-xr-x 25 root root 4096 Jul 11 17:07 cmake
drwxr-xr-x 3 root root 27 Jul 11 17:04 CMakeFiles
-rw-r--r-- 1 root root 92 Jun 27 23:23 .hipInfo
lrwxrwxrwx 1 root root 17 Jul 11 17:05 libamd_comgr.so -> libamd_comgr.so.2
lrwxrwxrwx 1 root root 25 Jul 11 17:05 libamd_comgr.so.2 -> libamd_comgr.so.2.4.50200
-rwxr-xr-x 1 root root 124241088 Jun 27 23:20 libamd_comgr.so.2.4.50200
lrwxrwxrwx 1 root root 16 Jul 11 17:06 libamdhip64.so -> libamdhip64.so.5
lrwxrwxrwx 1 root root 24 Jul 11 17:06 libamdhip64.so.5 -> libamdhip64.so.5.2.50200
-rwxr-xr-x 1 root root 13438688 Jun 27 23:35 libamdhip64.so.5.2.50200
-rwxr-xr-x 1 root root 1415632 Jun 27 23:26 libamdocl64.so
lrwxrwxrwx 1 root root 15 Jul 11 17:06 libhipblas.so -> libhipblas.so.0
lrwxrwxrwx 1 root root 23 Jul 11 17:06 libhipblas.so.0 -> libhipblas.so.0.1.50200
-rwxr-xr-x 1 root root 489944 Jun 28 00:58 libhipblas.so.0.1.50200
-rwxr-xr-x 1 root root 99640 Jun 28 00:16 libhipfft.so
lrwxrwxrwx 1 root root 15 Jul 11 17:07 libhiprand.so -> libhiprand.so.1
lrwxrwxrwx 1 root root 23 Jul 11 17:05 libhiprand.so.1 -> libhiprand.so.1.1.50200
-rwxr-xr-x 1 root root 16600 Jun 28 00:07 libhiprand.so.1.1.50200
lrwxrwxrwx 1 root root 23 Jul 11 17:06 libhiprtc-builtins.so -> libhiprtc-builtins.so.5
lrwxrwxrwx 1 root root 31 Jul 11 17:06 libhiprtc-builtins.so.5 -> libhiprtc-builtins.so.5.2.50200
-rwxr-xr-x 1 root root 371296 Jun 27 23:34 libhiprtc-builtins.so.5.2.50200
lrwxrwxrwx 1 root root 14 Jul 11 17:06 libhiprtc.so -> libhiprtc.so.5
lrwxrwxrwx 1 root root 22 Jul 11 17:06 libhiprtc.so.5 -> libhiprtc.so.5.2.50200
-rwxr-xr-x 1 root root 585320 Jun 27 23:35 libhiprtc.so.5.2.50200
lrwxrwxrwx 1 root root 17 Jul 11 17:06 libhipsolver.so -> libhipsolver.so.0
lrwxrwxrwx 1 root root 25 Jul 11 17:06 libhipsolver.so.0 -> libhipsolver.so.0.1.50200
-rwxr-xr-x 1 root root 221992 Jun 28 00:57 libhipsolver.so.0.1.50200
lrwxrwxrwx 1 root root 17 Jul 11 17:06 libhipsparse.so -> libhipsparse.so.0
lrwxrwxrwx 1 root root 25 Jul 11 17:06 libhipsparse.so.0 -> libhipsparse.so.0.1.50200
-rwxr-xr-x 1 root root 251216 Jun 28 00:27 libhipsparse.so.0.1.50200
lrwxrwxrwx 1 root root 28 Jul 11 17:07 libhsa-amd-aqlprofile64.so -> libhsa-amd-aqlprofile64.so.1
lrwxrwxrwx 1 root root 36 Jul 11 17:07 libhsa-amd-aqlprofile64.so.1 -> libhsa-amd-aqlprofile64.so.1.0.50200
-rwxr-xr-x 1 root root 330008 Jun 27 23:19 libhsa-amd-aqlprofile64.so.1.0.50200
lrwxrwxrwx 1 root root 21 Jul 11 17:06 libhsa-runtime64.so -> libhsa-runtime64.so.1
lrwxrwxrwx 1 root root 29 Jul 11 17:06 libhsa-runtime64.so.1 -> libhsa-runtime64.so.1.5.50200
-rwxr-xr-x 1 root root 2486088 Jun 27 23:18 libhsa-runtime64.so.1.5.50200
lrwxrwxrwx 1 root root 14 Jul 11 17:07 libMIOpen.so -> libMIOpen.so.1
lrwxrwxrwx 1 root root 22 Jul 11 17:07 libMIOpen.so.1 -> libMIOpen.so.1.0.50200
-rwxr-xr-x 1 root root 369673232 Jun 28 00:59 libMIOpen.so.1.0.50200
lrwxrwxrwx 1 root root 11 Jul 11 17:05 liboam.so -> liboam.so.1
lrwxrwxrwx 1 root root 19 Jul 11 17:05 liboam.so.1 -> liboam.so.1.0.50200
-rwxr-xr-x 1 root root 633584 Jun 27 22:38 liboam.so.1.0.50200
lrwxrwxrwx 1 root root 14 Jul 11 17:07 libOpenCL.so -> libOpenCL.so.1
lrwxrwxrwx 1 root root 16 Jul 11 17:07 libOpenCL.so.1 -> libOpenCL.so.1.2
-rwxr-xr-x 1 root root 32784 Jun 27 23:26 libOpenCL.so.1.2
lrwxrwxrwx 1 root root 12 Jul 11 17:06 librccl.so -> librccl.so.1
lrwxrwxrwx 1 root root 20 Jul 11 17:06 librccl.so.1 -> librccl.so.1.0.50200
-rwxr-xr-x 1 root root 118463128 Jun 28 00:14 librccl.so.1.0.50200
lrwxrwxrwx 1 root root 22 Jul 11 17:06 librocalution_hip.so -> librocalution_hip.so.0
lrwxrwxrwx 1 root root 30 Jul 11 17:06 librocalution_hip.so.0 -> librocalution_hip.so.0.1.50200
-rwxr-xr-x 1 root root 12613440 Jun 28 00:53 librocalution_hip.so.0.1.50200
lrwxrwxrwx 1 root root 18 Jul 11 17:06 librocalution.so -> librocalution.so.0
lrwxrwxrwx 1 root root 26 Jul 11 17:06 librocalution.so.0 -> librocalution.so.0.1.50200
-rwxr-xr-x 1 root root 9776016 Jun 28 00:53 librocalution.so.0.1.50200
lrwxrwxrwx 1 root root 15 Jul 11 17:06 librocblas.so -> librocblas.so.0
lrwxrwxrwx 1 root root 23 Jul 11 17:06 librocblas.so.0 -> librocblas.so.0.1.50200
-rwxr-xr-x 1 root root 231476848 Jun 28 00:47 librocblas.so.0.1.50200
lrwxrwxrwx 1 root root 23 Jul 11 17:07 librocfft-device-0.so -> librocfft-device-0.so.0
lrwxrwxrwx 1 root root 31 Jul 11 17:05 librocfft-device-0.so.0 -> librocfft-device-0.so.0.1.50200
-rwxr-xr-x 1 root root 725093408 Jun 28 00:08 librocfft-device-0.so.0.1.50200
lrwxrwxrwx 1 root root 23 Jul 11 17:07 librocfft-device-1.so -> librocfft-device-1.so.0
lrwxrwxrwx 1 root root 31 Jul 11 17:05 librocfft-device-1.so.0 -> librocfft-device-1.so.0.1.50200
-rwxr-xr-x 1 root root 756631464 Jun 28 00:08 librocfft-device-1.so.0.1.50200
lrwxrwxrwx 1 root root 23 Jul 11 17:07 librocfft-device-2.so -> librocfft-device-2.so.0
lrwxrwxrwx 1 root root 31 Jul 11 17:05 librocfft-device-2.so.0 -> librocfft-device-2.so.0.1.50200
-rwxr-xr-x 1 root root 738903360 Jun 28 00:08 librocfft-device-2.so.0.1.50200
lrwxrwxrwx 1 root root 23 Jul 11 17:07 librocfft-device-3.so -> librocfft-device-3.so.0
lrwxrwxrwx 1 root root 31 Jul 11 17:06 librocfft-device-3.so.0 -> librocfft-device-3.so.0.1.50200
-rwxr-xr-x 1 root root 637009768 Jun 28 00:08 librocfft-device-3.so.0.1.50200
lrwxrwxrwx 1 root root 14 Jul 11 17:07 librocfft.so -> librocfft.so.0
lrwxrwxrwx 1 root root 22 Jul 11 17:06 librocfft.so.0 -> librocfft.so.0.1.50200
-rwxr-xr-x 1 root root 5040944 Jun 28 00:08 librocfft.so.0.1.50200
lrwxrwxrwx 1 root root 17 Jul 11 17:04 librocm-core.so -> librocm-core.so.1
lrwxrwxrwx 1 root root 25 Jul 11 17:04 librocm-core.so.1 -> librocm-core.so.1.0.50200
-rwxr-xr-x 1 root root 7672 Jun 27 23:38 librocm-core.so.1.0.50200
lrwxrwxrwx 1 root root 19 Jul 11 17:05 librocm-dbgapi.so -> librocm-dbgapi.so.0
lrwxrwxrwx 1 root root 24 Jul 11 17:05 librocm-dbgapi.so.0 -> librocm-dbgapi.so.0.65.1
-rwxr-xr-x 1 root root 989760 Jun 27 23:27 librocm-dbgapi.so.0.65.1
lrwxrwxrwx 1 root root 28 Jul 11 17:07 librocm-debug-agent.so.2 -> librocm-debug-agent.so.2.0.3
-rwxr-xr-x 1 root root 186696 Jun 27 23:35 librocm-debug-agent.so.2.0.3
lrwxrwxrwx 1 root root 18 Jul 11 17:05 librocm_smi64.so -> librocm_smi64.so.5
lrwxrwxrwx 1 root root 26 Jul 11 17:05 librocm_smi64.so.5 -> librocm_smi64.so.5.0.50200
-rwxr-xr-x 1 root root 613016 Jun 27 22:38 librocm_smi64.so.5.0.50200
lrwxrwxrwx 1 root root 21 Jul 11 17:07 librocprofiler64.so -> librocprofiler64.so.1
lrwxrwxrwx 1 root root 29 Jul 11 17:07 librocprofiler64.so.1 -> librocprofiler64.so.1.0.50200
-rwxr-xr-x 1 root root 343976 Jun 27 23:19 librocprofiler64.so.1.0.50200
lrwxrwxrwx 1 root root 15 Jul 11 17:07 librocrand.so -> librocrand.so.1
lrwxrwxrwx 1 root root 23 Jul 11 17:05 librocrand.so.1 -> librocrand.so.1.1.50200
-rwxr-xr-x 1 root root 17738608 Jun 28 00:07 librocrand.so.1.1.50200
lrwxrwxrwx 1 root root 17 Jul 11 17:07 librocsolver.so -> librocsolver.so.0
lrwxrwxrwx 1 root root 25 Jul 11 17:05 librocsolver.so.0 -> librocsolver.so.0.1.50200
-rwxr-xr-x 1 root root 863176736 Jun 28 00:53 librocsolver.so.0.1.50200
lrwxrwxrwx 1 root root 17 Jul 11 17:06 librocsparse.so -> librocsparse.so.0
lrwxrwxrwx 1 root root 25 Jul 11 17:06 librocsparse.so.0 -> librocsparse.so.0.1.50200
-rwxr-xr-x 1 root root 375513600 Jun 28 00:11 librocsparse.so.0.1.50200
lrwxrwxrwx 1 root root 19 Jul 11 17:07 libroctracer64.so -> libroctracer64.so.1
lrwxrwxrwx 1 root root 27 Jul 11 17:07 libroctracer64.so.1 -> libroctracer64.so.1.0.50200
-rwxr-xr-x 1 root root 318536 Jun 27 23:36 libroctracer64.so.1.0.50200
lrwxrwxrwx 1 root root 15 Jul 11 17:07 libroctx64.so -> libroctx64.so.1
lrwxrwxrwx 1 root root 23 Jul 11 17:07 libroctx64.so.1 -> libroctx64.so.1.0.50200
-rwxr-xr-x 1 root root 82216 Jun 27 23:36 libroctx64.so.1.0.50200
drwxr-xr-x 3 root root 21 Jul 11 17:06 rocblas
-rw-r--r-- 1 root root 462 Jun 27 23:38 rocmmod
drwxr-xr-x 2 root root 94 Jul 11 17:07 rocprofiler
drwxr-xr-x 2 root root 34 Jul 11 17:07 roctracer
Yes, thanks for the clarification. Kadir has also produced some strace output showing the files it was attempting to open. Maybe this would also be useful for you?
@KadirAkbudak,
Thanks for the information. Can you confirm whether the ROCBLAS_TENSILE_LIBPATH environment variable is set to some path?
echo $ROCBLAS_TENSILE_LIBPATH
ROCBLAS_TENSILE_LIBPATH is unset in our environment.
Should I set it to /opt/rocm-5.2.0/lib/rocblas/library/?
@KadirAkbudak ,
Setting ROCBLAS_TENSILE_LIBPATH might fix the issue, but it should have worked even without it.
- Is LD_LIBRARY_PATH set? Also, please provide the output of readelf app -d | grep path
- If possible, please attach the strace output.
My initial test with export ROCBLAS_TENSILE_LIBPATH=/opt/rocm/lib/rocblas/library/ does not produce the error. This is good news. I will repeat the test several times to make sure the error does not appear anymore.
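For anyone repeating this stopgap, a slightly more defensive variant is sketched below: it exports the variable only when the directory really contains Tensile metadata files. The helper name `use_tensile_libpath` is hypothetical, and the `TensileLibrary*.dat` pattern is taken from the directory listing earlier in this thread.

```shell
# use_tensile_libpath DIR
# Export ROCBLAS_TENSILE_LIBPATH=DIR only if DIR holds at least one
# TensileLibrary*.dat file; otherwise leave the environment untouched.
# Must be called in the current shell (not a subshell) to take effect.
use_tensile_libpath() {
    local dir=$1 f
    for f in "$dir"/TensileLibrary*.dat; do
        # If the glob matched nothing, $f is the literal pattern and -e fails.
        if [ -e "$f" ]; then
            export ROCBLAS_TENSILE_LIBPATH=$dir
            return 0
        fi
    done
    echo "no TensileLibrary*.dat under $dir" >&2
    return 1
}

# Typical use:
#   use_tensile_libpath /opt/rocm/lib/rocblas/library && ./tester gemm
```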
$ echo $LD_LIBRARY_PATH
/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/libfabric/lib:/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/lib/release:/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/lib:/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-oneapi-mkl-2022.0.2-csli2oqmbu6esrsvxrnenbde52w7touz/mkl/2022.0.2/lib/intel64:/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-oneapi-mkl-2022.0.2-csli2oqmbu6esrsvxrnenbde52w7touz/lib:/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/gcc-7.3.0-z3dl67mmil3qe2z5loqk6l5denetg5x7/lib64:/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/gcc-7.3.0-z3dl67mmil3qe2z5loqk6l5denetg5x7/lib:/opt/rocm/lib:/opt/rocm/lib64
$ readelf test/tester -d | grep path
0x000000000000001d (RUNPATH) Library runpath: [/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/gcc-8.3.0-uv3lobm7o75mkoallcbarw6lllpw222e/lib:/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/gcc-8.3.0-uv3lobm7o75mkoallcbarw6lllpw222e/lib64:/home/kadir/t022/slate-dev/lib:/home/kadir/t022/slate-dev/testsweeper:/home/kadir/t022/slate-dev/blaspp/lib:/home/kadir/t022/slate-dev/lapackpp/lib:/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-oneapi-mpi-2021.1.1-aiygi2vu5ocm3mx7zpyvsbbai6kzoet2/mpi/2021.1.1/lib/release:/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-oneapi-mpi-2021.1.1-aiygi2vu5ocm3mx7zpyvsbbai6kzoet2/mpi/2021.1.1/lib]
Thanks for the information.
Was the strace output captured with the ROCBLAS_TENSILE_LIBPATH environment variable set? If so, could you please re-run strace without setting the variable?
I was expecting to see log lines like the ones below, but in the attached strace output I do not see anything related to TensileLibrary_gfx90a.dat:
access("/opt/rocm-5.2.2/lib/../../Tensile/library", R_OK) = -1 ENOENT (No such file or directory)
access("/opt/rocm-5.2.2/liblibrary", R_OK) = -1 ENOENT (No such file or directory)
access("/opt/rocm-5.2.2/lib/rocblas/library/gfx90a", R_OK) = -1 ENOENT (No such file or directory)
access("/opt/rocm-5.2.2/lib/rocblas/library/TensileLibrary_gfx90a.dat", R_OK) = 0
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fcdeddff000
mprotect(0x7fcdede00000, 8388608, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
@KadirAkbudak yes, please drop the -e trace=open filter (or use "access") and reattach a strace log of a failing run, as @rkamd requested. We are assuming this one was successful because you set the ROCBLAS_TENSILE_LIBPATH environment variable? That log appears to show it finding code objects, e.g. open("/opt/rocm-5.2.0/lib/rocblas/library//TensileLibrary_gfx90a.co", O_RDONLY) = 20, but we want to see the paths of the access failures when it looks for the .dat file. We need to find out why it fails, as we don't want you to rely on setting ROCBLAS_TENSILE_LIBPATH as a solution. Thanks.
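Since a full trace is long, a filter along these lines can narrow a saved log down to the lookups of interest. This is only a sketch: `tensile_lookups` is a made-up helper, and the log file name in the usage comment is a placeholder.

```shell
# tensile_lookups LOGFILE
# Print access()/open()/openat() lines from a saved strace log that
# mention a Tensile file or the rocblas/library directory.
tensile_lookups() {
    # The first pattern also matches lines prefixed with a pid
    # (as produced by strace -f), because of the whitespace alternative.
    grep -E '(^|[[:space:]])(access|open|openat)\(' "$1" |
        grep -E 'Tensile|rocblas/library'
}

# Typical use (trace.log is a placeholder name):
#   strace -f -e trace=access,open,openat -o trace.log ./tester gemm
#   tensile_lookups trace.log
```

Tracing `access`, `open`, and `openat` together covers both the existence checks and the actual file opens, so a failed probe for the .dat file should show up either way.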
The file that I sent was from a run withOUT ROCBLAS_TENSILE_LIBPATH set.
The following output is also from a run withOUT ROCBLAS_TENSILE_LIBPATH set:
strace -e trace=access ./tester --origin s --target t,d --ref n --nb 8 --type s,d,c,z --lookahead 1 --transA n,t,c --transB n,t,c --dim 100 --dim 100x50 --dim 50x100 --dim 25x50x75 gemm
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
SLATE version 2022.07.00, id cf8095c
input: ./tester --origin s --target t,d --ref n --nb 8 --type s,d,c,z --lookahead 1 --transA n,t,c --transB n,t,c --dim 100 --dim 100x50 --dim 50x100 --dim 25x50x75 gemm
2022-10-03 08:37:24, MPI size 1, OpenMP threads 8, GPU devices available 2
type origin target gemm go transA transB m n k alpha beta nb p q la error time (s) gflop/s ref time (s) ref gflop/s status
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/libfabric/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-oneapi-mkl-2022.0.2-csli2oqmbu6esrsvxrnenbde52w7touz/mkl/2022.0.2/bin/intel64/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/gcc-7.3.0-z3dl67mmil3qe2z5loqk6l5denetg5x7/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/python-3.9.10-sfmfcn7qbsq4yxzd7donakza3jcpoqmf/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/current/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/environment-modules-4.6.1-oj2biyrcm3kij2ddfnxbm52opqw6n7by/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/sbin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/opt/rocm/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/proc/self/fd", R_OK) = 0
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/libfabric/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-oneapi-mkl-2022.0.2-csli2oqmbu6esrsvxrnenbde52w7touz/mkl/2022.0.2/bin/intel64/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/gcc-7.3.0-z3dl67mmil3qe2z5loqk6l5denetg5x7/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/python-3.9.10-sfmfcn7qbsq4yxzd7donakza3jcpoqmf/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/current/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/environment-modules-4.6.1-oj2biyrcm3kij2ddfnxbm52opqw6n7by/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/sbin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/opt/rocm/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/libfabric/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-oneapi-mkl-2022.0.2-csli2oqmbu6esrsvxrnenbde52w7touz/mkl/2022.0.2/bin/intel64/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/gcc-7.3.0-z3dl67mmil3qe2z5loqk6l5denetg5x7/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/python-3.9.10-sfmfcn7qbsq4yxzd7donakza3jcpoqmf/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/current/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/environment-modules-4.6.1-oj2biyrcm3kij2ddfnxbm52opqw6n7by/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/sbin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/opt/rocm/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/libfabric/bin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/bin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/bin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-oneapi-mkl-2022.0.2-csli2oqmbu6esrsvxrnenbde52w7touz/mkl/2022.0.2/bin/intel64/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/gcc-7.3.0-z3dl67mmil3qe2z5loqk6l5denetg5x7/bin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/python-3.9.10-sfmfcn7qbsq4yxzd7donakza3jcpoqmf/bin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/current/bin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/environment-modules-4.6.1-oj2biyrcm3kij2ddfnxbm52opqw6n7by/bin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/bin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/bin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/sbin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/opt/rocm/bin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/libfabric/bin/ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/bin/ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/bin/ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-oneapi-mkl-2022.0.2-csli2oqmbu6esrsvxrnenbde52w7touz/mkl/2022.0.2/bin/intel64/ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/gcc-7.3.0-z3dl67mmil3qe2z5loqk6l5denetg5x7/bin/ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/python-3.9.10-sfmfcn7qbsq4yxzd7donakza3jcpoqmf/bin/ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/current/bin/ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/environment-modules-4.6.1-oj2biyrcm3kij2ddfnxbm52opqw6n7by/bin/ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/bin/ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/bin/ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/sbin/ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/opt/rocm/bin/ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/tmp/comgr-196675/output/a.so", F_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/libfabric/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-oneapi-mkl-2022.0.2-csli2oqmbu6esrsvxrnenbde52w7touz/mkl/2022.0.2/bin/intel64/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/gcc-7.3.0-z3dl67mmil3qe2z5loqk6l5denetg5x7/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/python-3.9.10-sfmfcn7qbsq4yxzd7donakza3jcpoqmf/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/current/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/environment-modules-4.6.1-oj2biyrcm3kij2ddfnxbm52opqw6n7by/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/sbin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/opt/rocm/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/libfabric/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-oneapi-mkl-2022.0.2-csli2oqmbu6esrsvxrnenbde52w7touz/mkl/2022.0.2/bin/intel64/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/gcc-7.3.0-z3dl67mmil3qe2z5loqk6l5denetg5x7/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/python-3.9.10-sfmfcn7qbsq4yxzd7donakza3jcpoqmf/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/current/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/environment-modules-4.6.1-oj2biyrcm3kij2ddfnxbm52opqw6n7by/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/sbin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/opt/rocm/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/libfabric/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-oneapi-mkl-2022.0.2-csli2oqmbu6esrsvxrnenbde52w7touz/mkl/2022.0.2/bin/intel64/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/gcc-7.3.0-z3dl67mmil3qe2z5loqk6l5denetg5x7/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/python-3.9.10-sfmfcn7qbsq4yxzd7donakza3jcpoqmf/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/current/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/environment-modules-4.6.1-oj2biyrcm3kij2ddfnxbm52opqw6n7by/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/sbin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/opt/rocm/bin/ptxas", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/libfabric/bin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/bin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/bin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-oneapi-mkl-2022.0.2-csli2oqmbu6esrsvxrnenbde52w7touz/mkl/2022.0.2/bin/intel64/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/gcc-7.3.0-z3dl67mmil3qe2z5loqk6l5denetg5x7/bin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/python-3.9.10-sfmfcn7qbsq4yxzd7donakza3jcpoqmf/bin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/current/bin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/environment-modules-4.6.1-oj2biyrcm3kij2ddfnxbm52opqw6n7by/bin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/bin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/bin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/sbin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/opt/rocm/bin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/libfabric/bin/ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/compilers_and_libraries_2020.2.254/linux/mpi/intel64/bin/ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-mpi-2019.8.254-mszg6y4geg4wip2awwug3f4v5ztshevf/bin/ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/intel-oneapi-mkl-2022.0.2-csli2oqmbu6esrsvxrnenbde52w7touz/mkl/2022.0.2/bin/intel64/ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/gcc-7.3.0-z3dl67mmil3qe2z5loqk6l5denetg5x7/bin/ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/python-3.9.10-sfmfcn7qbsq4yxzd7donakza3jcpoqmf/bin/ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/current/bin/ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/environment-modules-4.6.1-oj2biyrcm3kij2ddfnxbm52opqw6n7by/bin/ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/bin/ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/bin/ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/usr/sbin/ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/opt/rocm/bin/ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
access("/tmp/comgr-88a1ed/output/a.so", F_OK) = -1 ENOENT (No such file or directory)
s scalpk task auto col notrans notrans 100 100 100 3.1+1.4i 2.7+1.7i 8 1 1 1 1.65e-07 0.0152 0.131 NA NA pass
s scalpk task auto col notrans notrans 100 50 50 3.1+1.4i 2.7+1.7i 8 1 1 1 1.21e-07 0.00629 0.0794 NA NA pass
s scalpk task auto col notrans notrans 50 100 100 3.1+1.4i 2.7+1.7i 8 1 1 1 1.64e-07 0.0185 0.0542 NA NA pass
s scalpk task auto col notrans notrans 25 50 75 3.1+1.4i 2.7+1.7i 8 1 1 1 1.47e-07 0.00428 0.0438 NA NA pass
s scalpk task auto col notrans trans 100 100 100 3.1+1.4i 2.7+1.7i 8 1 1 1 1.83e-07 0.0222 0.0899 NA NA pass
s scalpk task auto col notrans trans 100 50 50 3.1+1.4i 2.7+1.7i 8 1 1 1 1.30e-07 0.00762 0.0656 NA NA pass
s scalpk task auto col notrans trans 50 100 100 3.1+1.4i 2.7+1.7i 8 1 1 1 1.82e-07 0.0160 0.0624 NA NA pass
s scalpk task auto col notrans trans 25 50 75 3.1+1.4i 2.7+1.7i 8 1 1 1 1.56e-07 0.00479 0.0391 NA NA pass
s scalpk task auto col notrans conj 100 100 100 3.1+1.4i 2.7+1.7i 8 1 1 1 1.83e-07 0.0151 0.133 NA NA pass
s scalpk task auto col notrans conj 100 50 50 3.1+1.4i 2.7+1.7i 8 1 1 1 1.25e-07 0.00449 0.111 NA NA pass
s scalpk task auto col notrans conj 50 100 100 3.1+1.4i 2.7+1.7i 8 1 1 1 1.80e-07 0.0113 0.0883 NA NA pass
s scalpk task auto col notrans conj 25 50 75 3.1+1.4i 2.7+1.7i 8 1 1 1 1.38e-07 0.00409 0.0458 NA NA pass
s scalpk task auto col trans notrans 100 100 100 3.1+1.4i 2.7+1.7i 8 1 1 1 1.23e-07 0.0171 0.117 NA NA pass
s scalpk task auto col trans notrans 100 50 50 3.1+1.4i 2.7+1.7i 8 1 1 1 9.15e-08 0.00889 0.0562 NA NA pass
s scalpk task auto col trans notrans 50 100 100 3.1+1.4i 2.7+1.7i 8 1 1 1 1.25e-07 0.0118 0.0845 NA NA pass
s scalpk task auto col trans notrans 25 50 75 3.1+1.4i 2.7+1.7i 8 1 1 1 1.09e-07 0.00319 0.0587 NA NA pass
s scalpk task auto col trans trans 100 100 100 3.1+1.4i 2.7+1.7i 8 1 1 1 1.26e-07 0.0203 0.0983 NA NA pass
s scalpk task auto col trans trans 100 50 50 3.1+1.4i 2.7+1.7i 8 1 1 1 9.55e-08 0.00982 0.0509 NA NA pass
s scalpk task auto col trans trans 50 100 100 3.1+1.4i 2.7+1.7i 8 1 1 1 1.47e-07 0.0120 0.0834 NA NA pass
s scalpk task auto col trans trans 25 50 75 3.1+1.4i 2.7+1.7i 8 1 1 1 1.15e-07 0.00390 0.0481 NA NA pass
s scalpk task auto col trans conj 100 100 100 3.1+1.4i 2.7+1.7i 8 1 1 1 1.33e-07 0.0153 0.131 NA NA pass
s scalpk task auto col trans conj 100 50 50 3.1+1.4i 2.7+1.7i 8 1 1 1 1.02e-07 0.00982 0.0509 NA NA pass
s scalpk task auto col trans conj 50 100 100 3.1+1.4i 2.7+1.7i 8 1 1 1 1.31e-07 0.0116 0.0859 NA NA pass
s scalpk task auto col trans conj 25 50 75 3.1+1.4i 2.7+1.7i 8 1 1 1 9.30e-08 0.00431 0.0435 NA NA pass
s scalpk task auto col conj notrans 100 100 100 3.1+1.4i 2.7+1.7i 8 1 1 1 1.21e-07 0.0147 0.136 NA NA pass
s scalpk task auto col conj notrans 100 50 50 3.1+1.4i 2.7+1.7i 8 1 1 1 1.02e-07 0.00494 0.101 NA NA pass
s scalpk task auto col conj notrans 50 100 100 3.1+1.4i 2.7+1.7i 8 1 1 1 1.30e-07 0.00875 0.114 NA NA pass
s scalpk task auto col conj notrans 25 50 75 3.1+1.4i 2.7+1.7i 8 1 1 1 8.38e-08 0.00492 0.0381 NA NA pass
s scalpk task auto col conj trans 100 100 100 3.1+1.4i 2.7+1.7i 8 1 1 1 1.14e-07 0.0182 0.110 NA NA pass
s scalpk task auto col conj trans 100 50 50 3.1+1.4i 2.7+1.7i 8 1 1 1 9.35e-08 0.00914 0.0547 NA NA pass
s scalpk task auto col conj trans 50 100 100 3.1+1.4i 2.7+1.7i 8 1 1 1 1.62e-07 0.00328 0.305 NA NA pass
s scalpk task auto col conj trans 25 50 75 3.1+1.4i 2.7+1.7i 8 1 1 1 1.02e-07 0.00256 0.0733 NA NA pass
s scalpk task auto col conj conj 100 100 100 3.1+1.4i 2.7+1.7i 8 1 1 1 1.26e-07 0.0195 0.103 NA NA pass
s scalpk task auto col conj conj 100 50 50 3.1+1.4i 2.7+1.7i 8 1 1 1 9.02e-08 0.0115 0.0436 NA NA pass
s scalpk task auto col conj conj 50 100 100 3.1+1.4i 2.7+1.7i 8 1 1 1 1.36e-07 0.0106 0.0939 NA NA pass
s scalpk task auto col conj conj 25 50 75 3.1+1.4i 2.7+1.7i 8 1 1 1 9.10e-08 0.00398 0.0472 NA NA pass
rocBLAS error: Cannot read /opt/rocm/rocblas/library/TensileLibrary.dat: Illegal seek
+++ killed by SIGABRT (core dumped) +++
FAILED : exit code -6
Summary:
- Unable to reproduce the error on any local environment
- Code review and debug logs provided by @KadirAkbudak did not indicate any coding error.
- Provided an environment variable for the user to bypass this issue.
Closing this ticket on the assumption that it is resolved or no longer relevant. Feel free to open a new issue if required.
Please re-open this ticket.
I hit the exact same issue when trying to use tensorflow-rocm 2.11 on the tutorial code at https://www.tensorflow.org/text/tutorials/text_generation.
I have an MI100 GPU, running Ubuntu 22.04.2 LTS with ROCm 5.4.3.
I installed all the packages using amdgpu-install with the usecases rocm and mllib, then added the packages miopen-hip-gfx908-120kdb and miopenkernels-gfx908-120kdb.
export ROCBLAS_TENSILE_LIBPATH=/opt/rocm/lib/rocblas/library/
fixes that specific problem, but it should work out of the box.
(And I hit some other issue right after...)
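For anyone checking their own install before setting the variable, here is a minimal sketch that reports which directory actually holds the `TensileLibrary*.dat` files. The two candidate paths are the ones mentioned in this thread; the helper name `find_tensile_dirs` is hypothetical, and other ROCm versions or packaging layouts may use different locations.

```python
import glob
import os

# Candidate directories taken from this thread. Which one exists on a given
# machine depends on the ROCm version and packaging (assumption).
CANDIDATE_DIRS = [
    "/opt/rocm/lib/rocblas/library",
    "/opt/rocm/rocblas/library",
]


def find_tensile_dirs(candidates=CANDIDATE_DIRS):
    """Return the candidate directories that contain TensileLibrary*.dat files."""
    found = []
    for directory in candidates:
        pattern = os.path.join(directory, "TensileLibrary*.dat")
        if glob.glob(pattern):
            found.append(directory)
    return found


if __name__ == "__main__":
    # Show what the process would currently see, then what is on disk.
    print("ROCBLAS_TENSILE_LIBPATH =", os.environ.get("ROCBLAS_TENSILE_LIBPATH"))
    for directory in find_tensile_dirs():
        print("Tensile library files found in:", directory)
```

If the only directory reported differs from the path in the error message, pointing `ROCBLAS_TENSILE_LIBPATH` at it is the workaround described above.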
@Epliz, thanks for reporting the issue. Could you please open a new ticket with logs captured without the environment variable set?
What kind of logs would help you? Strace?
Yes, that would help
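Since a full strace log is mostly unrelated `PATH` lookups (as in the trace earlier in this thread), a small filter can make it easier to attach only the relevant lines. This is a rough sketch, assuming the log was captured with something like `strace -f -o trace.log <user_app>`; the file name `trace.log`, the function name `filter_strace`, and the keyword list are illustrative choices, not anything rocBLAS requires.

```python
import sys


def filter_strace(lines, keywords=("rocblas", "Tensile")):
    """Keep only the log lines that mention any of the given keywords."""
    hits = []
    for line in lines:
        if any(keyword in line for keyword in keywords):
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # Usage: python filter_strace.py trace.log
    with open(sys.argv[1], errors="replace") as log:
        for hit in filter_strace(log):
            print(hit)
```

The match is a plain case-sensitive substring test, so it catches file paths containing `rocblas` or `Tensile` but would miss a line that only says `rocBLAS`; extend the keyword tuple if needed.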