[Issue]: Could not load /opt/rocm-6.1.3/lib/rocblas/library/TensileLibrary.dat
Problem Description
rocblaslt error: Could not load /opt/rocm-6.1.3/lib/hipblaslt/library/TensileLibrary.dat
Segmentation fault
I ran cp /opt/rocm-6.1.3/lib/hipblaslt/library/TensileLibrary_gfx1100.dat /opt/rocm-6.1.3/lib/hipblaslt/library/TensileLibrary.dat as a workaround; otherwise I don't know how to get this file.
I built hipBLASLt by running ./install.sh -idc --architecture 'gfx1100' --merge-files --static from the hipBLASLt repository.
The driver was installed via amdgpu-install -y --usecase=wsl,rocm --no-dkms
Operating System
WSL2 Ubuntu 22.04 Windows 11
CPU
7800x3d
GPU
AMD Radeon RX 7900 XT
Other
No response
ROCm Version
ROCm 6.1.3
ROCm Component
hipBLASLt
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response
Did you load it from pytorch? You may also need to replace the torch libhipblaslt.so
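To check whether PyTorch ships its own copy of the library, a minimal sketch along these lines may help. It assumes the PyTorch wheel keeps its bundled shared libraries in a lib directory next to the package's __init__.py. That layout is an assumption, not something confirmed in this thread, and the helper name find_bundled_libs is hypothetical.

```python
import importlib.util
from pathlib import Path


def find_bundled_libs(package="torch", pattern="libhipblaslt.so*"):
    """Return paths matching `pattern` under <package>/lib, or [] if none.

    Assumption: the package stores bundled shared libraries in a `lib`
    subdirectory next to its __init__.py.
    """
    # find_spec locates a top-level package without importing it.
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return []
    lib_dir = Path(spec.origin).parent / "lib"
    return sorted(lib_dir.glob(pattern))


if __name__ == "__main__":
    for path in find_bundled_libs():
        print(path)
```

If this prints a path, that copy may be loaded in place of the system-wide one under /opt/rocm, which would be the library to replace.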
There was a similar issue raised by @lhl: https://github.com/ROCm/hipBLASLt/issues/831
Regarding #831: it seems it is not installed with the 6.1.3 software, and there was no official release in the repository.
I get this error on Linux as well: Ubuntu 24.04, 7900 XTX, ROCm 6.2.
Pointing HIPBLASLT_TENSILE_LIBPATH at hipBLASLt/build/release/Tensile/library causes the error below.
rocblaslt error: Cannot read /home/adminl/hipBLASLt/build/release/Tensile/library/TensileLibrary.dat: No such file or directory
rocblaslt error: Could not load /home/adminl/hipBLASLt/build/release/Tensile/library/TensileLibrary.dat
Segmentation fault (core dumped)
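A quick way to see what the loader is missing is to list the TensileLibrary*.dat files in the directory that HIPBLASLT_TENSILE_LIBPATH points to. The following is a diagnostic sketch only: the helper name inspect_tensile_dir is hypothetical, and it inspects file names without replicating hipBLASLt's actual lookup logic.

```python
import os
from pathlib import Path


def inspect_tensile_dir(path=None):
    """Report whether TensileLibrary.dat exists and list other candidates.

    If `path` is None, fall back to the HIPBLASLT_TENSILE_LIBPATH variable.
    """
    if path is None:
        path = os.environ.get("HIPBLASLT_TENSILE_LIBPATH")
    if not path:
        raise ValueError("No directory given and HIPBLASLT_TENSILE_LIBPATH is unset")

    directory = Path(path)
    candidates = sorted(p.name for p in directory.glob("TensileLibrary*.dat"))
    return {
        "directory": str(directory),
        # The file named in the error message.
        "has_default": "TensileLibrary.dat" in candidates,
        # Files such as TensileLibrary_gfx1100.dat (architecture-specific).
        "arch_specific": [n for n in candidates if n != "TensileLibrary.dat"],
    }


if __name__ == "__main__":
    print(inspect_tensile_dir())
```

If has_default is False while arch_specific is non-empty, the build produced only per-architecture files, which matches the "No such file or directory" message above.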
Hi @unclemusclez. Internal ticket has been created to investigate your issue. Thanks!
@ppanchad-amd this may be fixable by using ln -s to link the corresponding lazy-load *.dat file to .../TensileLibrary.dat
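The symlink idea can be sketched as follows. This is an untested workaround under stated assumptions: the file naming TensileLibrary_<arch>.dat is taken from the original report, the helper name link_arch_library is hypothetical, and nothing in this thread confirms that the link resolves the segmentation fault.

```python
import os
from pathlib import Path


def link_arch_library(library_dir, arch="gfx1100"):
    """Symlink TensileLibrary_<arch>.dat to TensileLibrary.dat if it is missing."""
    library_dir = Path(library_dir)
    source = library_dir / f"TensileLibrary_{arch}.dat"
    target = library_dir / "TensileLibrary.dat"

    if not source.is_file():
        raise FileNotFoundError(f"No architecture-specific library at {source}")
    if target.exists() or target.is_symlink():
        # Leave an existing file (or dangling link) alone rather than overwrite it.
        return target

    # Use a relative link so the directory can be relocated without breaking it.
    os.symlink(source.name, target)
    return target


if __name__ == "__main__":
    # Path from the original report; writing under /opt usually requires root.
    print(link_arch_library("/opt/rocm-6.1.3/lib/hipblaslt/library"))
```

Compared with the cp used in the original report, a link keeps a single copy of the data, so a rebuilt architecture file is picked up without copying again.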
Thank you for looking into this.
I am curious about any progress. At the moment I am exploring multiple GFX platforms, including gfx906 and gfx1100.
Hi @unclemusclez, when is this issue occurring? I can see the place in the source code where this error is emitted, and it looks like it should be picking up TensileLibrary_gfx1100.dat; I'm not sure yet why it isn't, so I'll try to reproduce this.
@schung-amd it's been some time since I tried to compile ROCm for my Windows WSL machine. I think this might be an issue with bitsandbytes, but I don't remember at this point; this issue is from 4 months ago. Perhaps you cannot replicate this because it relates to the kernel, which does not exist on WSL Linux.
From my experience, if it's not working, I just don't worry about it until there is a new WSL-Windows driver update for ROCm.
ROCm 6.1.2 is nice, but really we need 6.2 on Windows. That will bring everything up to date with the modern capabilities of PyTorch, including support for CUDA Cooperative Groups. The current Windows GPU drivers are not even working correctly. We have to downgrade, or shared memory is used by default. It's very difficult to troubleshoot the versions, source, and environment of things when I'm actively trying to do work.
I'll follow up on this at some point when I come across it again.
This should be addressed in ROCm 6.2 with lazy loading (https://github.com/ROCm/hipBLASLt/commit/28eb8258d967f3ccaab5aed891bf40d62cdd099d), so hopefully once WSL for 6.2 is released this is fixed.
ROCm 6.1.2 is nice, but really we need 6.2 on Windows. That will bring everything up to date with the modern capabilities of PyTorch, including support for CUDA Cooperative Groups.
Unfortunately we have no plans at this time to add cooperative groups support on Windows.
@sleppyrobot Are you still encountering this error on Ubuntu? If so, can you provide some steps to reproduce it?
Hey, no, the issue went away when I changed the PyTorch version.
As for the steps to reproduce: I was using ComfyUI and an SDXL model with ROCm 6.2 and PyTorch 2.5. Any version from August to early September would trigger the error. You also need to link or point to the hipBLASLt library. @schung-amd
Unfortunately we have no plans at this time to add cooperative groups support on Windows.
@schung-amd this is a necessity for a lot of video and 3D AI Python applications due to their dependency on https://github.com/graphdeco-inria/diff-gaussian-rasterization
Is there any way to have this reprioritized or looked at? For Unreal/Blender pipelines this would be incredible. It is a major reason why I am considering switching to an entirely Linux platform at the moment. I just don't have the resources or time to switch everything.
Of course, there was ZLUDA.
I've seen other requests for cooperative groups support on Windows and am reaching out internally to push for support if feasible. That being said, I am unaware of the reason we are not supporting it at this time (i.e. there may be technical barriers) and wouldn't expect support in the near future.