hipBLASLt
[Issue]: hipBLASLt support for more GPUs for PyTorch with ROCm 5.7 or later
Problem Description
PyTorch now requires hipBLASLt when building with ROCm 5.7 or later, but hipBLASLt supports only gfx90a GPUs.
https://github.com/pytorch/pytorch/blob/84b2a323594bc7c4b47d61223b3f6466fe054416/cmake/public/LoadHIP.cmake#L158-L160
Does this mean that other GPUs (e.g., MI100) cannot use PyTorch with the latest ROCm 6.0 release?
Operating System
Ubuntu 22.04.3 LTS
CPU
AMD EPYC 7773X
GPU
AMD Instinct MI100
Other
No response
ROCm Version
ROCm 5.7.1
ROCm Component
PyTorch
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response
Currently not supported on MI100.
@Huangxt57 MI100 is not supported. Please see the supported hardware for hipBLASLt (https://github.com/ROCm/hipBLASLt/blob/develop/README.md). Thanks!
I would just like to point out that spotty hardware support like this is precisely why ROCm is viewed by many as a half-hearted alternative to CUDA. It would be completely understandable if this ticket were deferred indefinitely due to insufficient engineering resources. Instead, the handling of this ticket completely forecloses any future possibility of broadening support for this increasingly widely used package.
This sends a very bad message, reinforcing the image that AMD is not serious about providing broad-based hardware support for ROCm. Even worse, the fact that the device in question is the MI100 suggests that AMD isn't even willing to support hardware that they explicitly sold into the compute market.
As someone with input into hardware procurement decisions in an academic research lab, I find it very hard to argue for laying out a good fraction of a research grant on hardware that may be completely unsupported by ROCm ecosystem components in only a couple of years, well before the grant itself has concluded. This sort of behavior needs to change if AMD wants to make greater inroads into sub-petascale academic research labs.
@xuantengh @bgamari We recently added gfx908 support. https://github.com/ROCm/hipBLASLt/commit/938900a10d0c433eb8b90c2a3b65ba70ed39a91b
@xuantengh @bgamari hipBLASLt will support gfx908 with ROCm 6.3.
Does this mean that other GPUs (e.g., MI100) cannot use PyTorch with the latest ROCm 6.0 release?
It means that functions that need hipBLASLt are not usable on some cards, but the rest of PyTorch is usable.
For example, on Gentoo, hipBLASLt (https://github.com/gentoo/gentoo/blob/80b892b0cba5e1695230cd68c79afdfbf3fe102a/sci-libs/hipBLASLt/hipBLASLt-6.1.1-r1.ebuild#L22) can be compiled without specifying any GPU architectures, and it then installs a dummy library (https://github.com/gentoo/gentoo/blob/80b892b0cba5e1695230cd68c79afdfbf3fe102a/eclass/rocm.eclass#L251).
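To check whether a given card falls into the "hipBLASLt functions not usable" group, something like the following rough sketch might help. It only compares an architecture string against a set of supported architectures that you pass in yourself. The helper names `base_arch` and `hipblaslt_supports` are made up for illustration, and the example sets reflect only what is stated in this thread (gfx90a at the time of the report, gfx908 added later), not an official support matrix.

```python
def base_arch(gcn_arch_name: str) -> str:
    """Strip target feature flags, e.g. 'gfx90a:sramecc+:xnack-' -> 'gfx90a'."""
    return gcn_arch_name.split(":", 1)[0].strip().lower()


def hipblaslt_supports(gcn_arch_name: str, supported_archs) -> bool:
    """Return True if the card's base architecture is in supported_archs.

    supported_archs is supplied by the caller (hypothetical input); take it
    from the hipBLASLt README for the ROCm release you actually use.
    """
    wanted = {arch.lower() for arch in supported_archs}
    return base_arch(gcn_arch_name) in wanted


# Assumption: on a ROCm build of PyTorch the architecture string can usually
# be read with torch.cuda.get_device_properties(0).gcnArchName. It is
# hard-coded here so the sketch runs without a GPU or PyTorch installed.
mi100_arch = "gfx908:sramecc+:xnack-"

# Support sets as described in this thread only.
at_time_of_report = {"gfx90a"}
after_gfx908_commit = {"gfx90a", "gfx908"}

print(hipblaslt_supports(mi100_arch, at_time_of_report))
print(hipblaslt_supports(mi100_arch, after_gfx908_commit))
```

A False result here only suggests that operations routed through hipBLASLt would be unavailable on that card; as noted above, the rest of PyTorch should still work.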