[Issue]: hipBLASLt support for more GPUs for PyTorch with ROCm 5.7 or later

Open xuantengh opened this issue 1 year ago • 1 comments

Problem Description

PyTorch now requires hipBLASLt when building with ROCm 5.7 or later, but hipBLASLt supports only gfx90a GPUs.

https://github.com/pytorch/pytorch/blob/84b2a323594bc7c4b47d61223b3f6466fe054416/cmake/public/LoadHIP.cmake#L158-L160

Does this mean that other GPUs (e.g., MI100) cannot use PyTorch with the latest ROCm 6.0 release?

Operating System

Ubuntu 22.04.3 LTS

CPU

AMD EPYC 7773X

GPU

AMD Instinct MI100

Other

No response

ROCm Version

ROCm 5.7.1

ROCm Component

PyTorch

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

xuantengh avatar Dec 23 '23 04:12 xuantengh

Currently not supported on MI100.

KKyang avatar Mar 09 '24 02:03 KKyang

@Huangxt57 MI100 is not supported. Please see the supported hardware for hipBLASLt (https://github.com/ROCm/hipBLASLt/blob/develop/README.md). Thanks!

ppanchad-amd avatar Jul 09 '24 15:07 ppanchad-amd

I would just like to point out that spotty hardware support like this is precisely why ROCm is viewed by many as a half-hearted alternative to CUDA. It would be completely understandable if this ticket were deferred indefinitely due to insufficient engineering resources. Instead, the handling of this ticket completely forecloses any future possibility of broadening support for this increasingly widely-used package.

This sends a very bad message, reinforcing the image that AMD is not serious about providing broad-based hardware support for ROCm. Even worse, the fact that the device in question is the MI100 suggests that AMD isn't even willing to support hardware that they explicitly sold into the compute market.

As someone with input into hardware procurement decisions in an academic research lab, I find it very hard to argue for laying out a good fraction of a research grant on hardware that may be completely unsupported by ROCm ecosystem components in only a couple of years, well before the grant itself has concluded. This sort of behavior needs to change if AMD wants to make greater in-roads into sub-petascale academic research labs.

bgamari avatar Aug 15 '24 14:08 bgamari

@xuantengh @bgamari We added gfx908 support recently: https://github.com/ROCm/hipBLASLt/commit/938900a10d0c433eb8b90c2a3b65ba70ed39a91b

KKyang avatar Aug 16 '24 03:08 KKyang

@xuantengh @bgamari hipBLASLt will support gfx908 with ROCm 6.3.

jichangjichang avatar Aug 16 '24 07:08 jichangjichang

Does this mean that other GPUs (e.g., MI100) cannot use PyTorch with the latest ROCm 6.0 release?

It means that functions that need hipBLASLt are not usable on some cards, but the rest of PyTorch is usable.

For example, on Gentoo, hipBLASLt (https://github.com/gentoo/gentoo/blob/80b892b0cba5e1695230cd68c79afdfbf3fe102a/sci-libs/hipBLASLt/hipBLASLt-6.1.1-r1.ebuild#L22) can be compiled without specifying any GPU architectures, in which case a dummy library is installed (https://github.com/gentoo/gentoo/blob/80b892b0cba5e1695230cd68c79afdfbf3fe102a/eclass/rocm.eclass#L251).
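To tell in advance whether a given card falls into the "hipBLASLt functions unusable" group, a rough check along these lines may help. This is only a sketch: the support table is taken from statements in this thread (gfx90a as of ROCm 5.7, gfx908 with ROCm 6.3), not from an official compatibility matrix, and the helper names `base_arch` and `hipblaslt_expected` are made up for illustration.

```python
# Hypothetical helper, not part of PyTorch or hipBLASLt.
# Minimum ROCm versions are assumptions based only on this thread:
#   gfx90a -> hipBLASLt available as of ROCm 5.7 (issue description)
#   gfx908 -> MI100, announced for ROCm 6.3 (maintainer comment)
HIPBLASLT_MIN_ROCM = {
    "gfx90a": (5, 7),
    "gfx908": (6, 3),
}


def base_arch(gcn_arch_name: str) -> str:
    """Strip feature flags, e.g. 'gfx90a:sramecc+:xnack-' -> 'gfx90a'."""
    return gcn_arch_name.split(":", 1)[0].strip().lower()


def hipblaslt_expected(gcn_arch_name: str, rocm_version: tuple) -> bool:
    """Return True if the table above lists the arch for this ROCm version."""
    minimum = HIPBLASLT_MIN_ROCM.get(base_arch(gcn_arch_name))
    return minimum is not None and tuple(rocm_version) >= minimum


if __name__ == "__main__":
    # MI100 (gfx908) on the ROCm version reported in this issue
    print(hipblaslt_expected("gfx908:sramecc+:xnack-", (5, 7)))
    # The same card on ROCm 6.3
    print(hipblaslt_expected("gfx908:sramecc+:xnack-", (6, 3)))
```

On a ROCm build of PyTorch, I believe the arch string can be read from `torch.cuda.get_device_properties(0).gcnArchName`, but please verify that against your PyTorch version; the hipBLASLt README remains the authoritative list of supported hardware.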

littlewu2508 avatar Sep 10 '24 13:09 littlewu2508