hipBLASLt icon indicating copy to clipboard operation
hipBLASLt copied to clipboard

Add API hipblasLtIsDeviceSupported() to query if current device is supported

Open jichangjichang opened this issue 1 year ago • 9 comments
trafficstars

jichangjichang avatar Jun 24 '24 02:06 jichangjichang

Why we need this?

KKyang avatar Jun 24 '24 03:06 KKyang

Why we need this? https://github.com/pytorch/pytorch/pull/128753 pytroch team request: pytroch doesn't have a way to check if current device is supported by hipblaslt or not. They don't want to add hardcode to check "gfx90a", "gfx942" in framework side.

jichangjichang avatar Jun 24 '24 05:06 jichangjichang

this pr suffers from the same issue as this one here https://github.com/pytorch/pytorch/pull/128753#discussion_r1649620265 ie it is impossible to get here with a hipblaslt unsupported gpu without the rocm runtime haveing silently (or not) entered a failed state.

IMbackK avatar Jun 24 '24 11:06 IMbackK

@jichangjichang I'd like us to have a discussion about whether this solution is the best way to go, before we merge this PR. It seems there will still be open issues at the framework level after this PR.

jithunnair-amd avatar Jun 25 '24 01:06 jithunnair-amd

@jichangjichang I'd like us to have a discussion about whether this solution is the best way to go, before we merge this PR. It seems there will still be open issues at the framework level after this PR.

@jithunnair-amd Any suggestions?

jichangjichang avatar Jun 25 '24 02:06 jichangjichang

@jichangjichang From offline discussions with @jeffdaily, he had concerns about this being a "hip" API when no "cuda" equivalent exists; suggestion being to make this only a "roc" api in accordance with convention.

jithunnair-amd avatar Jul 01 '24 18:07 jithunnair-amd

@jichangjichang From offline discussions with @jeffdaily, he had concerns about this being a "hip" API when no "cuda" equivalent exists; suggestion being to make this only a "roc" api in accordance with convention.

@jithunnair-amd hipblaslt already has many "none cuda" equivalent APIs. We use "hipblasltExt" prefix for this kinds of APIs. I can rename the query APIs as "hipblasLtExtIsDeviceSupported()". How do you think?

jichangjichang avatar Jul 08 '24 07:07 jichangjichang

@jichangjichang From offline discussions with @jeffdaily, he had concerns about this being a "hip" API when no "cuda" equivalent exists; suggestion being to make this only a "roc" api in accordance with convention.

@jithunnair-amd hipblaslt already has many "none cuda" equivalent APIs. We use "hipblasltExt" prefix for this kinds of APIs. I can rename the query APIs as "hipblasLtExtIsDeviceSupported()". How do you think?

Sounds okay to me if we have precedent. @jeffdaily, can you please let us know your opinion?

jithunnair-amd avatar Jul 08 '24 22:07 jithunnair-amd

I will make an exception for this. hipblaslt doesn't have a public rocblaslt interface or library. We really should be putting such extensions as "roc" APIs only. The HIP interfaces for our libraries are meant to be a 1-1 correspondence with CUDA libraries. Period. We're adding more and more API surface area to the HIP part of hipblaslt that goes against our policies. At least you're putting it behind an "ext" API name to avoid confusion here.

jeffdaily avatar Jul 08 '24 23:07 jeffdaily

No need so far, close first

jichangjichang avatar Nov 11 '24 09:11 jichangjichang