hipBLASLt
hipBLASLt copied to clipboard
Add API hipblasLtIsDeviceSupported() to query if current device is supported
Why we need this?
Why we need this? https://github.com/pytorch/pytorch/pull/128753 pytroch team request: pytroch doesn't have a way to check if current device is supported by hipblaslt or not. They don't want to add hardcode to check "gfx90a", "gfx942" in framework side.
this pr suffers from the same issue as this one here https://github.com/pytorch/pytorch/pull/128753#discussion_r1649620265 ie it is impossible to get here with a hipblaslt unsupported gpu without the rocm runtime haveing silently (or not) entered a failed state.
@jichangjichang I'd like us to have a discussion about whether this solution is the best way to go, before we merge this PR. It seems there will still be open issues at the framework level after this PR.
@jichangjichang I'd like us to have a discussion about whether this solution is the best way to go, before we merge this PR. It seems there will still be open issues at the framework level after this PR.
@jithunnair-amd Any suggestions?
@jichangjichang From offline discussions with @jeffdaily, he had concerns about this being a "hip" API when no "cuda" equivalent exists; suggestion being to make this only a "roc" api in accordance with convention.
@jichangjichang From offline discussions with @jeffdaily, he had concerns about this being a "hip" API when no "cuda" equivalent exists; suggestion being to make this only a "roc" api in accordance with convention.
@jithunnair-amd hipblaslt already has many "none cuda" equivalent APIs. We use "hipblasltExt" prefix for this kinds of APIs. I can rename the query APIs as "hipblasLtExtIsDeviceSupported()". How do you think?
@jichangjichang From offline discussions with @jeffdaily, he had concerns about this being a "hip" API when no "cuda" equivalent exists; suggestion being to make this only a "roc" api in accordance with convention.
@jithunnair-amd hipblaslt already has many "none cuda" equivalent APIs. We use "hipblasltExt" prefix for this kinds of APIs. I can rename the query APIs as "hipblasLtExtIsDeviceSupported()". How do you think?
Sounds okay to me if we have precedent. @jeffdaily, can you please let us know your opinion?
I will make an exception for this. hipblaslt doesn't have a public rocblaslt interface or library. We really should be putting such extensions as "roc" APIs only. The HIP interfaces for our libraries are meant to be a 1-1 correspondence with CUDA libraries. Period. We're adding more and more API surface area to the HIP part of hipblaslt that goes against our policies. At least you're putting it behind an "ext" API name to avoid confusion here.
No need so far, close first