hipBLASLt

Allow architectures with different CuCount to share kernel objects

Open jichangjichang opened this issue 1 year ago • 2 comments

This PR reduces the hipBLASLt library size by merging the code objects built for different CU counts into one, so the same kernel is no longer duplicated across per-CU code objects. The trade-off is that it may increase the code object loading time.
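The merging idea described above can be illustrated with a minimal Python sketch. This is not the PR's actual implementation: the function name `merge_code_objects`, the dictionary layout, and the byte strings standing in for kernel binaries are all hypothetical. It only shows that kernels which are byte-identical across every CU count are stored once, while kernels that differ stay in their per-CU code object.

```python
from collections import defaultdict


def merge_code_objects(per_cu):
    """Split per-CU kernel tables into a shared table plus per-CU leftovers.

    per_cu: dict mapping CU count -> dict of kernel name -> kernel binary (bytes).
    Hypothetical layout, used only for illustration.
    """
    # Record which CU counts contain each (name, binary) pair.
    owners = defaultdict(set)
    for cu, kernels in per_cu.items():
        for name, binary in kernels.items():
            owners[(name, binary)].add(cu)

    all_cus = set(per_cu)
    shared = {}
    specific = {cu: {} for cu in per_cu}
    for (name, binary), cus in owners.items():
        if cus == all_cus:
            # Exactly the same kernel for every CU count: keep a single copy.
            shared[name] = binary
        else:
            # Kernel differs (or is missing) for some CU count: keep it per CU.
            for cu in cus:
                specific[cu][name] = binary
    return shared, specific


def total_size(tables):
    """Sum the binary sizes over a list of kernel tables."""
    return sum(len(b) for table in tables for b in table.values())


# Made-up stand-in data: one identical kernel, one CU-specific kernel.
per_cu = {
    80: {"gemm_a": b"AAAA", "gemm_b": b"B80"},
    304: {"gemm_a": b"AAAA", "gemm_b": b"B304"},
}
shared, specific = merge_code_objects(per_cu)
size_before = total_size(list(per_cu.values()))
size_after = total_size([shared] + list(specific.values()))
```

In this toy input, `gemm_a` ends up in the shared table and `gemm_b` stays CU-specific, so the merged layout is smaller than the original one.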

09/20: library size with and without this PR: without: 12800M; with: 9980M.

jichangjichang, Aug 05 '24 04:08

This can't work because Henry hardcoded the CU count into the asm kernels with fork parameters.

This PR only merges the code objects for kernels that are exactly the same between 80 CU and 304 CU. It does not affect the kernel selection design, which means an 80 CU gfx942 can find a 304 CU solution even without this PR.

jichangjichang, Aug 05 '24 07:08

This needs to wait until ActivationFunCall is enabled for 80 CU HHS/BBS.

jichangjichang, Aug 05 '24 13:08

Closing the pull request in this repo. Please refer to the migrated pull request for updates.

jayhawk-commits, Jun 20 '25 18:06