rocBLAS

fix(tensile_host): fix solutions for gfx103x not able to load

Status: Open. wfjsw opened this issue 1 year ago (3 comments).

This patch may fix a problem where the newly added architecture map broke gfx1031 through gfx1035: Tensile solutions for these architectures fail to load, which forces them onto the fallback path.

Related log:

ProblemMap Searching for Contraction_l_Alik_Bljk_Cijk_Dijk found Problem library (1 rows)
Object key: 768, 77, 768
Key: 768, 77, 768
Starting point: 17179869184, 1, 2937652110784
Rightward search...
Leftward search...

129, 129, 65: 905234 < 1.79769e+308 <-- Best distance, but no matching solution
129, 129, 65: 905234 == 905234
129, 129, 65: 905234 == 905234
129, 129, 65: 905234 == 905234
129, 129, 65: 905234 == 905234
129, 129, 64: 906641 > 905234
129, 129, 64: 906641 > 905234

......

Considered 100% of entries.
Solution index selected: 69
Running kernel: Cijk_Alik_Bljk_HHS_BH_MT64x32x8_SN_AF0EM1_AMAS2_ASEM1_BL1_BS1_EPS0_FL0_GLVWA2_GLVWB1_GRVW2_GSU1_GSUASB_ISA000_IU1_K1_KLS_LPB0_LDL1_LRVW2_MMFSC_NLCA1_NLCB1_PGR0_PLR1_RK0_SIA1_SU32_SUM0_SUS256_SVW4_TT4_2_USFGROn1_VAW1_VSn1_VW2_VWB2_WS64_WG16_16_1_WGM8
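The distances printed in the log are consistent with the squared Euclidean distance between the three-element problem key and each library entry: (768 − 129)² + (77 − 129)² + (768 − 65)² = 408321 + 2704 + 494209 = 905234, and replacing 65 with 64 gives 906641. The sketch below reproduces that arithmetic. The function name `squaredDistance` is hypothetical, and whether Tensile computes the metric exactly this way internally is an assumption inferred only from the numbers above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Squared Euclidean distance between two three-element problem-size keys.
// Hypothetical helper: it only reproduces the distances shown in the log
// above and is not taken from the Tensile source.
std::int64_t squaredDistance(const std::array<std::int64_t, 3>& a,
                             const std::array<std::int64_t, 3>& b)
{
    std::int64_t sum = 0;
    for(std::size_t i = 0; i < a.size(); ++i)
    {
        const std::int64_t d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}
```

For example, `squaredDistance({768, 77, 768}, {129, 129, 65})` returns 905234, the "Best distance" entry in the log, and `squaredDistance({768, 77, 768}, {129, 129, 64})` returns 906641.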

Could you please backport this to HIP SDK 6.1.2 for Windows if possible?

wfjsw, Jul 23 '24 15:07

@wfjsw My understanding is that your change would enable lazy loading of gfx1031, gfx1032, gfx1034, and gfx1035 assembly kernels listed in the .yaml files in the directory https://github.com/ROCm/rocBLAS/tree/develop/library/src/blas3/Tensile/Logic/asm_full . A search of that directory for the strings gfx1031, gfx1032, gfx1034, and gfx1035 finds no matches, which is why these architectures are not in getLazyLoadingArch. An architecture is added to getLazyLoadingArch when assembly kernels are added for it.
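The gating described above can be modeled as an allowlist lookup. The following is a simplified, hypothetical sketch, not the actual getLazyLoadingArch code. The names `lazyLoadingArchList`, `baseArch`, and `usesLazyLoading` are invented for illustration, and the list entries are illustrative rather than the real contents of rocBLAS's list. The only real-world detail it relies on is that HIP reports `gcnArchName` strings with optional feature suffixes such as `:sramecc+:xnack-`.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical allowlist: an architecture appears here only if asm_full
// ships assembly kernels for it. Illustrative entries, NOT the real list.
const std::unordered_set<std::string>& lazyLoadingArchList()
{
    static const std::unordered_set<std::string> archs
        = {"gfx900", "gfx906", "gfx908", "gfx90a", "gfx1030", "gfx1100"};
    return archs;
}

// Strip target-feature suffixes such as ":sramecc+:xnack-" from a
// gcnArchName-style string, leaving the base architecture ("gfx90a").
std::string baseArch(const std::string& gcnArchName)
{
    return gcnArchName.substr(0, gcnArchName.find(':'));
}

// True if this hypothetical gate would lazy load kernels for the device.
bool usesLazyLoading(const std::string& gcnArchName)
{
    return lazyLoadingArchList().count(baseArch(gcnArchName)) != 0;
}
```

Under this model, `usesLazyLoading("gfx90a:sramecc+:xnack-")` is true, while `usesLazyLoading("gfx1031")` is false because gfx1031 has no entry, so any kernels supplied for it are never considered.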

Can you let us know the intention of your PR:

  1. Are you trying to lazy load assembly kernels for gfx1031, gfx1032, gfx1034, gfx1035?
  2. Are you trying to build rocBLAS for gfx1031, gfx1032, gfx1034, gfx1035?

amcamd, Jul 25 '24 21:07

I currently have assembly kernels for these cards, but the stock rocblas.dll refuses to load them when they are placed in the search path, which worked in 5.7.1. This is because of the architecture list that has been present since 6.0.

This also seems to affect non-lazy loading: in my testing, the non-lazy libraries are not applied either.

wfjsw, Jul 26 '24 09:07

ROCm SDK Builder bug 180 does not seem to be related to this bug.

rocBLAS 6.1.2 libraries built by ROCm SDK Builder failed on the gfx906 target in all apps, while they worked fine on RDNA1/2/3 GPUs. The problem is somehow related to code object version V5 on gfx906 and may be some kind of mishandling of the xnack feature on those cards. Some apps failed with errors about missing kernel symbols, even though those symbols could be found with grep in the rocBLAS .co files. When I use -DTensile_CODE_OBJECT_VERSION=default instead of -DTensile_CODE_OBJECT_VERSION=V5, the problem goes away.

lamikr, Dec 18 '24 01:12

Imported to ROCm/rocm-libraries

ammallya, Jun 30 '25 18:06