ltqin
created [PR626](https://github.com/ROCmSoftwarePlatform/composable_kernel/pull/626)
The example passes with the Docker image compute-artifactory.amd.com:5000/rocm-plus-docker/lightning/compilercqe-compiler-psdb-amd-stg-open:6203-ubuntu-18.04
@daniellowell The task seems simple, but there are some strange problems in the test. If the task is not urgent, I will finish it by November 15th. Is that ok?
Directly replacing v_mac_f32 with v_fmac_f32 compiles on gfx1030, but the results fail verification, and the HIP version of fp32 also fails verification....
> > but the running results can not be verified
>
> Kernels with inline `v_fmac_f32` fail verification?

YES

> > the hip version of fp32 also fails...
>
> ...
@atamazov Okay, I got it
When the flag "CK_USE_AMD_BUFFER_ADDRESSING" is set to zero, the test passes (both with inline v_fmac_f32 and without inline assembly code). Does "amdgcn_buffer_load_f32X" not work on gfx1030?
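The comparison described in this comment (same test, buffer addressing on versus off) can be sketched on the host roughly as follows. This is only an illustration, not the CK code: `load_f32`, `axpy`, and `verify` are hypothetical names, and both load paths are plain host loads standing in for the real device paths, so the sketch shows the shape of the A/B check rather than the gfx1030 behavior itself.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the two load paths. In a real device build the
// `true` branch would wrap the buffer-load intrinsic; here both branches are
// ordinary host loads so the harness compiles and runs anywhere.
template <bool UseBufferAddressing>
float load_f32(const std::vector<float>& src, std::size_t i)
{
    if (UseBufferAddressing)
    {
        return src.at(i); // placeholder for the buffer-addressing path
    }
    return src[i]; // plain global-load path
}

// The same small computation (a * x + y), parameterized on the load path.
template <bool UseBufferAddressing>
std::vector<float> axpy(float a, const std::vector<float>& x, const std::vector<float>& y)
{
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        out[i] = a * load_f32<UseBufferAddressing>(x, i) + load_f32<UseBufferAddressing>(y, i);
    }
    return out;
}

// Element-wise check against a reference with an absolute tolerance.
bool verify(const std::vector<float>& result, const std::vector<float>& ref, float tol = 1e-5f)
{
    if (result.size() != ref.size())
    {
        return false;
    }
    for (std::size_t i = 0; i < result.size(); ++i)
    {
        if (std::fabs(result[i] - ref[i]) > tol)
        {
            return false;
        }
    }
    return true;
}
```

Calling `verify(axpy<true>(a, x, y), axpy<false>(a, x, y))` compares the two paths directly. If only one path fails against a trusted reference, the load path is the likely culprit rather than the arithmetic instruction.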
I created a JIRA ticket: http://ontrack-internal.amd.com/browse/SWDEV-253624
> @ltqin Please test if rocm3.9 fixes the issue. If not, please comment on JIRA and ask Mark for a hip-clang package with the fix
>
> http://ontrack-internal.amd.com/browse/SWDEV-253624?focusedCommentId=6425343&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-6425343

test pass...
@aska-0096 pls review again