ltqin

Results 10 comments of ltqin

created [PR626](https://github.com/ROCmSoftwarePlatform/composable_kernel/pull/626)

The example passes with the Docker image compute-artifactory.amd.com:5000/rocm-plus-docker/lightning/compilercqe-compiler-psdb-amd-stg-open:6203-ubuntu-18.04

@daniellowell The task seems simple, but there are some strange problems in the test. If the task is not urgent, I will finish it by November 15th. Is that ok?

Directly replacing `v_mac_f32` with `v_fmac_f32` compiles on gfx1030, but the results fail verification, and the HIP version of fp32 also fails verification....

> > but the running results can not be verified
>
> Kernels with inline `v_fmac_f32` fail verification?

YES

> > the hip version of fp32 also fails...
>
>...

When the flag `CK_USE_AMD_BUFFER_ADDRESSING` is set to zero, the test passes (both with inline `v_fmac_f32` and without inline assembly code). Does `amdgcn_buffer_load_f32X` not work on gfx1030?
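To make the A/B comparison above concrete, here is a minimal host-side sketch of how a flag like this might gate the load path. It is an illustration under assumptions, not the composable_kernel code: `load_f32` and `sum_via_load` are hypothetical names, the flag is assumed to be an integer macro that can be overridden with `-DCK_USE_AMD_BUFFER_ADDRESSING=0`, and the buffer path is emulated on the CPU instead of calling a real buffer-load intrinsic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: the flag is an integer macro that can be overridden on the
// compiler command line, e.g. -DCK_USE_AMD_BUFFER_ADDRESSING=0.
#ifndef CK_USE_AMD_BUFFER_ADDRESSING
#define CK_USE_AMD_BUFFER_ADDRESSING 1
#endif

// Hypothetical helper (not a composable_kernel API): reads element i of a
// buffer that holds n floats.
inline float load_f32(const float* base, std::size_t i, std::size_t n)
{
#if CK_USE_AMD_BUFFER_ADDRESSING
    // Host-side stand-in for the buffer-addressing path. On the GPU this is
    // where a buffer-load intrinsic would be used; here an out-of-range
    // index is assumed to read as 0.
    return i < n ? base[i] : 0.0f;
#else
    // Plain pointer path: no bounds handling, so check in debug builds.
    assert(i < n);
    return base[i];
#endif
}

// Hypothetical check: sums a buffer through load_f32 so that builds with the
// flag set to 1 and to 0 can be compared against the same reference value.
inline float sum_via_load(const std::vector<float>& v)
{
    float acc = 0.0f;
    for(std::size_t i = 0; i < v.size(); ++i)
    {
        acc += load_f32(v.data(), i, v.size());
    }
    return acc;
}
```

Building the same test twice, once with each flag value, and comparing both against a host reference separates a problem in the load path from a problem in the arithmetic (for example the inline `v_fmac_f32`).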

I created a JIRA ticket: http://ontrack-internal.amd.com/browse/SWDEV-253624

> @ltqin Please test if rocm3.9 fixes the issue. If not, please comment on JIRA and ask Mark for a hip-clang package with the fix
>
> http://ontrack-internal.amd.com/browse/SWDEV-253624?focusedCommentId=6425343&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-6425343

test pass...

@aska-0096 pls review again