hcc
What constraints can be used in the operand list of inline assembly for AMD GCN?
For some AMD GCN instructions, "high-level" LLVM intrinsics can be used like this (some are already defined this way in hc.hpp, e.g. __amdgcn_ds_bpermute):
ulong __amdgcn_mqsad_pk_u16_u8(ulong, uint, ulong) [[hc]] __asm("llvm.amdgcn.mqsad.pk.u16.u8");
But for some instructions I was not able to find an LLVM intrinsic "symbol", and the LLVM sources suggest that none exists for them, so I need to use lower-level inline assembly:
inline uint __amdgcn_add3_u32(uint a, uint b, uint c) [[hc]]
{
uint r;
__asm("v_add3_u32 %0, %1, %2, %3": "=v"(r): "r"(a), "r"(b), "r"(c));
return r;
}
But I was unable to find any documentation about the operand constraints. I guessed that "v" means a vector register constraint and that "r" is probably the same as on other architectures. Where can I find more complete information about inline assembly for AMD GCN?
The problem with using "r" is that, apparently, only one operand may use a scalar register.
When I try to compile this:
hc::array_view<int> result(1);
parallel_for_each(hc::extent<1>(1), [=](hc::index<1> i) [[hc]]
{
result[0] = __amdgcn_add3_u32(result[0], 20, 30);
});
I am getting this error:
<inline asm>:1:2: error: invalid operand (violates constant bus restrictions)
v_add3_u32 v2, v2, s0, s1
^
note: !srcloc = 124
Generating AMD GCN kernel failed in llc for target: gfx900
So how can I express this constraint? How do I correctly write inline asm for v_add3_u32 so that it can also take a constant as an argument?
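The constraint for a scalar register is "s". The limitations in section 6.2.1 of https://developer.amd.com/wp-content/resources/Vega_Shader_ISA_28July2017.pdf state that at most one SGPR may be read per VALU instruction. With "r", the constants 20 and 30 were both placed in SGPRs (s0 and s1 in the error message), so v_add3_u32 read two SGPRs and broke that limit.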
You should not have to resort to inline asm to get at add3. There is work underway to fix this.
Any update on the recommended way to get at add3?