Epliz

Results 24 comments of Epliz

You might want to check https://github.com/ROCm-Developer-Tools/HIP/tree/master/samples/2_Cookbook/10_inline_asm , which explains things a bit better in my opinion. As for your issue, maybe you can check in the generated assembly for 64...

Also, in my experience, marking with `volatile` prevents some optimisations, so you should remove it if possible.

@ahatstat, for me, what you indicated last, i.e.

```
asm volatile ("V_ADD_CO_U32 %0, %1, %2;" : "=v"(r) : "v"(a), "v"(b));
```

works as long as you apply it on...
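For readers unfamiliar with the instruction: as I understand it, `V_ADD_CO_U32` is a 32-bit unsigned add that also produces a carry-out. The following is only a host-side C++ sketch of that arithmetic, not GPU code and not the HIP or compiler implementation. The names `AddCarry`, `add_co_u32` and `add_u64_via_halves` are hypothetical and exist purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: result of a 32-bit add plus its carry-out bit.
struct AddCarry {
    std::uint32_t sum;  // low 32 bits of a + b
    bool carry;         // true when a + b did not fit in 32 bits
};

// Plain C++ model of "add with carry-out" (illustrative name, not a real API).
inline AddCarry add_co_u32(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t sum = a + b;  // unsigned arithmetic wraps modulo 2^32
    return AddCarry{sum, sum < a};    // a wrapped sum is smaller than either operand
}

// Typical use of the carry: build a 64-bit add out of two 32-bit adds.
inline std::uint64_t add_u64_via_halves(std::uint64_t a, std::uint64_t b) {
    const AddCarry lo = add_co_u32(static_cast<std::uint32_t>(a),
                                   static_cast<std::uint32_t>(b));
    const std::uint32_t hi = static_cast<std::uint32_t>(a >> 32) +
                             static_cast<std::uint32_t>(b >> 32) +
                             (lo.carry ? 1u : 0u);  // propagate the carry upward
    return (static_cast<std::uint64_t>(hi) << 32) | lo.sum;
}
```

Comparing the inline-asm result against a reference like this on the host can help tell a wrong instruction apart from a wrong operand constraint.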

@ahatstat, I don't have a Navi 2 GPU to confirm that the following works, but at least it compiles: ``` #include #include #include #include #include #include #define N 100000 #define...

I hit the same issue with an MI100 GPU; setting `ROCM_PATH` also helped.

Please re-open this ticket. I hit the exact same issue when trying to use tensorflow-rocm 2.11 on the tutorial code at https://www.tensorflow.org/text/tutorials/text_generation. I have an MI100 GPU, running on Ubuntu...

What kind of logs would help you? Strace?

(not an AMD employee) Your first proposal seems like a good idea for hardware that doesn't have instructions for them (some architectures do have instructions for these atomics). The second...

This may be interesting to @arsenm from a compiler perspective. Could you perhaps recommend someone to have a look at the HIP implementation of the atomics?