Valentin Churavy
That would be great!
I will note that you are using floating point atomics and the OP uses integer atomics. There was a recommendation from LUMI to try out unsafe-atomics. See, e.g., https://reviews.llvm.org/D91546?id=305522
Okay, so on LUMI the C++ code is already doing something quite different from Julia, even without `unsafe-fp-atomics`. Can you get the LLVM IR from hipcc on LUMI with something...
That should be okay. I think you might also need `-c` since you don't want to link.
You don't want `--shared -fPIC`, since your `-o` should be a `.ll` file.
So we are just emitting `atomicrmw fadd double addrspace(1)* %73, double %51 seq_cst, align 8, !dbg !374`. The issue might be that we emit `seq_cst` instead of `acq_rel`.
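To make the ordering question concrete, here is a minimal host-side C++ sketch (not the code from this thread). LLVM's `seq_cst`, `acq_rel`, and `monotonic` orderings correspond to `std::memory_order_seq_cst`, `std::memory_order_acq_rel`, and `std::memory_order_relaxed`. The helper names `atomic_add` and `run_sum` are made up for illustration, and the compare-exchange loop is a portable CPU stand-in for a hardware `atomicrmw fadd`, so it says nothing about GPU performance.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: atomically add `value` to a double using a CAS loop.
// `order` is the ordering applied when the update succeeds.
double atomic_add(std::atomic<double>& target, double value,
                  std::memory_order order) {
    double expected = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak refreshes `expected`, so we retry.
    while (!target.compare_exchange_weak(expected, expected + value,
                                         order, std::memory_order_relaxed)) {
    }
    return expected;  // value observed just before our add
}

// Hypothetical driver: 4 threads each add 1.0 a thousand times.
double run_sum(std::memory_order order) {
    std::atomic<double> sum{0.0};
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t) {
        workers.emplace_back([&sum, order] {
            for (int i = 0; i < 1000; ++i) {
                atomic_add(sum, 1.0, order);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return sum.load();
}
```

Calling `run_sum` with `std::memory_order_seq_cst`, `std::memory_order_acq_rel`, or `std::memory_order_relaxed` gives the same total of 4000.0, because the addition itself is atomic in every case. The ordering only controls how the update is ordered relative to other memory operations, which is why a weaker ordering can let the compiler and hardware pick a cheaper instruction sequence.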
But now you are missing `-emit-llvm -S --offload-device-only -c`?
Can you also send me the one without `unsafe-fp-atomics`, but this is already promising: ``` %41 = load i32, i32 addrspace(1)* %23, align 4, !tbaa !11, !amdgpu.noclobber !5 %42 =...
Oh wow that is even more interesting. ``` %41 = atomicrmw fadd double addrspace(1)* %36, double %29 syncscope("agent-one-as") monotonic, align 8 %42 = atomicrmw fadd double addrspace(1)* %38, double %29...
This is consistent with `Base.@atomic`: ``` help?> @atomic @atomic var @atomic order ex Mark var or ex as being performed atomically, if ex is a supported expression. If no order...