PSyclone
PSyclone copied to clipboard
Atomic needs extra clauses
The form of expression for which !$acc atomic
is valid depends upon the clause given:
This means that, for instance
x(i) = a
is only valid for an atomic write
.
At the moment, we do not support any clauses on the atomic directive and thus we generate code that is rejected by the compiler.
We currently also only have the ACCAtomicDirectiveNode
- there is no associated transformation. That could be added as part of this issue unless @sergisiso already has something in hand?
I don't have any transformations, I started experimenting on the lowering adding them when needed, but I think the Transformation may be a better approach.
This means that, for instance x(i) = a is only valid for an atomic write.
Ah interesting, it's the same in openmp but the cases I was doing had the read and write in the same statement, e.g.:
x(i) = x(i) + a
and did not require any additional clauses.
But tbh, I don't fully understand what the "write" alone does here, presumably we have to fence the operation from the read to the write, otherwise we don't solve the race condition, so we need the "atomic update" (which is the no clause default) or "atomic capture"?
But tbh, I don't fully understand what the "write" alone does here, presumably we have to fence the operation from the read to the write, otherwise we don't solve the race condition, so we need the "atomic update" (which is the no clause default) or "atomic capture"?
I think I've got myself confused. I went a bit gung ho in adding 'atomic' to all kernels but actually, I should only be adding it to those that do GH_INC. I'll refine my optimisation script...
And you only need to add atomic to those gh_inc that are on function spaces that are continuous in the vertical (assuming you are keeping colouring in the horizontal, otherwise ignore me).
Thanks Rupert. I've ditched colouring entirely. We're now only 2x slower than the CPU :-)
But tbh, I don't fully understand what the "write" alone does here, presumably we have to fence the operation from the read to the write, otherwise we don't solve the race condition, so we need the "atomic update" (which is the no clause default) or "atomic capture"?
Atomic write would be for a statement such as x = max(x, val)
or similar I expect, whether we come across that kind of thing in LFRic or the other codes I don't know.
max is still supported by the default atomic (update), and PSyclone supports it.
$!omp atomic
x = max(x, val)
The write is just for x=val
for "Ensuring that value is stored atomically into x. No part of x can change until after the entire write operation has completed". Same for atomic read. But that is the case anyway for x86/gpu intrinsic values. Maybe is for other kind of architectures? Or for non intrinsic types (although the standard says 'l-value expressions with scalar type')? Anyway, I don't think we need support for non-default atomics.