PSyclone Atomic needs extra clauses

The form of expression for which !$acc atomic is valid depends upon the clause given: This means that, for instance x(i) = a is only valid for an atomic write.

At the moment, we do not support any clauses on the atomic directive and thus we generate code that is rejected by the compiler.

Nov 14 '23 16:11 arporter

We currently also only have the ACCAtomicDirectiveNode - there is no associated transformation. That could be added as part of this issue unless @sergisiso already has something in hand?

Nov 14 '23 16:11 arporter

I don't have any transformations, I started experimenting on the lowering adding them when needed, but I think the Transformation may be a better approach.

This means that, for instance x(i) = a is only valid for an atomic write.

Ah interesting, it's the same in openmp but the cases I was doing had the read and write in the same statement, e.g.: x(i) = x(i) + a and did not require any additional clauses.

But tbh, I don't fully understand what the "write" alone does here, presumably we have to fence the operation from the read to the write, otherwise we don't solve the race condition, so we need the "atomic update" (which is the no clause default) or "atomic capture"?

Nov 14 '23 20:11 sergisiso

But tbh, I don't fully understand what the "write" alone does here, presumably we have to fence the operation from the read to the write, otherwise we don't solve the race condition, so we need the "atomic update" (which is the no clause default) or "atomic capture"?

I think I've got myself confused. I went a bit gung ho in adding 'atomic' to all kernels but actually, I should only be adding it to those that do GH_INC. I'll refine my optimisation script...

Nov 15 '23 09:11 arporter

And you only need to add atomic to those gh_inc that are on function spaces that are continuous in the vertical (assuming you are keeping colouring in the horizontal, otherwise ignore me).

Nov 15 '23 10:11 rupertford

Thanks Rupert. I've ditched colouring entirely. We're now only 2x slower than the CPU :-)

Nov 15 '23 10:11 arporter

But tbh, I don't fully understand what the "write" alone does here, presumably we have to fence the operation from the read to the write, otherwise we don't solve the race condition, so we need the "atomic update" (which is the no clause default) or "atomic capture"?

Atomic write would be for a statement such as x = max(x, val) or similar I expect, whether we come across that kind of thing in LFRic or the other codes I don't know.

May 03 '24 14:05 LonelyCat124

max is still supported by the default atomic (update), and PSyclone supports it.

$!omp atomic
x = max(x, val)

The write is just for x=val for "Ensuring that value is stored atomically into x. No part of x can change until after the entire write operation has completed". Same for atomic read. But that is the case anyway for x86/gpu intrinsic values. Maybe is for other kind of architectures? Or for non intrinsic types (although the standard says 'l-value expressions with scalar type')? Anyway, I don't think we need support for non-default atomics.

May 07 '24 09:05 sergisiso

PSyclone PSyclone copied to clipboard

Atomic needs extra clauses

PSyclone
PSyclone copied to clipboard