PSyclone [psyir] Support allocate/deallocate and random

At this stage, an allocate statement becomes a code block. For the kernel-extraction driver creation, direct support for creating allocate statements would simplify the code (which currently parses a Fortran string to create a codeblock with the allocate statement) is used.

Note that there is also https://github.com/stfc/fparser/issues/298 to support the mold= argument. It would simplify the code further if mold would be supported.

Aug 04 '21 02:08 hiker

This support would also be useful when hoisting local, automatic arrays (#1634). Currently the required allocated() and allocate() calls have to be put into CodeBlocks.

Mar 02 '22 12:03 arporter

I'm thinking of taking this one on as it's needed for #1726. We should also address https://github.com/stfc/fparser/issues/298.

In Fortran we have:

allocate(my_array(x,y,z), [mold=src_array, status=ierr])

and in C we have:

my_array = malloc(x*y*z)

and one checks for success by examining whether my_array is null. We could support the 'mold' concept since in C that would simply be a sizeof(src_array)/sizeof(src_array[0]).

Are there other aspects to consider @hiker, @sergisiso and @rupertford?

Jun 17 '22 10:06 arporter

Naming: Fortran has allocate and deallocate, C has malloc and free where "malloc" really stands for "memory allocation." I favour having nodes Allocate and Deallocate. Although Fortran includes support for specifying the type of the entity (for OO style code), that moves us away from the functionality offered by malloc so we could simply exclude that?

Jun 17 '22 14:06 arporter

Just thought: @LonelyCat124 might have an opinion on this too.

Jun 17 '22 14:06 arporter

Is the plan for these to be a special type of Codeblock node? We should probably prevent code with allocate/deallocate statements from being inside parallel regions unless we have a good reason to allow it? There are certainly algorithms where memory allocation inside parallel regions is a thing, but computing things like shared/private might be not be worth allowing for now.

Allocate/Deallocate nodes seems reasonable to me, my only concern may be if a C++ backend (e.g. Kokkos) or something is planned how we handle that, though mostly I think C++ avoids explicit allocation now.

Does anything special need to be done for this to support GPU outputs? Or do you just allocate locally and leave it to OpenMP/OpenCL/...?

Jun 17 '22 14:06 LonelyCat124

Is the plan for these to be a special type of Codeblock node?

No, I was planning on them being full Statement nodes. We would have to update the various parallel directive classes/transformations to ensure that they are excluded from being inside.

The OO thing is something we've stayed away from so far so I think we're really talking about something that grabs a certain number of bytes, with support for specifying array shape. I haven't thought about Kokkos but maybe @sergisiso can comment?

I hadn't thought about the implications for GPU. I guess the very fact that we recognise allocates would mean that we could then replace them with allocates directly on the GPU. Certainly it would provide more information for the work @nmnobre is doing on explicit data movement.

Jun 17 '22 14:06 arporter

Some ideas:

Do we need them as specific nodes? Maybe being just new operators like the other Fortran intrinsics would be enough for them not being in a codeblock. Is that enough for the psyad use case?
If we need to rationalize about memory usage (e.g. for GPU) I think we need to add an attribute to datatype to mark them as "heap allocated" because the important thing is the symbol info - not the specific statement. (akind to the Fortran allocatable attribute). But maybe both are separate things, we can implement one without the other. If/when we have both we can typecheck that allocations happen with allocatable types.
I agree using the Fortran names (as we use for most concepts). I would not worry about other backends and favor making things simple for Fortran. For other backends we could deal with the syntax change in their backend or simply don't support it (if we only convert to the other language the kernels and let the driver still be Fortran it is sensible to ask for no allocations in kernels so if one is found we request the user to hoist it outside the kernel)

Jun 17 '22 15:06 sergisiso

That sounds good Sergi. There's a slight issue with the stat argument to allocate, e.g.:

allocate(array1(2,2), array2(3,2), stat=ierr)
if (ierr /= 0)then
    ...

Presumably in this case we'd have something like:

alloc1 = NaryOperation.create(
    NaryOperation.Operator.ALLOCATE,
    Reference(array1_sym), [two, two])
Assignment.create(lhs=Reference(ierr1), rhs=alloc1)
alloc2 = NaryOperation.create(
    NaryOperation.Operator.ALLOCATE,
    Reference(array2_sym), [three, two])
Assignment.create(lhs=Reference(ierr2), rhs=alloc2)
Assignment.create(Reference(ierr), SUM(ierr1, ierr2))

We could have status as an optional argument to the create method or potentially have it as the result of the operation (I prefer the latter)? There's also the mold option that @hiker is keen to have.

Jun 17 '22 19:06 arporter

Since mold is just another way to specify the shape (and potentially type) of the thing being allocated it can be expressed in terms of other operators:

arank = RANK(array1)
for i in range(arank):
    a_shape.append(size(array1, i))
NaryOperation.create(NaryOperation.Operator.ALLOCATE,
    Reference(barray), a_shape)

However, that's an internal thing. For our purposes we can support a shape specified as a list of DataNodes or via an ArrayReference.

Jun 17 '22 20:06 arporter

If some existing Fortran has an allocate without a stat argument then we have nothing to assign the result of this proposed operation to. We can't have a bare Operation as a child of Schedule as it's not a Statement. This is the same problem we have with trying to represent the Fortran random_number intrinsic as an operation. We could represent such operations as assigning to the operand:

var = ALLOCATE(size, status)

and

var = RANDOM()

although in the latter case we then have an Operation with no operand unless we do:

var = RANDOM(var)

which is a bit clunky and will cause problems if someone does:

other_var = RANDOM(var)

since in Fortran that would really be call random_number(other_var) with no dependence on var.

Jul 08 '22 10:07 arporter

This problem is also an issue for deallocate(my_var).

Jul 08 '22 10:07 arporter

Following discussion on Teams, we have agreed to move away from "having all intrinsics as operators." To that end we will introduce a new IntrinsicCall node.

Jul 08 '22 13:07 arporter

We have an implementation choice:

Have just one IntrinsicCall class and allow the type of the intrinsic to be specified when it is constructed (c.f. operators);
Have a sub-class of IntrinsicCall for each type of intrinsic, e.g. Allocate, Deallocate etc. Option 1. is quite concise but makes it difficult to validate the supplied arguments. It is also difficult to support named arguments. Option 2. could wind up with a lot of extra classes but allows us to validate arguments easily and support named arguments. The classes that I know we need currently would be: Allocate, Deallocate, Random. Probably there are others?

Jul 08 '22 15:07 arporter

Hi @sergisiso and @rupertford, would you mind taking a look at the commit (https://github.com/stfc/PSyclone/commit/5e04f94f894c19267b116072deabc570df418722) and see what you think? (I've implemented Option 1. but have managed to add a fair amount of argument validation as well.)

Jul 08 '22 16:07 arporter

I like the general direction. I also prefer option 1 rather than 2.

The validation of the supplied arguments that you do inside create is something that at some point we will also want the base class to do when we have the Call and the RoutineSymbol declaration/signature. Since we don't do it yet it is fine to be here for now but keep in mind (TODO?) that we could generalize this. Similarly, I am unsure if the _required_args and _optional_args should be inside the Intrinsic node. Since this is just the Call and they represent the signature, to me they are different things, maybe they belong to the IntrinsicSymbol. But again, since we don't have proper generic signatures in RoutineSymbols it may be a good solution for now.

Jul 08 '22 17:07 sergisiso

We need to decide whether an allocate call should follow the Fortran syntax and allow multiple entities to be declared at once. This makes supporting the specification of a single status variable easy. Perhaps status checking should be incorporated in the node (when it is lowered)?

Oct 07 '22 14:10 arporter

[psyir] Support allocate/deallocate and random_number intrinsics