PSyclone icon indicating copy to clipboard operation
PSyclone copied to clipboard

PSyclone reordering symbol table during output resulting in compiler error

Open LonelyCat124 opened this issue 2 years ago • 17 comments

When accessing (at least) one file in Socrates PSyclone changes the order of variable declaration, which is resulting in compiler error due to undefined variable.

The file in question is gas_list_pcf.f90, at some point there is a variable defined as: INTEGER, PRIVATE :: i

later in the file during the various reshape calls, this i variable is used.

IN the file output by PSyclone, the outputs are reordered such that some accesses to i are used before its definition:

&public :: threshold_wavelength = RESHAPE([REAL(KIND = RealK) :: 246.0E-09_RealK, 242.0E-09_RealK, 175.0E-09_RealK, &
&134.0E-09_RealK, 129.0E-09_RealK, (0.0_RealK, i = 1, npd_products - 5), 227.5E-09_RealK, 167.1E-09_RealK, 128.6E-09_RealK, &
&108.2E-09_RealK, 89.922E-09_RealK, 65.026E-09_RealK, 63.693E-09_RealK, 54.655E-09_RealK, (0.0_RealK, i = 1, npd_products - 8), &
..... rest of this definition
... string array definition
integer, private :: i

This happens in some other files too, e.g. refract_re_ccf.f90

LonelyCat124 avatar Jun 30 '23 12:06 LonelyCat124

Strictly I think this is still valid Fortran (which compiler is this?) but we do already do the 'right thing' for parameter declarations. Hopefully that approach can be extended to the other declarations.

arporter avatar Jun 30 '23 12:06 arporter

I'm using mpifort, wrapping gcc-12.2 on glados.

LonelyCat124 avatar Jun 30 '23 12:06 LonelyCat124

@sergisiso's suggestion to replace decl x = expr with decl x\nx = expr in PSyIR would also fix this particular case. Determining a valid order will be complicated by the fact that we will have UnknownFortranType's in some cases (including this one) and we will not know their dependencies.

rupertford avatar Jun 30 '23 12:06 rupertford

Ok, I think I tracked this down a bit further and this specific case is due to using a non-parameter decalration in the declaration of a parameter.

A testcase that outputs code that won't compile on godbolt:

def test_fail_case(fortran_reader, fortran_writer):
    code = '''
    module mymod
        integer, parameter :: x = 1
        integer, parameter :: j = 9
        integer, parameter :: k = 12
        integer :: i
        real, parameter :: threshold_Wavelength(j, k) &
        = reshape( [REAL :: &
        1.3, 1.5, 1.6, 1.7, &
        (0.0, i=1, k-4), &
        1.2, 1.4, 1.2, &
        (0.0, i=1, k-3), &
        (0.0, i=1, k), &
        (0.0, i=1, k), &
        (0.0, i=1, k), &
        (0.0, i=1, k), &
        (0.0, i=1, k), &
        (0.0, i=1, k), &
        (0.0, i=1, k)], shape=[j, k] )
    end module
'''
    psyir = fortran_reader.psyir_from_source(code)
    pass_through = fortran_writer(psyir)
    print(pass_through)
    assert False

The output code here is (with me adding line truncation):

module mymod
  implicit none
  integer, parameter, public :: x = 1
  integer, parameter, public :: j = 9
  integer, parameter, public :: k = 12
  real, dimension(j,k), parameter, public :: threshold_wavelength = RESHAPE([REAL :: 1.3, 1.5, 1.6, 1.7, &
   (0.0, i = 1, k - 4), 1.2, 1.4, 1.2, (0.0, i = 1, k - 3), (0.0, i = 1, k), (0.0, i = 1, k), (0.0, i = 1, k),&
   (0.0, i = 1, k), (0.0, i = 1, k), (0.0, i = 1, k), (0.0, i = 1, k)], shape = [j, k])
  integer, public :: i
  public

  contains

end module mymod

gfortran trunk on godbolt sees an error here:

    7 |    (0.0, i = 1, k - 4), 1.2, 1.4, 1.2, (0.0, i = 1, k - 3), (0.0, i = 1, k), (0.0, i = 1, k), (0.0, i = 1, k),&
      |          1
Error: Symbol 'i' at (1) has no IMPLICIT type
Compiler returned: 1

If you move the declaration of i to before threshold_wavelength as the original input does then the output will compile.

@sergisiso I think PSyclone attempting to output all parameters first is the cause of the error for this issue.

LonelyCat124 avatar Jul 04 '23 14:07 LonelyCat124

We assumed that any parameter (compile-time) value could only depend on other parameter values (compile-time) .

But here it uses a runtime (i) variable (albeit with a runtime given value) to set-up a compile-time value(threshold_wavelength) ? Or is the 'i' inside the reshape something specific to shape and not related to the outside 'i'? I am not familiar with RESHAPE

sergisiso avatar Jul 04 '23 14:07 sergisiso

I guess its related because you mention:

If you move the declaration of i to before threshold_wavelength as the original input does then the output will compile.

sergisiso avatar Jul 04 '23 14:07 sergisiso

Victoria (MO) has hit a related problem with our re-ordering of declarations. NEMOVAR has code that defines a type and declares an abstract interface that uses that type. Unfortunately, when we re-generate the declarations, we put the type declaration after the interface declaration and the compiler is unhappy. (Note that the type declaration is of UnknownFortranType because it contains PROCEDURE elements.) I think the only solution to this is to examine the parse trees of the two declarations and check for dependencies?

arporter avatar Jul 04 '23 14:07 arporter

The code in question looks like:

TYPE ftam_typ

  TYPE(ftam_ctl_typ) :: &
     & fctl

  ! reference trajectory

  TYPE(ftrj_typ) :: &
     & ftrj

  ! before state

  TYPE(foce_typ) :: &
     & foce_b

  ! increment

  TYPE(finc_typ) :: &
     & finc_tl, &
     & finc_ad

  ! testing

  INTEGER :: &
     & itst_adj, & !:  test adjoint modules (1), stp_adj (2) or both (3)
     & itst_tan    !:  test tangent modules (1), stp_tan (2) or both (3)

  ! Time stepping

  INTEGER :: &
     & istart, & !: Fisrt and
     & iend      !: last time steps

  PROCEDURE(rst_interface)  , POINTER :: initialiseTL
  PROCEDURE(rst_interface)  , POINTER :: initialiseAD
  PROCEDURE(model_interface), POINTER :: stepTL
  PROCEDURE(model_interface), POINTER :: stepAD
  PROCEDURE(fin_interface)  , POINTER :: finaliseTL
  PROCEDURE(fin_interface)  , POINTER :: finaliseAD

CONTAINS
   PROCEDURE :: setup    => tam_setup
   PROCEDURE :: del      => tam_del
   PROCEDURE :: adj_test => tam_adj_test

END TYPE ftam_typ
ABSTRACT INTERFACE

   SUBROUTINE rst_interface( &
     &  self, &
     &  piom_ctl, &
     &  pvar_ctl, &
     &  pvars, &
     &  pinc, &
     &  poce )

     IMPORT :: fiom_ctl_typ, fvar_ctl_typ, finc_typ, foce_typ, ftam_typ, fvariables_typ

     CLASS(ftam_typ), INTENT(INOUT) :: &
        & self
     ...
 END INTERFACE

arporter avatar Jul 04 '23 14:07 arporter

We assumed that any parameter (compile-time) value could only depend on other parameter values (compile-time) .

But here it uses a runtime (i) variable (albeit with a runtime given value) to set-up a compile-time value(threshold_wavelength) ? Or is the 'i' inside the reshape something specific to shape and not related to the outside 'i'? I am not familiar with RESHAPE

I think what happens is reshape turns the 1D "list of numbers" (I'm unclear if its an array or what it is) in the arguments into the shape'd array defined as the shape argument. This can be used to define an array inline, similar to you could in C with {} syntax.

The use of i then seems to be something to do with how you're allowed to define this list of numbers and I don't know enough about Fortran to understand this in detail.

One possible solution could be to output in a different order:

  1. All variables without values set.
  2. Parameters
  3. Other variables as currently defined.

This would avoid the problem, but I don't know if it would make any others? It wouldn't help Andy/Victoria's issue though.

Mini code:

program x
use, intrinsic :: iso_fortran_env, only : real64
integer, parameter :: npd_products = 3
integer, parameter :: npd_gases = 3
integer :: i

real(real64), dimension(npd_products,npd_gases), parameter &
 :: threshold_wavelength = RESHAPE([REAL(real64) :: 1.5, (0.0, i=1, npd_products-1), &
 1.5, 1.4, 1.3, (0.0, i=1, npd_products)], &
shape = [npd_products, npd_gases])

print *, (threshold_wavelength(1,2))
end program

Creates an array [[1.5, 1.5, 0.0], [0.0, 1.4, 0.0], [0.0, 1.3, 0.0]]

LonelyCat124 avatar Jul 05 '23 13:07 LonelyCat124

I'm unclear if its an array or what it is

I was also confused because arrays are [] or (/ /) but not only parenthesis.

Apparently it is an implied do: https://pages.mtu.edu/~shene/COURSES/cs201/NOTES/chap08/io.html

which acts like a range with the second and third element are the bounds.

And the middle element is even called "data-i-do-object" in the Fortran standard, this is why I wasn't sure if it is related to the variable "i" (but it does, you can change the name but must be declared before)

sergisiso avatar Jul 05 '23 14:07 sergisiso

But what it is really doesn't matter. It should end up in the initial_value as a CodeBlock if not supported.

Indeed we could do your proposed vars without initial_value first or full dependency ordering, both would fix it. But we need #1419 first otherwise it ends as part of a UnknownFortranType.

Once that is implemented we can reason about initial_value of the symbol regardless of it being an UnknownType, CodeBlock or have all info.

sergisiso avatar Jul 05 '23 14:07 sergisiso

Will #1419 fix this case for a full dependency ordering? We will know that the LHS is a Real kind, but since the initial value is still a CodeBlock we won't know what inside it for working out dependencies?

And we could always end up with a (probably) unsolvable case without understanding how to parse this new object, e.g. two arrays defined in this manner where one is used to initialise some values of the other, i.e. inside the second array (array1(i), i=9, 13). Adding an implied do into PSyclone to handle this case might be useful if this is something we expect to see in code for LFRIC? Maybe worth asking MO.

I think handling reshape in general might be worthwhile as I think its the only way to initialise a multidimensional array inline - at least I can't work out how to do it with either [] or (/ syntax.

Edit: To add reshape we probably need:

  1. Add reshape into intrinsic_handler in fparser2.py
  2. Add reshape into IntrinsicCall node, with 2 required and 2 optional arguments. Types of the required arguments i'm not sure, does PSyclone have an equivalent to FParser's Array_Constructor node? I think it doesn't and this was part of the issue from 1419, we'd need some sort of inline array declaration for these arguments.

LonelyCat124 avatar Jul 06 '23 14:07 LonelyCat124

We do because CodeBlock has a "get_symbol_names" methods that return all internal symbols, thanks to all being captured as Fortran2003.Name (unless there is also fparser issues). For the dependency analysis to pick this up we may need to specialise the reference_access method for CodeBlocks but this is an implementation detail. I think we can do dependency analysis with CodeBlocks (we just need to assume always worst case "READWRITE" because we don't understand the semantics of the code in it)

e.g. two arrays defined in this manner where one is used to initialise some values of the other

Ok maybe not two codeblocks with dependencies between them :)

sergisiso avatar Jul 06 '23 14:07 sergisiso

Ah ok, that makes sense. I think the extreme case we can just hope to never come across, it doesn't seem like a very sane use-case.

LonelyCat124 avatar Jul 06 '23 15:07 LonelyCat124

A couple more points related to this:

  • there can be dependencies in the type part of the declaration, so initial_value doesn't solve all problems: real(kind=my_kind), dimension(N), pointer :: array => value

  • The symbol table is an OrderedDict, so we respect the original order of declarations (input code we assume its already correct). The problem is that in the backend we do some grouping of declaration categories, but presumably if we parse 2 unknownTypes, there won't be issues between them because they will be output in-order in the same group.

sergisiso avatar Jul 07 '23 08:07 sergisiso

I think we will still need to re-order in the backend in general. This is because 1) Fortran does not require something to be declared before it is used in another declaration, but some compilers might not support it, 2) someone creating or modifying PSyIR might not add symbols to the symbol table in a 'declare first' order and 3) backends to other languages might have a 'declare first' constraint.

rupertford avatar Jul 07 '23 10:07 rupertford

#1419 is now fixed. It does not solve ordering issues but it means that all initialization expressions are now accessible in symbol.initial_value (even for unknown types) and therefore we have everything to tackle this issue.

sergisiso avatar Jul 27 '23 11:07 sergisiso

An update: I've hit this problem several times in the NAME code base. This has abstract interfaces that use abstract types and if we output the interface before the type then gfortran fails to compile it.

arporter avatar Aug 14 '25 11:08 arporter