Compilation failure on aarch64 with Arm Compiler for Linux's flang

Open dslarm opened this issue 1 year ago • 5 comments

When using Spack to configure and compile HDF5, the build fails with the following errors (Spack reports 5 errors in the build log).

Disabling the non-standard Float16 feature, by passing -DHDF5_ENABLE_NONSTANDARD_FEATURE_FLOAT16:BOOL=OFF to CMake, makes no difference.

5 errors found in build log:
     3047    F90-S-0081-Illegal selector - KIND parameter has unknown value for data type  (/tmp/user/spack-stage/spack-stage-hdf5-1.14.5-rzhstpbozkrcypkkfn5ntevpg2gwfof7/spack-build-rzhstpb/fortran/static/H5_gen.F90: 6049)
     3048      0 inform,   0 warnings,   1 severes, 0 fatal for h5pget_kind_16
     3049    F90-S-0081-Illegal selector - KIND parameter has unknown value for data type  (/tmp/user/spack-stage/spack-stage-hdf5-1.14.5-rzhstpbozkrcypkkfn5ntevpg2gwfof7/spack-build-rzhstpb/fortran/static/H5_gen.F90: 6097)
     3050      0 inform,   0 warnings,   1 severes, 0 fatal for h5pregister_kind_16
     3051    F90-S-0081-Illegal selector - KIND parameter has unknown value for data type  (/tmp/user/spack-stage/spack-stage-hdf5-1.14.5-rzhstpbozkrcypkkfn5ntevpg2gwfof7/spack-build-rzhstpb/fortran/static/H5_gen.F90: 6145)
     3052      0 inform,   0 warnings,   1 severes, 0 fatal for h5pinsert_kind_16
  >> 3053    make[2]: *** [fortran/src/CMakeFiles/hdf5_fortran-static.dir/build.make:325: fortran/src/CMakeFiles/hdf5_fortran-static.dir/__/static/H5_gen.F90.o] Error 1
     3054    make[2]: Leaving directory '/tmp/user/spack-stage/spack-stage-hdf5-1.14.5-rzhstpbozkrcypkkfn5ntevpg2gwfof7/spack-build-rzhstpb'

This is with Arm Compiler for Linux (ACfL), which is LLVM-derived but uses the NVIDIA/PGI-derived Fortran front end (classic Flang).

Since Float16 is disabled, shouldn't the build skip generating this code?

dslarm, Nov 12 '24 18:11

I see the same. At configure time I see:

-- Performing Test FORTRAN_CHAR_ALLOC - Success
-- ....NUMBER OF INTEGER KINDS FOUND 4
-- ....REAL KINDS FOUND {4,8,16}
-- ....INTEGER KINDS FOUND {1,2,4,8}
-- ....MAX DECIMAL PRECISION 15
-- ....LOGICAL KINDS FOUND {1,2,4,8}
-- ....FOUND SIZEOF for REAL KINDs {4,8,8}

A simple test shows that armflang does not support REAL(KIND=16), so I'm not sure how REAL KINDS FOUND comes to include 16. Additionally, FOUND SIZEOF for REAL KINDs {4,8,8} is suspicious: the last two kinds report the same size.

The test seems to happen here: https://github.com/HDFGroup/hdf5/blob/f0cffc90ac0f28818c054db5c467a3cdf2574f58/config/cmake/HDF5UseFortran.cmake#L295
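To spot the mismatch noted above (REAL KINDS FOUND lists 16 while FOUND SIZEOF reports only 8 bytes for it) without reading the log by eye, the summary lines can be parsed mechanically. This is a minimal Python sketch, assuming the "-- ....LABEL {a,b,c}" line format shown above. It also assumes that a REAL kind number equals its size in bytes, which holds for armflang and for gfortran on aarch64 in this thread but is not guaranteed in general (gfortran's REAL(10) on x86-64 occupies 16 bytes, for example). The helper names parse_kind_summary and suspicious_real_kinds are hypothetical and not part of HDF5 or CMake.

```python
import re

# Matches HDF5 configure summary lines that carry a brace list, e.g.
#   -- ....REAL KINDS FOUND {4,8,16}
#   -- ....FOUND SIZEOF for REAL KINDs {4,8,8}
_LIST_LINE = re.compile(r"^--\s*\.+\s*(?P<key>.+?)\s*\{(?P<vals>[0-9,\s]*)\}\s*$")


def parse_kind_summary(text):
    """Map each '{...}' summary label to its list of integers."""
    result = {}
    for line in text.splitlines():
        m = _LIST_LINE.match(line.strip())
        if m:
            vals = [int(v) for v in m.group("vals").split(",") if v.strip()]
            result[m.group("key")] = vals
    return result


def suspicious_real_kinds(summary):
    """Return REAL kinds whose reported byte size differs from the kind number.

    Heuristic only: it assumes kind number == byte size, which is a common
    convention but not a Fortran requirement.
    """
    kinds = summary.get("REAL KINDS FOUND", [])
    sizes = summary.get("FOUND SIZEOF for REAL KINDs", [])
    return [k for k, s in zip(kinds, sizes) if k != s]
```

Fed the ACfL summary above, this flags kind 16, whose reported size is 8 bytes; for the gfortran-style summary {4,8,16} with sizes {4,8,16} it flags nothing.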

lrbison, Dec 16 '24 22:12

There is a follow-up check here that seems intended to catch this problem:

https://github.com/HDFGroup/hdf5/blob/f0cffc90ac0f28818c054db5c467a3cdf2574f58/config/cmake/HDF5UseFortran.cmake#L405-L412

lrbison, Dec 16 '24 22:12

Building HDF5 1.12.0 works, and produces the following configure output:

-- Performing Test FC_AVAIL_KINDS_RESULT - Success
-- ....NUMBER OF INTEGER KINDS FOUND 4
-- ....REAL KINDS FOUND {4,8}
-- ....INTEGER KINDS FOUND {1,2,4,8}
-- ....MAX DECIMAL PRECISION 15

So it seems the detection logic changed. My best guess is this commit: https://github.com/HDFGroup/hdf5/commit/9b5d9680af8401528bb8c3b6d2b4c3cf30ccec5b

lrbison, Dec 17 '24 14:12

I truly don't understand how REAL KINDS FOUND includes kind=16 for armflang...

tmp$ cat test_fortran_r16.f90
PROGRAM main
  USE ISO_C_BINDING
  USE, INTRINSIC :: ISO_FORTRAN_ENV, ONLY : stdout=>OUTPUT_UNIT
  IMPLICIT NONE
  REAL (KIND=16) a
  WRITE(stdout, '(I0)') SIZEOF(a)
END
tmp$ armflang test_fortran_r16.f90
F90-S-0081-Illegal selector - KIND parameter has unknown value for data type  (test_fortran_r16.f90: 5)
  0 inform,   0 warnings,   1 severes, 0 fatal for main
tmp$ echo $?
1
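The manual check above can be generalized into a loop over candidate kinds, keeping only those for which a one-line declaration actually compiles. This is a rough Python sketch, not what HDF5's CMake does: it assumes a Fortran compiler invocable as armflang (or whatever name is passed as fc) that accepts -c and -o, and the helper names are hypothetical. The accepts parameter lets the compile step be swapped out, for example in a test.

```python
import os
import subprocess
import tempfile

# Hypothetical probe source: the same minimal declaration used in this thread.
PROBE_TEMPLATE = "REAL(KIND={kind}) :: a\nEND\n"


def compiler_accepts(fc, source):
    """Compile `source` with compiler `fc`; True if it exits with status 0.

    Raises FileNotFoundError if `fc` is not on PATH.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.f90")
        with open(src, "w") as f:
            f.write(source)
        proc = subprocess.run(
            [fc, "-c", src, "-o", os.path.join(tmp, "probe.o")],
            capture_output=True,
            text=True,
        )
        return proc.returncode == 0


def usable_real_kinds(candidates, fc="armflang", accepts=compiler_accepts):
    """Keep only the candidate REAL kinds for which the probe compiles."""
    return [k for k in candidates if accepts(fc, PROBE_TEMPLATE.format(kind=k))]
```

Given that REAL(KIND=16) fails to compile with armflang above, usable_real_kinds([4, 8, 16]) would be expected to return [4, 8] on that installation.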

lrbison, Dec 17 '24 14:12

@derobins could you shed any light on the desired configury path here?

lrbison, Dec 17 '24 14:12

In the Spack recipe for hdf5, the AMD compiler team added this (see https://github.com/spack/spack/pull/47123):

+        # AOCC does not support _Float16
+        if spec.satisfies("@1.14.4: %aocc"):
+            args.append(self.define("HDF5_ENABLE_NONSTANDARD_FEATURE_FLOAT16", False))

@lrbison, this might help work around the problem.

I know I hit this same issue some months ago, but I've clean forgotten what I was doing at the time, and therefore what workaround I used.

Actually, that patch is for _Float16; KIND=16 is something different (128-bit floats). Doh. Perhaps a similar fix is available.

dslarm, Mar 11 '25 13:03

In the CMake file for this test:

# dnl The output from the above program will be:
# dnl    -- LINE 1 --  valid integer kinds (comma separated list)
# dnl    -- LINE 2 --  valid real kinds (comma separated list)
# dnl    -- LINE 3 --  max decimal precision for reals
# dnl    -- LINE 4 --  number of valid integer kinds
# dnl    -- LINE 5 --  number of valid real kinds
# dnl    -- LINE 6 --  number of valid logical kinds
# dnl    -- LINE 7 --  valid logical kinds (comma separated list)

The source code at that point generates this output:

1,2,4,8
4,8,16
15
4
3
4
1,2,4,8

This suggests that the following source, which compiles without error, is not producing correct output:

PROGRAM FC08_AVAIL_KINDS
      USE, INTRINSIC :: ISO_FORTRAN_ENV, ONLY : stdout=>OUTPUT_UNIT, integer_kinds, real_kinds, logical_kinds
      IMPLICIT NONE
      INTEGER :: ik, jk, k, max_decimal_prec
      INTEGER :: num_rkinds, num_ikinds, num_lkinds

      ! Find integer KINDs

      num_ikinds = SIZE(integer_kinds)

      DO k = 1, num_ikinds
         WRITE(stdout,'(I0)', ADVANCE='NO') integer_kinds(k)
         IF(k.NE.num_ikinds)THEN
            WRITE(stdout,'(A)',ADVANCE='NO') ','
         ELSE
            WRITE(stdout,'()')
         ENDIF
      ENDDO

      ! Find real KINDs

      num_rkinds = SIZE(real_kinds)

      max_decimal_prec = 1

      prec: DO ik = 2, 36
         exp: DO jk = 1, 700
            k = SELECTED_REAL_KIND(ik,jk)
            IF(k.LT.0) EXIT exp
            max_decimal_prec = ik
         ENDDO exp
      ENDDO prec

      DO k = 1, num_rkinds
         WRITE(stdout,'(I0)', ADVANCE='NO') real_kinds(k)
         IF(k.NE.num_rkinds)THEN
            WRITE(stdout,'(A)',ADVANCE='NO') ','
         ELSE
            WRITE(stdout,'()')
         ENDIF
      ENDDO

      WRITE(stdout,'(I0)') max_decimal_prec
      WRITE(stdout,'(I0)') num_ikinds
      WRITE(stdout,'(I0)') num_rkinds

      ! Find logical KINDs

      num_lkinds = SIZE(logical_kinds)
      WRITE(stdout,'(I0)') num_lkinds

      DO k = 1, num_lkinds
         WRITE(stdout,'(I0)', ADVANCE='NO') logical_kinds(k)
         IF(k.NE.num_lkinds)THEN
            WRITE(stdout,'(A)',ADVANCE='NO') ','
         ELSE
            WRITE(stdout,'()')
         ENDIF
      ENDDO

END PROGRAM FC08_AVAIL_KINDS
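Because the program's output follows the fixed seven-line layout listed in the dnl comments, it can be cross-checked mechanically. This is a hedged Python sketch with hypothetical helper names (not part of HDF5). It parses the seven lines, checks each count against its list, and flags one internal contradiction: real_kinds lists kind 16, yet the SELECTED_REAL_KIND probe found no precision beyond 15 digits. The 18-digit threshold is an assumption based on the smallest common wider-than-double format (x87 extended precision, 18 digits; IEEE binary128 gives 33).

```python
def _ints(field):
    """Turn a comma-separated field such as '4,8,16' into [4, 8, 16]."""
    return [int(v) for v in field.split(",") if v.strip()]


def parse_avail_kinds(output):
    """Parse the seven-line FC08_AVAIL_KINDS output into a dict."""
    lines = [ln.strip() for ln in output.strip().splitlines()]
    if len(lines) != 7:
        raise ValueError(f"expected 7 lines, got {len(lines)}")
    return {
        "integer_kinds": _ints(lines[0]),
        "real_kinds": _ints(lines[1]),
        "max_decimal_prec": int(lines[2]),
        "num_ikinds": int(lines[3]),
        "num_rkinds": int(lines[4]),
        "num_lkinds": int(lines[5]),
        "logical_kinds": _ints(lines[6]),
    }


def consistency_warnings(info, wide_min_prec=18):
    """List internal inconsistencies in parsed FC08_AVAIL_KINDS output.

    The precision check is a heuristic (assumed threshold `wide_min_prec`):
    a kind-16 REAL would normally offer more than 15 decimal digits.
    """
    warnings = []
    for list_key, count_key in (
        ("integer_kinds", "num_ikinds"),
        ("real_kinds", "num_rkinds"),
        ("logical_kinds", "num_lkinds"),
    ):
        if len(info[list_key]) != info[count_key]:
            warnings.append(
                f"{count_key}={info[count_key]} but {list_key} has "
                f"{len(info[list_key])} entries"
            )
    if 16 in info["real_kinds"] and info["max_decimal_prec"] < wide_min_prec:
        warnings.append(
            "real_kinds lists 16 but max_decimal_prec is only "
            f"{info['max_decimal_prec']}"
        )
    return warnings
```

Fed the ACfL output above, all three counts agree (4, 3 and 4), so only the precision warning fires.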

@hyoklee, are you able to reproduce this? Do you have access to Arm Compiler for Linux?

dslarm, Mar 11 '25 14:03

What is the output? Depending on how old your compiler is, it could be this issue:

https://github.com/llvm/llvm-project/issues/77282

If the program does not work, then the compiler has a bug.

brtnfld, Mar 11 '25 15:03

Thanks @brtnfld. I don't have the Fortran chops myself to know what the output ought to be.

ACfL is built on the old open-sourced PGI Fortran compiler, which NVHPC and AOCC also use, although everyone has been working towards flang-new. Arm hasn't migrated to that yet (but has made its flang-new beta available). So while ACfL itself is recent (24.10.1) and based on recent LLVM, its Fortran compiler is not yet LLVM's flang-new.

Can you suggest a CMake define or flag that would work around this? If there is one, I can definitely make a quick-and-dirty patch to the hdf5 Spack recipe that applies it only with ACfL.

FWIW:

with ACfL:

program test2
    use iso_fortran_env
    integer :: i
    do i = 1, size(real_kinds)
        print *, real_kinds(i)
    end do
end program test2

yields: 4, 8, 16.

With the new beta:

ubuntu@ip-10-11-32-85:~$ ./atfl/bin/flang ./test2.f90 
ubuntu@ip-10-11-32-85:~$ ./a.out 
 2
 3
 4
 8
 16

FWIW, AMD's AOCC also yields 4, 8, 16. I'm not sure whether AOCC is still based on the old Fortran front end, or whether that is correct for that platform.

dslarm, Mar 11 '25 16:03

The output for the second case looks correct to me; it detects the low-precision reals. Could you reiterate what is failing?

brtnfld, Mar 11 '25 16:03

The failure is that KIND=16 is not supported by ACfL, but that hdf5 thinks it is.

dslarm, Mar 11 '25 16:03

The failure is that KIND=16 is not supported by ACfL, but that hdf5 thinks it is.

Right, but what do you mean by not supported? Isn't it listed as supported in real_kinds from iso_fortran_env?

Does the program

REAL(KIND=16) :: a
END

not compile?

brtnfld, Mar 11 '25 17:03

That's right - it doesn't compile:

[rocky@ip-10-11-42-205 ~]$ module load acfl
Loading acfl/24.10.1
  Loading requirement: binutils/14.2.0
[rocky@ip-10-11-42-205 ~]$ armflang test3.f90 
F90-S-0081-Illegal selector - KIND parameter has unknown value for data type  (test3.f90: 1)
  0 inform,   0 warnings,   1 severes, 0 fatal for MAIN

128-bit floats are not supported in hardware on aarch64; they are emulated in software (slowly) by a GCC library, which that Fortran compiler does not support.

dslarm, Mar 11 '25 17:03

The older way, then, was to use SELECTED_REAL_KIND. Does this print -1 or 16?

k = SELECTED_REAL_KIND(19,2)
print*,k
end

brtnfld, Mar 11 '25 20:03

-1.

[rocky@ip-10-11-42-205 ~]$ cat >Test.f90
k = SELECTED_REAL_KIND(19,2)
print*,k
end
[rocky@ip-10-11-42-205 ~]$ module load acfl
Loading acfl/24.10.1
  Loading requirement: binutils/14.2.0
[rocky@ip-10-11-42-205 ~]$ armflang Test.f90 
[rocky@ip-10-11-42-205 ~]$ ./a.out 
           -1
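This result is what the Fortran rules predict when no wider-than-double REAL exists: per the Fortran 2008 standard, SELECTED_REAL_KIND(P, R) returns -1 when some real type has an exponent range of at least R but none has a precision of at least P. The Python sketch below models that rule over a hypothetical table mapping each kind to its PRECISION and RANGE values (the usual IEEE binary32/binary64/binary128 figures, assumed for illustration; the RADIX argument is ignored). It then replays the prec/exp loop from FC08_AVAIL_KINDS, showing why MAX DECIMAL PRECISION comes out as 15 when only kinds 4 and 8 exist, and 33 when an IEEE quad kind is present.

```python
# Hypothetical kind tables: kind -> (PRECISION, RANGE) in decimal digits.
ACFL_LIKE = {4: (6, 37), 8: (15, 307)}
WITH_QUAD = {4: (6, 37), 8: (15, 307), 16: (33, 4931)}


def selected_real_kind(kinds, p=0, r=0):
    """Sketch of SELECTED_REAL_KIND(P, R) over `kinds`, ignoring RADIX."""
    matches = [(prec, k) for k, (prec, rng) in kinds.items() if prec >= p and rng >= r]
    if matches:
        # Smallest decimal precision wins; ties go to the smallest kind value.
        return min(matches)[1]
    has_p = any(prec >= p for prec, _ in kinds.values())
    has_r = any(rng >= r for _, rng in kinds.values())
    if has_r and not has_p:
        return -1  # range available, precision not
    if has_p and not has_r:
        return -2  # precision available, range not
    if not has_p and not has_r:
        return -3  # neither available
    return -4  # each available, but not in the same kind


def max_decimal_precision(kinds):
    """Mirror the prec/exp loop in FC08_AVAIL_KINDS."""
    max_prec = 1
    for ik in range(2, 37):          # DO ik = 2, 36
        for jk in range(1, 701):     # DO jk = 1, 700
            if selected_real_kind(kinds, ik, jk) < 0:
                break                # EXIT exp
            max_prec = ik
    return max_prec
```

With ACFL_LIKE, selected_real_kind(ACFL_LIKE, 19, 2) gives -1, matching the armflang result, and max_decimal_precision gives 15, matching the MAX DECIMAL PRECISION line; with WITH_QUAD the same calls give 16 and 33.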

dslarm, Mar 12 '25 07:03

OK, so for your patch, you can add the following line to config/cmake/HDF5UseFortran.cmake:

set (${HDF_PREFIX}_HAVE_ISO_FORTRAN_ENV 0)

after this check:

# Check if the fortran compiler supports the intrinsic module "ISO_FORTRAN_ENV" (F08)

READ_SOURCE("PROGRAM PROG_FC_ISO_FORTRAN_ENV" "END PROGRAM PROG_FC_ISO_FORTRAN_ENV" SOURCE_CODE)
check_fortran_source_compiles (${SOURCE_CODE} HAVE_ISO_FORTRAN_ENV SRC_EXT f90)
if (${HAVE_ISO_FORTRAN_ENV})
  set (${HDF_PREFIX}_HAVE_ISO_FORTRAN_ENV 1)
else ()
  set (${HDF_PREFIX}_HAVE_ISO_FORTRAN_ENV 0)
endif ()

That way, it selects the test that does not use ISO_FORTRAN_ENV.

brtnfld, Mar 12 '25 14:03

Alas, no joy:

==> Using cached archive: /home/rocky/spack/var/spack/cache/_source-cache/archive/ec/ec2e13c52e60f9a01491bb3158cb3778c985697131fc6a342262d32a26e58e44.tar.gz
==> Applied patch /home/rocky/spack/var/spack/repos/builtin/packages/hdf5/acfl.kind16.patch
==> Ran patch() for hdf5
==> hdf5: Executing phase: 'cmake'
==> hdf5: Executing phase: 'build'
==> Error: ProcessError: Command exited with status 2:
    '/home/rocky/spack/opt/spack/linux-rocky9-neoverse_v2/arm-24.10.1/gmake-4.4.1-icbfetxuqlpjx4zqfk6kht52fqecbt3e/bin/make' '-j16'

5 errors found in build log:
     3134    F90-S-0081-Illegal selector - KIND parameter has unknown value for data type  (/tmp/rocky/spack-stage/spack-stage-hdf5-1.14.5-i7uffkyiywiumpumb7zqtehvqi3zxo7c/spack-build-i7uffky/fortran/static/H5_gen.F90: 6049)

where acfl.kind16.patch is:

/home/rocky/spack/var/spack/repos/builtin/packages/hdf5/acfl.kind16.patch
*** a/config/cmake/HDFUseFortran.cmake	Wed Mar 12 14:20:33 2025
--- b/config/cmake/HDFUseFortran.cmake	Wed Mar 12 14:21:44 2025
*************** endif ()
*** 87,89 ****
--- 87,91 ----
          set (CMAKE_EXE_LINKER_FLAGS_DEBUG "/DEBUG" CACHE STRING "flags" FORCE)
      endif ()
  endif ()
+ 
+ set (${HDF_PREFIX}_HAVE_ISO_FORTRAN_ENV 0)

Looking at the source, the patch applied cleanly.

dslarm, Mar 12 '25 15:03

The CMake output is (as expected, I think) unchanged:

-- Performing Test FORTRAN_CHAR_ALLOC - Success
-- ....NUMBER OF INTEGER KINDS FOUND 4
-- ....REAL KINDS FOUND {4,8,16}
-- ....INTEGER KINDS FOUND {1,2,4,8}
-- ....MAX DECIMAL PRECISION 15
-- ....LOGICAL KINDS FOUND {1,2,4,8}
-- ....FOUND SIZEOF for REAL KINDs {4,8,8}
-- Found MPI_Fortran: /home/rocky/spack/opt/spack/linux-rocky9-neoverse_v2/arm-24.10.1/openmpi-5.0.6-m7gsvjwqb5rroxnfif76zg5emsf7oskq/lib/libmpi_usempif08.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1") found components: Fortran

dslarm, Mar 12 '25 15:03

The root cause of the issue (as pointed out by @dslarm) is that ACfL is lying about what kinds are supported in iso_fortran_env's real_kinds.

I've made a PR that fixes the issue: https://github.com/HDFGroup/hdf5/pull/5401

Gfortran:

-- Check for working Fortran compiler: /fsx/spack/opt/spack/linux-ubuntu20.04-armv8.4a/gcc-9.4.0/gcc-12.4.0-vfykzopukml5pu7eedkbhcmq22blh25o/bin/gfortran - skipped
...
-- ....NUMBER OF INTEGER KINDS FOUND 5
-- ....REAL KINDS FOUND {4,8,16}
-- ....INTEGER KINDS FOUND {1,2,4,8,16}
-- ....MAX DECIMAL PRECISION 33
-- ....LOGICAL KINDS FOUND {1,2,4,8,16}
-- ....FOUND SIZEOF for REAL KINDs {4,8,16}

ACfL:

-- Check for working Fortran compiler: /fsx/spack/opt/spack/linux-ubuntu20.04-aarch64/gcc-9.4.0/acfl-24.10.1-4uj73pi7jwmdd2w6u2efp674l55hapbv/arm-linux-compiler-24.10.1_Ubuntu-20.04/bin/armflang - skipped
...
-- ....NUMBER OF INTEGER KINDS FOUND 4
-- ....REAL KINDS FOUND {4,8}
-- ....INTEGER KINDS FOUND {1,2,4,8}
-- ....MAX DECIMAL PRECISION 15
-- ....LOGICAL KINDS FOUND {1,2,4,8}
-- ....FOUND SIZEOF for REAL KINDs {4,8}

lrbison, Mar 21 '25 19:03

It turns out nvfortran 25.3 is also (half-)lying about half-precision (KIND=2) support. #5401 fixes the issue with that compiler too. When can we expect a release that includes the fix in #5401?

nmnobre, Apr 14 '25 14:04