neural-fortran
Test failure with ifx
Hi,
Is ifx (the Intel next-generation Fortran compiler that is replacing ifort) supported? I'm getting the following failures with ifx 2023.1.0:
The following tests FAILED:
6 - test_maxpool2d_layer (Failed)
12 - test_io_hdf5 (Failed)
14 - test_dense_network_from_keras (Failed)
17 - test_optimizers (Failed)
Errors while running CTest
In a release build, these fail with a memory error.
In debug mode, only test_optimizers fails.
All of this is on master.
Thanks
Thanks for reporting. I haven't tried ifx in a while, and definitely not a recent version. I'll try it and let you know what I find.
Hi @aminiussi, I can't seem to reproduce this. Here's what I have:
$ ifx --version
ifx (IFX) 2023.2.0 20230721
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.
HDF5 is 1.12.2 built with ifort-2021.6.
All tests pass on the latest main.
Similarly, all tests pass with ifort-2021.10.0 (the latest version released before its deprecation in favor of ifx).
Hi @milancurcic,
The test in my build fails with "unmapped address" with the following stack trace:
6: 0 0x000000000004cb95 ucs_debug_print_backtrace() ???:0
6: 1 0x0000000000415d17 nf_maxpool2d_layer_mp_backward_() /scratch/alainm/view/neural-fortran/src/nf/nf_maxpool2d_layer_submodule.f90:107
6: 2 0x00000000004102b2 nf_layer_mp_backward_3d_() /scratch/alainm/view/neural-fortran/src/nf/nf_layer_submodule.f90:0
6: 3 0x000000000040d2d5 MAIN__() /scratch/alainm/view/neural-fortran/test/test_maxpool2d_layer.f90:77
14:37:01 [alainm@castor bld]# emacs /scratch/alainm/view/neural-fortran/test/test_maxpool2d_layer.f90
One element of the backtrace looks odd: /scratch/alainm/view/neural-fortran/src/nf/nf_layer_submodule.f90:0
as there is no code at line 0.
We are using HDF5 1.14.1, and the underlying gfortran is 12.2.0. Apart from that, our ifx is slightly older...
$ ifx --version ifx (IFX) 2023.2.0 20230721 Copyright (C) 1985-2023 Intel Corporation. All rights reserved.
Is that a parallel build and, if so, which MPI is used?
Thanks
I did a debug build with -check all. The test fails with:
forrtl: severe (408): fort: (3): Subscript #3 of the array MAXLOC_X has value 0 which is less than the lower bound of 1
In coarray image 4
Image PC Routine Line Source
test_maxpool2d_la 000000000042BD1A backward 106 nf_maxpool2d_layer_submodule.f90
test_maxpool2d_la 0000000000417470 backward_3d 87 nf_layer_submodule.f90
test_maxpool2d_la 000000000040E789 test_maxpool2d_la 77 test_maxpool2d_layer.f90
test_maxpool2d_la 000000000040B39D Unknown Unknown Unknown
libc-2.17.so 00007FFFF3C84555 __libc_start_main Unknown Unknown
test_maxpool2d_la 000000000040B2CB Unknown Unknown Unknown
Thank you, @aminiussi, this is very helpful and may be related to #145. It's possible that this is a bug that other compilers (and non-debug build modes) fail to catch while still producing incorrect results. I'll look deeper into this.
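For anyone following along, here is a minimal stand-alone sketch of this class of bug. It is not the neural-fortran code and makes no claim about the actual root cause. All names (`idx`, `src`, `dst`) are hypothetical. It assumes a stored index array that contains a stray 0 and is then used as a subscript into an array whose lower bound is 1. The explicit guard makes the program well defined under any compiler and build mode. Without the guard, the write to `dst(0)` would be out of bounds: a run-time bounds check (`-check all` with ifx, as used above, or `-fcheck=bounds` with gfortran) would be expected to abort there, while an unchecked release build may silently corrupt memory or crash.

```fortran
program check_index_bounds
  ! Hypothetical stand-alone example; the names are illustrative and are
  ! not taken from neural-fortran.
  implicit none
  integer, parameter :: n = 4
  integer :: idx(n)      ! stored positions, expected to lie in 1..n
  real :: src(n), dst(n)
  integer :: i

  src = [1.0, 2.0, 3.0, 4.0]
  dst = 0.0
  idx = [2, 0, 4, 1]     ! the 0 mimics a stored index that was never set

  do i = 1, n
    ! Validate the stored index before using it as a subscript.
    if (idx(i) < lbound(dst, 1) .or. idx(i) > ubound(dst, 1)) then
      print '(a,i0,a,i0)', 'skipping out-of-range index ', idx(i), ' at position ', i
      cycle
    end if
    dst(idx(i)) = src(i)
  end do

  ! Only the three valid entries were scattered into dst.
  print '(4f5.1)', dst
end program check_index_bounds
```

Removing the guard and rebuilding with and without bounds checking is one way to see why a debug build can flag an error that a release build passes over.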
Is that a parallel build and, if so, which MPI is used?
I haven't built in parallel with the Intel compilers. Intel MPI comes bundled with the oneAPI suite, but I don't think I configured it properly on my computer, and I haven't had time to dedicate to a parallel Intel build.