functional-fortran test_map fails for the complex32 case with gfortran-10 with -O3

trafficstars

Hello, Thank you for this excellent library. Here, I would report an issue. As I invoked ctest after installation, following error message was shown:

96% tests passed, 1 tests failed out of 25
Total Test time (real) =   7.27 sec
The following tests FAILED:
	 16 - test_map (Failed)
Errors while running CTest

Would you please see to this? Best regards, Tamoghna

Nov 17 '20 22:11 kyrtle

Hi @tamoghna-4119,

Thanks for reporting this. I can't reproduce it with gfortran-9.3.0 or ifort-2021.1--all tests pass.

What compiler and version do you use?

Also, can you run bin/test_map from the build directory and let me know the output that you get?

Nov 19 '20 00:11 milancurcic

Thank you for your answer and I am really sorry for replying only now. It might be a problem with gcc-10.2.0. bin/test_map fails for complex real32. All other 9tests have passed. I would try with gfortran-9.

Regards, dT

Nov 25 '20 17:11 kyrtle

I dig a little deeper into this. I put a few diagnostic prints in test_map.f90:

! test that fails
tests(n) = assert(all(map(xpowx_c4, c4) == c4**c4), 'map,  complex real32')
n = n + 1

print *, 'c4', c4
print *, 'map(xpowx_c4, c4)', map(xpowx_c4, c4)
print *, 'c4**c4', c4**c4
print *, 'map(xpowx_c4, c4) == c4**c4', map(xpowx_c4, c4) == c4**c4
print *, 'all(map(xpowx_c4, c4) == c4**c4)', all(map(xpowx_c4, c4) == c4**c4)

This test fails with gfortran-10 (not with 9, 8, or 7), and only with -O3 or higher optimization level.

$ gfortran-10 -O3 src/functional.f90 test/testing.f90 test/test_map.f90 && ./a.out
test map,  int8                                                           : PASS
test map,  int16                                                          : PASS
test map,  int32                                                          : PASS
test map,  int64                                                          : PASS
test map,  real32                                                         : PASS
test map,  real64                                                         : PASS
test map,  real128                                                        : PASS
test map,  complex real32                                                 : FAIL
 c4             (1.00000000,0.00000000)             (2.00000000,0.00000000)             (3.00000000,0.00000000)
 map(xpowx_c4, c4)             (1.00000000,0.00000000)             (4.00000000,0.00000000)             (27.0000019,0.00000000)
 c4**c4             (1.00000000,0.00000000)             (4.00000000,0.00000000)             (27.0000019,0.00000000)
 map(xpowx_c4, c4) == c4**c4 T T T
 all(map(xpowx_c4, c4) == c4**c4) F
test map,  complex real64                                                 : PASS
test map,  complex real128                                                : PASS
Ran a total of  10 tests.
  9 tests PASSED,    1 tests FAILED.
STOP 1

Although all 3 elements evaluate as equal, evaluating all() of that array comes back false:

 map(xpowx_c4, c4) == c4**c4 T T T
 all(map(xpowx_c4, c4) == c4**c4) F

My guess is that at high optimization level the compiler tries to do something too aggressive with either side of the == comparison when it's an argument to all().

Right now I don't have a solution but I will keep this open.

Feb 16 '21 15:02 milancurcic

When I was trying to figure out how to make it pass (https://github.com/wavebitscientific/functional-fortran/pull/23/commits/7bcf168456a80402f7f53973eae4df9d3962a300 / https://github.com/wavebitscientific/functional-fortran/pull/23), I discovered that it's not just with all() that weird things happen. Assigning map(xpowx_c4, c4) == c4**c4 to a variable I got T T F, even though map(xpowx_c4, c4) == c4**c4 evaluates to T T T (or at least prints out that way).

Mar 16 '21 14:03 zmoon

Hello all,

I can reproduce this test failure for v0.6.2 for gfortran 11.3.0 using the default build flags.

Best, Nakib

Dec 11 '22 13:12 nakib

Not sure if this adds to the confusion, the following evaluates to false:

tests(n) = assert(all(map(xpowx_c4, c4) == &
     [cmplx(1.0_real32, 0.0_real32), &
     cmplx(4.0_real32, 0.0_real32), &
     cmplx(27.0_real32, 0.0_real32)]), 'map,  complex real32')

whereas the following evaluates to true:

tests(n) = assert(all(c4**c4 == &
     [cmplx(1.0_real32, 0.0_real32), &
     cmplx(4.0_real32, 0.0_real32), &
     cmplx(27.0_real32, 0.0_real32)]), 'map,  complex real32')

for the given definition of c4 in the test. I have checked that it does not matter if the arguments are swapped across the == sign.

Interestingly, the following passes the test:

tests(n) = assert(all(map(xpowx_c8, c8) == c4**c4), 'map,  complex real64 LHS, complex real32 RHS')

This precision loss issue seems to be due to the application of the map function. Rather strange! Will play with it more.

Best, Nakib

Dec 11 '22 15:12 nakib

I have done further digging. It turns out that the compiler flag -fpeel-loops causes this particular test to fail. This is one of the flags that gets tacked on on top of the flags defining the -O2 optimization level when we use the -O3 optimization. Read more here: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

I quote from the page above the description of -fpeel-loops:

-fpeel-loops

Peels loops for which there is enough information that they do not roll much (from profile feedback or static analysis). It also turns on complete loop peeling (i.e. complete removal of loops with small constant number of iterations).

Enabled by -O3, -fprofile-use, and -fauto-profile.

It is unclear to me how this flag interacts with the application of the map_c4 function to result in the failed test. It seems like -fpeel-loops in conjuction with -O1 and -O2 will cause this issue. It works fine with the -O0 flag.

Perhaps it is better to ship with the -O2 optimization by default instead of -O3.

Dec 24 '22 14:12 nakib

Thanks for digging further! Are you sure -fpeel-loops causes it? I can still reproduce the issue with -O3 -fno-peel-loops.

Dec 24 '22 23:12 milancurcic

Hi Milan,

Good point. I too see the test failing with -O3 -fno-peel-loops. However, the test passes with -O2 -fgcse-after-reload -fipa-cp-clone -floop-interchange -floop-unroll-and-jam -fpredictive-commoning -fsplit-loops -fsplit-paths -ftree-loop-distribution -ftree-partial-pre -funswitch-loops -fvect-cost-model=dynamic -fversion-loops-for-strides. Here I have added all the additional O3 level flags by hand except for -fpeel-loops. Naively, I would think that this and -O3 -fno-peel-loops should give the same results.

Dec 25 '22 09:12 nakib

I see, thanks. In that case, I assume that it's some undocumented optimization going from -O2 to -O3 that breaks the map for complex values. I'm not sure yet whether it's better to use -O2 or -O3 for release. On one hand, all compiled code should work, and on the other hand, I don't want to penalize the entire library for one edge case. If nothing else, we can document the issue in README.md as a note for the user.

Dec 29 '22 20:12 milancurcic

functional-fortran functional-fortran copied to clipboard

test_map fails for the complex32 case with gfortran-10 with -O3

functional-fortran
functional-fortran copied to clipboard