functional-fortran
functional-fortran copied to clipboard
test_map fails for the complex32 case with gfortran-10 with -O3
Hello,
Thank you for this excellent library.
Here, I would report an issue. As I invoked ctest after installation, following error message was shown:
96% tests passed, 1 tests failed out of 25
Total Test time (real) = 7.27 sec
The following tests FAILED:
16 - test_map (Failed)
Errors while running CTest
Would you please see to this? Best regards, Tamoghna
Hi @tamoghna-4119,
Thanks for reporting this. I can't reproduce it with gfortran-9.3.0 or ifort-2021.1--all tests pass.
What compiler and version do you use?
Also, can you run bin/test_map from the build directory and let me know the output that you get?
Thank you for your answer and I am really sorry for replying only now.
It might be a problem with gcc-10.2.0.
bin/test_map fails for complex real32. All other 9tests have passed.
I would try with gfortran-9.
Regards, dT
I dig a little deeper into this. I put a few diagnostic prints in test_map.f90:
! test that fails
tests(n) = assert(all(map(xpowx_c4, c4) == c4**c4), 'map, complex real32')
n = n + 1
print *, 'c4', c4
print *, 'map(xpowx_c4, c4)', map(xpowx_c4, c4)
print *, 'c4**c4', c4**c4
print *, 'map(xpowx_c4, c4) == c4**c4', map(xpowx_c4, c4) == c4**c4
print *, 'all(map(xpowx_c4, c4) == c4**c4)', all(map(xpowx_c4, c4) == c4**c4)
This test fails with gfortran-10 (not with 9, 8, or 7), and only with -O3 or higher optimization level.
$ gfortran-10 -O3 src/functional.f90 test/testing.f90 test/test_map.f90 && ./a.out
test map, int8 : PASS
test map, int16 : PASS
test map, int32 : PASS
test map, int64 : PASS
test map, real32 : PASS
test map, real64 : PASS
test map, real128 : PASS
test map, complex real32 : FAIL
c4 (1.00000000,0.00000000) (2.00000000,0.00000000) (3.00000000,0.00000000)
map(xpowx_c4, c4) (1.00000000,0.00000000) (4.00000000,0.00000000) (27.0000019,0.00000000)
c4**c4 (1.00000000,0.00000000) (4.00000000,0.00000000) (27.0000019,0.00000000)
map(xpowx_c4, c4) == c4**c4 T T T
all(map(xpowx_c4, c4) == c4**c4) F
test map, complex real64 : PASS
test map, complex real128 : PASS
Ran a total of 10 tests.
9 tests PASSED, 1 tests FAILED.
STOP 1
Although all 3 elements evaluate as equal, evaluating all() of that array comes back false:
map(xpowx_c4, c4) == c4**c4 T T T
all(map(xpowx_c4, c4) == c4**c4) F
My guess is that at high optimization level the compiler tries to do something too aggressive with either side of the == comparison when it's an argument to all().
Right now I don't have a solution but I will keep this open.
When I was trying to figure out how to make it pass (https://github.com/wavebitscientific/functional-fortran/pull/23/commits/7bcf168456a80402f7f53973eae4df9d3962a300 / https://github.com/wavebitscientific/functional-fortran/pull/23), I discovered that it's not just with all() that weird things happen. Assigning map(xpowx_c4, c4) == c4**c4 to a variable I got T T F, even though map(xpowx_c4, c4) == c4**c4 evaluates to T T T (or at least prints out that way).
Hello all,
I can reproduce this test failure for v0.6.2 for gfortran 11.3.0 using the default build flags.
Best, Nakib
Not sure if this adds to the confusion, the following evaluates to false:
tests(n) = assert(all(map(xpowx_c4, c4) == &
[cmplx(1.0_real32, 0.0_real32), &
cmplx(4.0_real32, 0.0_real32), &
cmplx(27.0_real32, 0.0_real32)]), 'map, complex real32')
whereas the following evaluates to true:
tests(n) = assert(all(c4**c4 == &
[cmplx(1.0_real32, 0.0_real32), &
cmplx(4.0_real32, 0.0_real32), &
cmplx(27.0_real32, 0.0_real32)]), 'map, complex real32')
for the given definition of c4 in the test. I have checked that it does not matter if the arguments are swapped across the == sign.
Interestingly, the following passes the test:
tests(n) = assert(all(map(xpowx_c8, c8) == c4**c4), 'map, complex real64 LHS, complex real32 RHS')
This precision loss issue seems to be due to the application of the map function. Rather strange! Will play with it more.
Best, Nakib
I have done further digging. It turns out that the compiler flag -fpeel-loops causes this particular test to fail. This is one of the flags that gets tacked on on top of the flags defining the -O2 optimization level when we use the -O3 optimization. Read more here: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
I quote from the page above the description of -fpeel-loops:
-fpeel-loops
Peels loops for which there is enough information that they do not roll much (from profile feedback or static analysis). It also turns on complete loop peeling (i.e. complete removal of loops with small constant number of iterations). Enabled by -O3, -fprofile-use, and -fauto-profile.
It is unclear to me how this flag interacts with the application of the map_c4 function to result in the failed test. It seems like -fpeel-loops in conjuction with -O1 and -O2 will cause this issue. It works fine with the -O0 flag.
Perhaps it is better to ship with the -O2 optimization by default instead of -O3.
Thanks for digging further! Are you sure -fpeel-loops causes it? I can still reproduce the issue with -O3 -fno-peel-loops.
Hi Milan,
Good point. I too see the test failing with -O3 -fno-peel-loops. However, the test passes with -O2 -fgcse-after-reload -fipa-cp-clone -floop-interchange -floop-unroll-and-jam -fpredictive-commoning -fsplit-loops -fsplit-paths -ftree-loop-distribution -ftree-partial-pre -funswitch-loops -fvect-cost-model=dynamic -fversion-loops-for-strides. Here I have added all the additional O3 level flags by hand except for -fpeel-loops. Naively, I would think that this and -O3 -fno-peel-loops should give the same results.
I see, thanks. In that case, I assume that it's some undocumented optimization going from -O2 to -O3 that breaks the map for complex values. I'm not sure yet whether it's better to use -O2 or -O3 for release. On one hand, all compiled code should work, and on the other hand, I don't want to penalize the entire library for one edge case. If nothing else, we can document the issue in README.md as a note for the user.