Test failures with -mfma + -ffp-contract=fast compiler flags
I experienced test failures when building OpenColorIO 2.0.0 with -march=znver2 on GCC. By process of elimination, the culprit turned out to be -mfma specifically. Building with Clang gives the same test failures once -ffp-contract=fast is also enabled explicitly. I also tested the master branch as of commit https://github.com/AcademySoftwareFoundation/OpenColorIO/commit/4e27f9672ab013c1e4d9c8965f51842e66bc0c87 and the failures are identical.
https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Optimize-Options.html https://releases.llvm.org/11.0.1/tools/clang/docs/ClangCommandLineReference.html
clang version 11.1.0, gcc version 10.2.0. Distribution is Gentoo Linux.
cmake options:
-DCMAKE_INSTALL_PREFIX=/usr -DBUILD_SHARED_LIBS=ON -DLIB_SUFFIX= -DOCIO_BUILD_APPS=no \
-DOCIO_BUILD_DOCS=no -DOCIO_BUILD_GPU_TESTS=OFF -DOCIO_BUILD_PYTHON=yes -DOCIO_BUILD_TESTS=yes \
-DOCIO_BUILD_JAVA=OFF -DOCIO_INSTALL_EXT_PACKAGES=NONE -DOCIO_USE_SSE=yes
Logs
Build log GCC, Build log Clang, Test logs GCC, Test logs Clang
How to reproduce
Compile OpenColorIO with the following flags and run the tests.
Compiler flags for GCC: -mfma
Compiler flags for Clang: -mfma -ffp-contract=fast
Why the difference between Clang and GCC compiler flags?
GCC defaults to -ffp-contract=fast while Clang defaults to -ffp-contract=on.
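To illustrate why contraction alone can change results, here is a minimal C++ sketch. It is not taken from OCIO, and the function and constant names are made up for the example. A fused multiply-add rounds once, after the add, whereas the unfused form rounds the product first. With -mfma and -ffp-contract=fast the compiler is free to turn expressions of the first kind into the second.

```cpp
#include <cmath>

// Multiply, round the product to float, then add.
// The volatile store forces the intermediate rounding, so the compiler
// cannot contract this expression into a single FMA instruction.
float mul_then_add(float a, float b, float c)
{
    volatile float product = a * b;
    return product + c;
}

// Fused multiply-add: a * b + c with a single rounding at the end.
// This is what a contracted expression computes when FMA is used.
float fused_mul_add(float a, float b, float c)
{
    return std::fma(a, b, c);
}

// Inputs chosen so that the exact product needs one more mantissa bit
// than a float can hold: (1 + 2^-12)^2 = 1 + 2^-11 + 2^-24.
const float kA = 1.0f + std::ldexp(1.0f, -12);
const float kC = -(1.0f + std::ldexp(1.0f, -11));
```

With these inputs, mul_then_add(kA, kA, kC) returns 0 while fused_mul_add(kA, kA, kC) returns 2^-24 (about 6e-8), which is the same order of magnitude as the 1e-07 threshold reported by the failing GammaOpCPU tests.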
Why would you have -mfma enabled?
-mfma is enabled by basically every -march option since haswell, and it is also part of the upcoming x86-64-v3 feature level.
https://www.phoronix.com/scan.php?page=news_item&px=GCC-11-x86-64-Feature-Levels
https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
Impact
Very minor, since only a minority of users compile OpenColorIO from source with these optimizations. Distribution binary providers might hit this failure when building with GCC for the x86-64-v3 micro-architecture feature level or higher.
Tests which got the failures
GCC failed 11 tests, while Clang failed 12.
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/CPUProcessor_tests.cpp:824:
FAILED: cacheID == expectedID
values were 'CPU Processor: from 16ui to 32f oFlags 263995331 ops: <Lut1D $17d2b407e859021ee87958e5d4e91c8f forward default standard domain none>' and 'CPU Processor: from 16ui to 32f oFlags 263995331 ops: <Lut1D $a57d7444e629d796d2234c18a0539c74 forward default standard domain none>'
[126/991] [CPUProcessor / with_several_ops ] - FAILED
// Truncated the rest, log includes several of these with values being off by one in every single one.
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/CPUProcessor_tests.cpp:2167:
FAILED: outValues[idx+3] == OCIO::Converter<outBD>::CastValue(pxl[3])
values were '65214' and '65215'
[135/991] [CPUProcessor / optimizations ] - FAILED
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/fileformats/FileFormatCTF_tests.cpp:8081:
FAILED: expectedCLF == output1.str()
values were '<?xml version="1.0" encoding="UTF-8"?>
<ProcessList compCLFversion="3" id="UID42">
<Range inBitDepth="32f" outBitDepth="32f">
<minInValue> -0.125 </minInValue>
<maxInValue> 1.125 </maxInValue>
<minOutValue> 0 </minOutValue>
<maxOutValue> 1 </maxOutValue>
</Range>
<LUT1D inBitDepth="32f" outBitDepth="32f">
<Array dim="10 1">
0
0.11111112
0.22222224
0.33333334
0.44444448
0.55555558
0.66666675
0.77777779
0.88888896
1
</Array>
</LUT1D>
<LUT3D inBitDepth="32f" outBitDepth="32f">
<Array dim="2 2 2 3">
0 0 0
0.0361 0.0361 0.53609997
0.3576 0.85759997 0.3576
0.3937 0.8937 0.8937
0.6063 0.1063 0.1063
0.64240003 0.1424 0.64239997
0.96389997 0.96389997 0.4639
1 1 1
</Array>
</LUT3D>
</ProcessList>
' and '<?xml version="1.0" encoding="UTF-8"?>
<ProcessList compCLFversion="3" id="UID42">
<Range inBitDepth="32f" outBitDepth="32f">
<minInValue> -0.125 </minInValue>
<minOutValue> 0 </minOutValue>
<maxOutValue> 1 </maxOutValue>
</Range>
<LUT1D inBitDepth="32f" outBitDepth="32f">
<Array dim="10 1">
0
0.11111111
0.22222222
0.33333334
0.44444445
0.55555558
0.66666669
0.77777779
0.8888889
1
</Array>
</LUT1D>
<LUT3D inBitDepth="32f" outBitDepth="32f">
<Array dim="2 2 2 3">
0 0 0
0.0361 0.0361 0.53609997
0.3576 0.85759997 0.3576
0.3937 0.8937 0.8937
0.6063 0.1063 0.1063
0.64240003 0.1424 0.64239997
0.96389997 0.96389997 0.4639
1 1 1
</Array>
</LUT3D>
</ProcessList>
'
[393/991] [FileFormatCTF / bake_1d_3d ] - FAILED
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/exposurecontrast/ExposureContrastOpCPU_tests.cpp:234:
FAILED: rgba[0] == logECVal(rgbaImage[0], const_ec, inMax, outMax)
values were '0.13045' and '0.13045'
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/exposurecontrast/ExposureContrastOpCPU_tests.cpp:235:
FAILED: rgba[1] == logECVal(rgbaImage[1], const_ec, inMax, outMax)
values were '0.50108' and '0.50108'
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/exposurecontrast/ExposureContrastOpCPU_tests.cpp:239:
FAILED: rgba[5] == logECVal(rgbaImage[5], const_ec, inMax, outMax)
values were '0.10108' and '0.10108'
[576/991] [ExposureContrastRenderer / log ] - FAILED
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:122:
FAILED: Index: 17 - Values: 0.896949828 and: 0.896951199 - Threshold: 1.00000001e-07
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:122:
FAILED: Index: 21 - Values: 1.10895336 and: 1.10895324 - Threshold: 1.00000001e-07
[619/991] [GammaOpCPU / apply_basic_style_fwd ] - FAILED
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:185:
FAILED: Index: 16 - Values: 0.830311298 and: 0.830311418 - Threshold: 1.00000001e-07
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:185:
FAILED: Index: 17 - Values: 0.976092517 and: 0.976092875 - Threshold: 1.00000001e-07
[620/991] [GammaOpCPU / apply_basic_style_rev ] - FAILED
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:276:
FAILED: Index: 17 - Values: 0.896949828 and: 0.896951199 - Threshold: 1.00000001e-07
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:276:
FAILED: Index: 21 - Values: -0.896949828 and: -0.896951199 - Threshold: 1.00000001e-07
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:276:
FAILED: Index: 25 - Values: 1.10895336 and: 1.10895324 - Threshold: 1.00000001e-07
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:276:
FAILED: Index: 29 - Values: -1.10895336 and: -1.10895324 - Threshold: 1.00000001e-07
[621/991] [GammaOpCPU / apply_basic_mirror_style_fwd ] - FAILED
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:366:
FAILED: Index: 16 - Values: 0.830311298 and: 0.830311418 - Threshold: 1.00000001e-07
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:366:
FAILED: Index: 17 - Values: 0.976092517 and: 0.976092875 - Threshold: 1.00000001e-07
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:366:
FAILED: Index: 20 - Values: -0.830311298 and: -0.830311418 - Threshold: 1.00000001e-07
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:366:
FAILED: Index: 21 - Values: -0.976092517 and: -0.976092875 - Threshold: 1.00000001e-07
[622/991] [GammaOpCPU / apply_basic_mirror_style_rev ] - FAILED
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:444:
FAILED: Index: 17 - Values: 0.896949828 and: 0.896951199 - Threshold: 1.00000001e-07
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:444:
FAILED: Index: 25 - Values: 1.10895336 and: 1.10895324 - Threshold: 1.00000001e-07
[623/991] [GammaOpCPU / apply_basic_pass_thru_style_fwd ] - FAILED
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:522:
FAILED: Index: 16 - Values: 0.830311298 and: 0.830311418 - Threshold: 1.00000001e-07
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:522:
FAILED: Index: 17 - Values: 0.976092517 and: 0.976092875 - Threshold: 1.00000001e-07
[624/991] [GammaOpCPU / apply_basic_pass_thru_style_rev ] - FAILED
// Truncated the rest, log includes several more of these with values differing by ~0.00001-0.0000001.
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:576:
FAILED: Index: 22 - Values: 1.49998474 and: 1.49998403 - Threshold: 1.00000001e-07
[625/991] [GammaOpCPU / apply_moncurve_style_fwd ] - FAILED
// Truncated the rest, log includes several more of these with values differing by ~0.00001-0.0000001.
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:690:
FAILED: Index: 30 - Values: -1.84183896 and: -1.84183872 - Threshold: 1.00000001e-07
[627/991] [GammaOpCPU / apply_moncurve_mirror_style_fwd ] - FAILED
Errors were identical between GCC and Clang except for the following, which occurred only with Clang.
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/exposurecontrast/ExposureContrastOpCPU_tests.cpp:234:
FAILED: rgba[0] == logECVal(rgbaImage[0], const_ec, inMax, outMax)
values were '0.13045' and '0.13045'
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/exposurecontrast/ExposureContrastOpCPU_tests.cpp:235:
FAILED: rgba[1] == logECVal(rgbaImage[1], const_ec, inMax, outMax)
values were '0.50108' and '0.50108'
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/exposurecontrast/ExposureContrastOpCPU_tests.cpp:239:
FAILED: rgba[5] == logECVal(rgbaImage[5], const_ec, inMax, outMax)
values were '0.10108' and '0.10108'
[576/991] [ExposureContrastRenderer / log ] - FAILED
If anyone has time to work on this, the test failures seem minor and easily fixable (e.g. just loosen the tolerance slightly).
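As a rough sketch of what loosening the tolerance could look like, assuming a comparison helper along these lines (nearly_equal and its default tolerances are hypothetical, not the existing OCIO test macros):

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper, not part of the OCIO test framework.
// Returns true when the two values differ by no more than rel_tol times
// the larger magnitude, with abs_tol as a floor for values near zero.
bool nearly_equal(float actual, float expected,
                  float rel_tol = 1e-5f, float abs_tol = 1e-7f)
{
    // NaN never compares equal to anything.
    if (std::isnan(actual) || std::isnan(expected))
    {
        return false;
    }
    // Exact matches, including equal infinities, pass immediately.
    if (actual == expected)
    {
        return true;
    }
    const float diff  = std::fabs(actual - expected);
    const float scale = std::max(std::fabs(actual), std::fabs(expected));
    return diff <= std::max(abs_tol, rel_tol * scale);
}
```

With the default 1e-5 relative tolerance, the pair 0.896949828 / 0.896951199 from the GammaOpCPU log above compares equal, while a 1e-7 tolerance rejects it. The right tolerance for each test would still need to be chosen case by case.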
It would be very interesting to run ocioperf before and after adding these compiler flags and see how much of a performance benefit there is. Perhaps we should be setting these flags by default?
For reference, adding a 2023-12 chat from the OCIO Slack:
Mark Reid: It's been nagging me for a while that some of the CPU tests on Apple silicon have been failing. My suspicion was that it was due to FMA instructions on arm64, and I think I've confirmed that: if I compile OCIO with -ffp-contract=off, all the CPU tests pass. This was my cmake command: cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_FLAGS="-ffp-contract=off" -DCMAKE_CXX_FLAGS="-ffp-contract=off"
doug walker: That's interesting that the CPU tests are failing for you. On my MacBook Pro (M1 Pro chip), they succeed with the default settings. What chip are you using?
Mark Reid: A few have always failed for me with the default settings. I'm on an M2 Max with clang 15.0.
Mark Reid: Interesting, it looks like it could be a change in defaults between clang versions, if you're using an older version: https://godbolt.org/z/sj5ofs5h4
Compiler Explorer (godbolt.org), C: float bar(float x, float y, float z) { return x * y + z; }
doug walker: Well, I used clang 14, which your test seems to indicate uses fmadd. Nevertheless, I do think you're right, it's probably more likely to be a compiler issue than an M1/M2 difference.
Linking to issue #1784.
#1950 might have resolved this issue. It fixes a few unit test failures related to FMA on Apple silicon.