Test failures with -mfma + -ffp-contract=fast compiler flags

Open parona-source opened this issue 4 years ago • 1 comments

I experienced test failures when building OpenColorIO 2.0.0 with -march=znver2 on GCC; by process of elimination, the culprit turned out to be -mfma specifically. Building with Clang produced the same test failures once -ffp-contract=fast was also explicitly enabled. I also tested the master branch as of commit https://github.com/AcademySoftwareFoundation/OpenColorIO/commit/4e27f9672ab013c1e4d9c8965f51842e66bc0c87 and the failures are identical.

https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Optimize-Options.html
https://releases.llvm.org/11.0.1/tools/clang/docs/ClangCommandLineReference.html

clang version 11.1.0
gcc version 10.2.0
Distribution is Gentoo Linux

cmake options:

-DCMAKE_INSTALL_PREFIX=/usr -DBUILD_SHARED_LIBS=ON -DLIB_SUFFIX= -DOCIO_BUILD_APPS=no \
-DOCIO_BUILD_DOCS=no -DOCIO_BUILD_GPU_TESTS=OFF -DOCIO_BUILD_PYTHON=yes -DOCIO_BUILD_TESTS=yes \
-DOCIO_BUILD_JAVA=OFF -DOCIO_INSTALL_EXT_PACKAGES=NONE -DOCIO_USE_SSE=yes

Logs

Build log GCC, Build log Clang, Test logs GCC, Test logs Clang

How to reproduce

Compile OpenColorIO with the following flags and run the tests.

Compiler flags for GCC: -mfma
Compiler flags for Clang: -mfma -ffp-contract=fast

Why the difference between Clang and GCC compiler flags?

GCC defaults to -ffp-contract=fast while Clang defaults to -ffp-contract=on.
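
To illustrate why contraction can change results at all, here is a minimal sketch in Python (not OCIO code, and in double rather than single precision). It emulates a fused multiply-add by computing x * y + z exactly with rational arithmetic and rounding once, and compares that with the ordinary two-step evaluation that rounds after the multiply. The function names and input values are hypothetical, chosen only to make the difference visible.

```python
from fractions import Fraction


def mul_add_unfused(x: float, y: float, z: float) -> float:
    """x * y + z with two roundings: one after the multiply, one after the add."""
    return x * y + z


def mul_add_fused(x: float, y: float, z: float) -> float:
    """Emulated FMA: evaluate x * y + z exactly, then round once to a double."""
    exact = Fraction(x) * Fraction(y) + Fraction(z)
    return float(exact)  # converting the exact rational rounds to the nearest double


# Hypothetical inputs: the exact product is 1 - 2**-60, which is not
# representable as a double and rounds to 1.0 when the multiply is rounded.
x = 1.0 + 2.0 ** -30
y = 1.0 - 2.0 ** -30
z = -1.0

unfused = mul_add_unfused(x, y, z)
fused = mul_add_fused(x, y, z)
print("unfused:", unfused)
print("fused:  ", fused)
```

Both answers are valid floating-point results; the fused one is closer to the exact value. A test that compares against reference values produced one way can therefore fail when the library is built the other way.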

Why would you have -mfma enabled?

-mfma is enabled by basically all -march options since haswell, and it is also enabled in the upcoming x86-64-v3 feature level.
https://www.phoronix.com/scan.php?page=news_item&px=GCC-11-x86-64-Feature-Levels
https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html

Impact

Very minor, since only a minority of users compile OpenColorIO from source with these optimizations. Distributions that provide binaries might hit these failures if they build with GCC and target the future micro-architecture feature levels from x86-64-v3 and up.

Tests which got the failures

GCC failed 11 tests, while Clang failed 12.

/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/CPUProcessor_tests.cpp:824:
FAILED: cacheID == expectedID
        values were 'CPU Processor: from 16ui to 32f oFlags 263995331 ops: <Lut1D $17d2b407e859021ee87958e5d4e91c8f forward default standard domain none>' and 'CPU Processor: from 16ui to 32f oFlags 263995331 ops: <Lut1D $a57d7444e629d796d2234c18a0539c74 forward default standard domain none>'
[126/991] [CPUProcessor / with_several_ops                             ] - FAILED
// Truncated the rest, log includes several of these with values being off by one in every single one.
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/CPUProcessor_tests.cpp:2167:
FAILED: outValues[idx+3] == OCIO::Converter<outBD>::CastValue(pxl[3])
        values were '65214' and '65215'
[135/991] [CPUProcessor / optimizations                                ] - FAILED
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/fileformats/FileFormatCTF_tests.cpp:8081:
FAILED: expectedCLF == output1.str()
        values were '<?xml version="1.0" encoding="UTF-8"?>
<ProcessList compCLFversion="3" id="UID42">
    <Range inBitDepth="32f" outBitDepth="32f">
        <minInValue> -0.125 </minInValue>
        <maxInValue> 1.125 </maxInValue>
        <minOutValue> 0 </minOutValue>
        <maxOutValue> 1 </maxOutValue>
    </Range>
    <LUT1D inBitDepth="32f" outBitDepth="32f">
        <Array dim="10 1">
          0
 0.11111112
 0.22222224
 0.33333334
 0.44444448
 0.55555558
 0.66666675
 0.77777779
 0.88888896
          1
        </Array>
    </LUT1D>
    <LUT3D inBitDepth="32f" outBitDepth="32f">
        <Array dim="2 2 2 3">
          0           0           0
     0.0361      0.0361  0.53609997
     0.3576  0.85759997      0.3576
     0.3937      0.8937      0.8937
     0.6063      0.1063      0.1063
 0.64240003      0.1424  0.64239997
 0.96389997  0.96389997      0.4639
          1           1           1
        </Array>
    </LUT3D>
</ProcessList>
' and '<?xml version="1.0" encoding="UTF-8"?>
<ProcessList compCLFversion="3" id="UID42">
    <Range inBitDepth="32f" outBitDepth="32f">
        <minInValue> -0.125 </minInValue>
        <minOutValue> 0 </minOutValue>
        <maxOutValue> 1 </maxOutValue>
    </Range>
    <LUT1D inBitDepth="32f" outBitDepth="32f">
        <Array dim="10 1">
          0
 0.11111111
 0.22222222
 0.33333334
 0.44444445
 0.55555558
 0.66666669
 0.77777779
  0.8888889
          1
        </Array>
    </LUT1D>
    <LUT3D inBitDepth="32f" outBitDepth="32f">
        <Array dim="2 2 2 3">
          0           0           0
     0.0361      0.0361  0.53609997
     0.3576  0.85759997      0.3576
     0.3937      0.8937      0.8937
     0.6063      0.1063      0.1063
 0.64240003      0.1424  0.64239997
 0.96389997  0.96389997      0.4639
          1           1           1
        </Array>
    </LUT3D>
</ProcessList>
'
[393/991] [FileFormatCTF / bake_1d_3d                                  ] - FAILED
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/exposurecontrast/ExposureContrastOpCPU_tests.cpp:234:
FAILED: rgba[0] == logECVal(rgbaImage[0], const_ec, inMax, outMax)
        values were '0.13045' and '0.13045'
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/exposurecontrast/ExposureContrastOpCPU_tests.cpp:235:
FAILED: rgba[1] == logECVal(rgbaImage[1], const_ec, inMax, outMax)
        values were '0.50108' and '0.50108'
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/exposurecontrast/ExposureContrastOpCPU_tests.cpp:239:
FAILED: rgba[5] == logECVal(rgbaImage[5], const_ec, inMax, outMax)
        values were '0.10108' and '0.10108'
[576/991] [ExposureContrastRenderer / log                              ] - FAILED
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:122:
FAILED: Index: 17 - Values: 0.896949828 and: 0.896951199 - Threshold: 1.00000001e-07
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:122:
FAILED: Index: 21 - Values: 1.10895336 and: 1.10895324 - Threshold: 1.00000001e-07
[619/991] [GammaOpCPU / apply_basic_style_fwd                          ] - FAILED
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:185:
FAILED: Index: 16 - Values: 0.830311298 and: 0.830311418 - Threshold: 1.00000001e-07
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:185:
FAILED: Index: 17 - Values: 0.976092517 and: 0.976092875 - Threshold: 1.00000001e-07
[620/991] [GammaOpCPU / apply_basic_style_rev                          ] - FAILED
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:276:
FAILED: Index: 17 - Values: 0.896949828 and: 0.896951199 - Threshold: 1.00000001e-07
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:276:
FAILED: Index: 21 - Values: -0.896949828 and: -0.896951199 - Threshold: 1.00000001e-07
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:276:
FAILED: Index: 25 - Values: 1.10895336 and: 1.10895324 - Threshold: 1.00000001e-07
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:276:
FAILED: Index: 29 - Values: -1.10895336 and: -1.10895324 - Threshold: 1.00000001e-07
[621/991] [GammaOpCPU / apply_basic_mirror_style_fwd                   ] - FAILED
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:366:
FAILED: Index: 16 - Values: 0.830311298 and: 0.830311418 - Threshold: 1.00000001e-07
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:366:
FAILED: Index: 17 - Values: 0.976092517 and: 0.976092875 - Threshold: 1.00000001e-07
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:366:
FAILED: Index: 20 - Values: -0.830311298 and: -0.830311418 - Threshold: 1.00000001e-07
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:366:
FAILED: Index: 21 - Values: -0.976092517 and: -0.976092875 - Threshold: 1.00000001e-07
[622/991] [GammaOpCPU / apply_basic_mirror_style_rev                   ] - FAILED

/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:444:
FAILED: Index: 17 - Values: 0.896949828 and: 0.896951199 - Threshold: 1.00000001e-07
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:444:
FAILED: Index: 25 - Values: 1.10895336 and: 1.10895324 - Threshold: 1.00000001e-07
[623/991] [GammaOpCPU / apply_basic_pass_thru_style_fwd                ] - FAILED
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:522:
FAILED: Index: 16 - Values: 0.830311298 and: 0.830311418 - Threshold: 1.00000001e-07
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:522:
FAILED: Index: 17 - Values: 0.976092517 and: 0.976092875 - Threshold: 1.00000001e-07
[624/991] [GammaOpCPU / apply_basic_pass_thru_style_rev                ] - FAILED

// Truncated the rest, log includes several more of these with values differing by ~0.00001-0.0000001.
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:576:
FAILED: Index: 22 - Values: 1.49998474 and: 1.49998403 - Threshold: 1.00000001e-07
[625/991] [GammaOpCPU / apply_moncurve_style_fwd                       ] - FAILED
// Truncated the rest, log includes several more of these with values differing by ~0.00001-0.0000001.
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:690:
FAILED: Index: 30 - Values: -1.84183896 and: -1.84183872 - Threshold: 1.00000001e-07
[627/991] [GammaOpCPU / apply_moncurve_mirror_style_fwd                ] - FAILED

Errors were identical between GCC and Clang except for the following which was only in Clang.

/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/exposurecontrast/ExposureContrastOpCPU_tests.cpp:234:
FAILED: rgba[0] == logECVal(rgbaImage[0], const_ec, inMax, outMax)
        values were '0.13045' and '0.13045'
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/exposurecontrast/ExposureContrastOpCPU_tests.cpp:235:
FAILED: rgba[1] == logECVal(rgbaImage[1], const_ec, inMax, outMax)
        values were '0.50108' and '0.50108'
/var/tmp/portage/media-libs/opencolorio-2.0.0-r1/work/OpenColorIO-2.0.0/tests/cpu/ops/exposurecontrast/ExposureContrastOpCPU_tests.cpp:239:
FAILED: rgba[5] == logECVal(rgbaImage[5], const_ec, inMax, outMax)
        values were '0.10108' and '0.10108'
[576/991] [ExposureContrastRenderer / log                              ] - FAILED

parona-source, Apr 04 '21 14:04

If anyone has time to work on this, the test failures seem minor and easily fixable (e.g. just loosen the tolerance slightly).
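
As a rough guide to how much loosening that would take, the sketch below computes the absolute differences for value pairs copied from the GammaOpCPU failures quoted in this issue and compares them with the reported threshold of 1e-07. It assumes a plain absolute-difference comparison, which may not match how the OCIO test helper actually compares values, and it only covers the pairs shown here (the original logs were truncated).

```python
# Value pairs copied from the GammaOpCPU failure lines quoted in this issue.
# The log does not say which value is "expected", so they are just (first, second).
pairs = [
    (0.896949828, 0.896951199),
    (1.10895336, 1.10895324),
    (0.830311298, 0.830311418),
    (0.976092517, 0.976092875),
    (1.49998474, 1.49998403),
    (-1.84183896, -1.84183872),
]

THRESHOLD = 1e-7  # threshold reported in the failure messages

# Assumption: absolute difference; the real test helper may use a relative error.
diffs = [abs(first - second) for first, second in pairs]
worst = max(diffs)

for (first, second), diff in zip(pairs, diffs):
    print(f"{first:>13.9f} vs {second:>13.9f}  diff = {diff:.3e}")
print(f"largest difference among these pairs: {worst:.3e}")
```

Among these pairs the largest difference is a little over 1e-06, so the threshold would need to grow by more than a factor of ten to cover them; the truncated log entries may differ by more.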

It would be very interesting to run ocioperf before and after adding these compiler flags and see how much of a performance benefit there is. Perhaps we should be setting these flags by default?

doug-walker, Apr 22 '22 00:04

For reference, adding a 2023-12 chat from the OCIO Slack:

Mark Reid: It's been nagging me for a while that some of the CPU tests on Apple silicon have been failing. My suspicion was that it was due to FMA instructions on arm64, and I think I've confirmed that: if I compile OCIO with -ffp-contract=off, all the CPU tests pass. This was my cmake command: cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_FLAGS="-ffp-contract=off" -DCMAKE_CXX_FLAGS="-ffp-contract=off"

doug walker: That's interesting that the CPU tests are failing for you. On my Macbook Pro (M1 Pro chip), they succeed with the default settings. What chip are you using?

Mark Reid: A few have always failed for me with the default settings. I'm on an M2 Max with clang 15.0.

Mark Reid: Interesting, it looks like it could be a change in defaults between clang versions, if you're using an older version: https://godbolt.org/z/sj5ofs5h4

godbolt.org: Compiler Explorer - C: float bar(float x, float y, float z) { return x * y + z; }

doug walker: Well, I used clang 14, which your test seems to indicate uses fmadd. Nevertheless, I do think you're right, it's probably more likely to be a compiler issue than an M1/M2 difference.

doug-walker, Dec 21 '23 02:12

Linking to issue #1784.

doug-walker, Dec 21 '23 07:12

#1950 might have resolved this issue. It fixes a few unit test failures related to FMA on Apple silicon.

markreidvfx, Apr 29 '24 20:04