OpenColorIO
Failing tests when FMA instruction is used
Some tests fail on the riscv64 architecture because the results computed on riscv64 are more precise than those on other architectures.
- Operating System: Arch Linux
- Architecture: riscv64gc
- OpenColorIO Version: 2.2.1
For example, the test GammaOpCPU, apply_moncurve_mirror_style_fwd
in tests/cpu/ops/gamma/GammaOpCPU_tests.cpp fails.
Detailed output from failed test
rgba[0] = 0.94786727428436279 0.83333331346511841 0.71428573131561279 0.625
rgba[1] = 0.052132699638605118 0.1666666716337204 0.28571429848670959 0.375
rgba[2] = 2.4000000953674316 2.2000000476837158 2 1.7999999523162842
rgba[3] = 0.039285715669393539 0.1666666716337204 0.40000000596046448 0.75
rgba[4] = 0.077380158007144928 0.44192609190940857 0.8163265585899353 0.98202729225158691
Iteration 0
sign = 1 1 1 1
pixel = 0.00050000002374872565 0.004999999888241291 0.05000000074505806 0.75
repro: pixel[2]=1028443341, blu[0] = 1060559726, blu[1] = 1049774373
intermediate = 0.052606634795665741 0.17083333432674408 0.32142859697341919 0.84375
data = 0.00085211289115250111 0.02049555629491806 0.10331634432077408 0.73652046918869019
out = 3.8690079236403108e-05 0.0022096303291618824 0.040816329419612885 0.73652046918869019
Iteration 1
sign = -1 -1 -1 -1
pixel = 0.00050000002374872565 0.004999999888241291 0.05000000074505806 0.75
repro: pixel[2]=1028443341, blu[0] = 1060559726, blu[1] = 1049774373
intermediate = 0.052606634795665741 0.17083333432674408 0.32142859697341919 0.84375
data = 0.00085211289115250111 0.02049555629491806 0.10331634432077408 0.73652046918869019
out = -3.8690079236403108e-05 -0.0022096303291618824 -0.040816329419612885 -0.73652046918869019
Iteration 2
sign = 1 1 1 1
pixel = 0.25 0.5 0.75 1
repro: pixel[2]=1061158912, blu[0] = 1060559726, blu[1] = 1049774373
intermediate = 0.28909951448440552 0.58333331346511841 0.82142859697341919 1
data = 0.050876077264547348 0.30550399422645569 0.67474496364593506 1
out = 0.050876077264547348 0.30550399422645569 0.67474496364593506 1
Iteration 3
sign = -1 -1 -1 -1
pixel = 0.25 0.5 0.75 1
repro: pixel[2]=1061158912, blu[0] = 1060559726, blu[1] = 1049774373
intermediate = 0.28909951448440552 0.58333331346511841 0.82142859697341919 1
data = 0.050876077264547348 0.30550399422645569 0.67474496364593506 1
out = -0.050876077264547348 -0.30550399422645569 -0.67474496364593506 -1
Iteration 4
sign = 1 1 1 1
pixel = 0.80000001192092896 0.94999998807907104 1 1.5
repro: pixel[2]=1065353216, blu[0] = 1060559726, blu[1] = 1049774373
intermediate = 0.81042653322219849 0.95833331346511841 1 1.3125
data = 0.60382729768753052 0.91061854362487793 1 1.6314687728881836
out = 0.60382729768753052 0.91061854362487793 1 1.6314687728881836
Iteration 5
sign = -1 -1 -1 -1
pixel = 0.80000001192092896 0.94999998807907104 1 1.5
repro: pixel[2]=1065353216, blu[0] = 1060559726, blu[1] = 1049774373
intermediate = 0.81042653322219849 0.95833331346511841 1 1.3125
data = 0.60382729768753052 0.91061854362487793 1 1.6314687728881836
out = -0.60382729768753052 -0.91061854362487793 -1 -1.6314687728881836
Iteration 6
sign = 1 1 1 1
pixel = 1.0049999952316284 1.0499999523162842 1.5 1
repro: pixel[2]=1069547520, blu[0] = 1060559726, blu[1] = 1049774373
intermediate = 1.0047392845153809 1.0416666269302368 1.3571429252624512 1
data = 1.0114120244979858 1.0939645767211914 1.8418369293212891 1
out = 1.0114120244979858 1.0939645767211914 1.8418369293212891 1
Iteration 7
sign = -1 -1 -1 -1
pixel = 1.0049999952316284 1.0499999523162842 1.5 1
repro: pixel[2]=1069547520, blu[0] = 1060559726, blu[1] = 1049774373
intermediate = 1.0047392845153809 1.0416666269302368 1.3571429252624512 1
data = 1.0114120244979858 1.0939645767211914 1.8418369293212891 1
out = -1.0114120244979858 -1.0939645767211914 -1.8418369293212891 -1
Iteration 8
sign = -1 1 1 1
pixel = inf inf nan 0
repro: pixel[2]=2143289344, blu[0] = 1060559726, blu[1] = 1049774373
intermediate = inf inf nan 0.375
data = inf inf nan 0.17110247910022736
out = -inf inf nan 0
Result[0] = 3.8689999200869352e-05
Image [0] = 3.8690079236403108e-05
Result[1] = 0.0022096300963312387
Image [1] = 0.0022096303291618824
Result[2] = 0.040816318243741989
Image [2] = 0.040816329419612885
Result[3] = 0.73652046918869019
Image [3] = 0.73652046918869019
Result[4] = -3.8689999200869352e-05
Image [4] = -3.8690079236403108e-05
Result[5] = -0.0022096300963312387
Image [5] = -0.0022096303291618824
Result[6] = -0.040816318243741989
Image [6] = -0.040816329419612885
Result[7] = -0.73652046918869019
Image [7] = -0.73652046918869019
Result[8] = 0.050876069813966751
Image [8] = 0.050876077264547348
Result[9] = 0.30550399422645569
Image [9] = 0.30550399422645569
Result[10] = 0.67474484443664551
Image [10] = 0.67474496364593506
FAILED: Index: 10 - Values: 0.674744964 and: 0.674744844 - Threshold: 1.00000001e-07
Result[11] = 1
Image [11] = 1
Result[12] = -0.050876069813966751
Image [12] = -0.050876077264547348
Result[13] = -0.30550399422645569
Image [13] = -0.30550399422645569
Result[14] = -0.67474484443664551
Image [14] = -0.67474496364593506
FAILED: Index: 14 - Values: -0.674744964 and: -0.674744844 - Threshold: 1.00000001e-07
Result[15] = -1
Image [15] = -1
Result[16] = 0.60382729768753052
Image [16] = 0.60382729768753052
Result[17] = 0.91061854362487793
Image [17] = 0.91061854362487793
Result[18] = 1
Image [18] = 1
Result[19] = 1.6314687728881836
Image [19] = 1.6314687728881836
Result[20] = -0.60382729768753052
Image [20] = -0.60382729768753052
Result[21] = -0.91061854362487793
Image [21] = -0.91061854362487793
Result[22] = -1
Image [22] = -1
Result[23] = -1.6314687728881836
Image [23] = -1.6314687728881836
Result[24] = 1.0114120244979858
Image [24] = 1.0114120244979858
Result[25] = 1.0939645767211914
Image [25] = 1.0939645767211914
Result[26] = 1.8418365716934204
Image [26] = 1.8418369293212891
FAILED: Index: 26 - Values: 1.84183693 and: 1.84183657 - Threshold: 1.00000001e-07
Result[27] = 1
Image [27] = 1
Result[28] = -1.0114120244979858
Image [28] = -1.0114120244979858
Result[29] = -1.0939645767211914
Image [29] = -1.0939645767211914
Result[30] = -1.8418365716934204
Image [30] = -1.8418369293212891
FAILED: Index: 30 - Values: -1.84183693 and: -1.84183657 - Threshold: 1.00000001e-07
Result[31] = -1
Image [31] = -1
Result[32] = -inf
Image [32] = -inf
Result[33] = inf
Image [33] = inf
Result[34] = nan
Image [34] = nan
Result[35] = 0
Image [35] = 0
Note that I modified src/OpenColorIO/ops/gamma/GammaOpCPU.cpp
and tests/cpu/ops/gamma/GammaOpCPU_tests.cpp
to print more output.
Modified source
void GammaMoncurveMirrorOpCPUFwd::apply(const void* inImg, void* outImg, long numPixels) const
{
    const float* in = (const float*)inImg;
    float* out = (float*)outImg;
    const float red[5] = { m_red.scale, m_red.offset, m_red.gamma, m_red.breakPnt, m_red.slope };
    const float grn[5] = { m_green.scale, m_green.offset, m_green.gamma, m_green.breakPnt, m_green.slope };
    const float blu[5] = { m_blue.scale, m_blue.offset, m_blue.gamma, m_blue.breakPnt, m_blue.slope };
    const float alp[5] = { m_alpha.scale, m_alpha.offset, m_alpha.gamma, m_alpha.breakPnt, m_alpha.slope };
    for (int i = 0; i < 5; i++)
    {
        std::cout << "rgba[" << i << "] = " << red[i] << " " << grn[i] << " " << blu[i] << " " << alp[i] << std::endl;
    }
    for (long idx = 0; idx < numPixels; ++idx)
    {
        std::cout << "Iteration " << idx << std::endl;
        const float sign[4] = { std::copysign(1.0f, in[0]), std::copysign(1.0f, in[1]), std::copysign(1.0f, in[2]),
                                std::copysign(1.0f, in[3]) };
        std::cout << "sign = " << sign[0] << " " << sign[1] << " " << sign[2] << " " << sign[3] << std::endl;
        const float pixel[4] = { std::fabs(in[0]), std::fabs(in[1]), std::fabs(in[2]), std::fabs(in[3]) };
        std::cout << "pixel = " << pixel[0] << " " << pixel[1] << " " << pixel[2] << " " << pixel[3] << std::endl;
        std::cout << "repro: pixel[2]=" << std::hex << *(int*)(&pixel[2]) << ", blu[0] = " << *(int*)(&blu[0])
                  << ", blu[1] = " << *(int*)(&blu[1]) << std::endl;
        const float intermediate[4] = { pixel[0] * red[0] + red[1], pixel[1] * grn[0] + grn[1], pixel[2] * blu[0] + blu[1],
                                        pixel[3] * alp[0] + alp[1] };
        std::cout << "intermediate = " << intermediate[0] << " " << intermediate[1] << " " << intermediate[2] << " "
                  << intermediate[3] << std::endl;
        const float data[4] = { std::pow(intermediate[0], red[2]), std::pow(intermediate[1], grn[2]),
                                std::pow(intermediate[2], blu[2]), std::pow(intermediate[3], alp[2]) };
        std::cout << "data = " << data[0] << " " << data[1] << " " << data[2] << " " << data[3] << std::endl;
        out[0] = sign[0] * (pixel[0] <= red[3] ? pixel[0] * red[4] : data[0]);
        out[1] = sign[1] * (pixel[1] <= grn[3] ? pixel[1] * grn[4] : data[1]);
        out[2] = sign[2] * (pixel[2] <= blu[3] ? pixel[2] * blu[4] : data[2]);
        out[3] = sign[3] * (pixel[3] <= alp[3] ? pixel[3] * alp[4] : data[3]);
        std::cout << "out = " << out[0] << " " << out[1] << " " << out[2] << " " << out[3] << std::endl;
        in += 4;
        out += 4;
    }
}
void ApplyGamma(const OCIO::OpRcPtr & op,
                float * image, const float * result,
                long numPixels, unsigned line,
                float errorThreshold)
{
    const auto cpu = op->getCPUOp(true);
    OCIO_CHECK_NO_THROW_FROM(cpu->apply(image, image, numPixels), line);
    for(long idx=0; idx<(numPixels*4); ++idx)
    {
        std::cout << "Result[" << idx << "] = " << std::setw(16) << result[idx] << std::endl;
        std::cout << "Image [" << idx << "] = " << std::setw(16) << image[idx] << std::endl;
        if (OCIO::IsNan(result[idx]))
        {
            OCIO_CHECK_ASSERT_FROM(OCIO::IsNan(image[idx]), line);
        }
        // Using rel error with a large minExpected value of 1 will transition
        // from absolute error for expected values < 1 and
        // relative error for values > 1.
        const bool equalRel = OCIO::EqualWithSafeRelError(image[idx], result[idx],
                                                          errorThreshold, 1.0f);
        if (!equalRel)
        {
            // As most of the error thresholds are 1e-7f, the output
            // value precision should then be bigger than 7 digits
            // to highlight small differences.
            std::ostringstream message;
            message << "Index: " << idx
                    << " - Values: " << image[idx]
                    << " and: " << result[idx]
                    << " - Threshold: " << errorThreshold;
            OCIO_CHECK_ASSERT_MESSAGE_FROM(0, message.str(), line);
        }
    }
}
Let's take a detailed look at Iteration 2 from the log:
Iteration 2
sign = 1 1 1 1
pixel = 0.25 0.5 0.75 1
repro: pixel[2]=1061158912, blu[0] = 1060559726, blu[1] = 1049774373
intermediate = 0.28909951448440552 0.58333331346511841 0.82142859697341919 1
data = 0.050876077264547348 0.30550399422645569 0.67474496364593506 1
out = 0.050876077264547348 0.30550399422645569 0.67474496364593506 1
On x86_64 with SSE2 disabled, it looks like:
Iteration 2
sign = 1 1 1 1
pixel = 0.25 0.5 0.75 1
repro: pixel[2]=1061158912, blu[0] = 1060559726, blu[1] = 1049774373
intermediate = 0.28909951448440552 0.58333331346511841 0.82142853736877441 1
data = 0.050876077264547348 0.30550399422645569 0.67474484443664551 1
out = 0.050876077264547348 0.30550399422645569 0.67474484443664551 1
The difference in the 3rd element (at index 2) causes a test failure later.
x86_64 : intermediate[2] = 0.82142853736877441
riscv64: intermediate[2] = 0.82142859697341919
After some investigation, I found that the difference is caused by FMA (fused multiply-add).
The expression pixel[2] * blu[0] + blu[1] can be computed in two steps:
fmul.s <a>, <a>, <b>
fadd.s <result>, <a>, <c>
This is equivalent to round(round(a * b) + c), which is the behavior on x86_64.
However, with the current build config in this repo, gcc uses fmadd.s to compute the expression:
fmadd.s <result_register>, <a>, <b>, <c>
This computes a * b + c as round(a * b + c), with a single rounding step, which is more precise than round(round(a * b) + c). This difference is what ultimately causes the test failure.
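The two rounding behaviours can be reproduced in isolation with the bit patterns printed on the repro line of Iteration 2. The following is a minimal sketch, not code from OpenColorIO: the helper names floatFromBits, mulThenAdd and fusedMulAdd are made up for illustration, and it assumes float is IEEE-754 binary32 and that std::fma is correctly rounded on the target platform. The volatile temporaries are there only to keep the compiler from contracting the separate multiply and add into an FMA.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret a 32-bit pattern as a binary32 float.
float floatFromBits(std::uint32_t bits)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "binary32 assumed");
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// round(round(a * b) + c): two separate roundings.
// The volatile stores force each step to be rounded to float and prevent
// the compiler from fusing the two operations (whatever -ffp-contract is).
float mulThenAdd(float a, float b, float c)
{
    volatile float product = a * b;
    volatile float sum = product + c;
    return sum;
}

// round(a * b + c): a single rounding, as fmadd.s does.
float fusedMulAdd(float a, float b, float c)
{
    return std::fma(a, b, c);
}
```

With a = floatFromBits(1061158912) (0.75), b = floatFromBits(1060559726) and c = floatFromBits(1049774373), mulThenAdd should give the x86_64 value of intermediate[2] and fusedMulAdd the riscv64 one. The exact product 0.75 * blu[0] falls halfway between two floats, so rounding it before the addition loses half a unit in the last place, which then shifts the final sum by one representable value.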
There are several ways to fix this, and I would like to ask for your opinions.
- Adjust the expected values and error threshold in the test. In my opinion, this is the most correct solution, but it will involve a lot of work.
- Add the compiler flag:
to prevent the compiler from generating FMA instructions. Apparently this is not recommended because we will lose the benefits of the higher precision we get for free from FMA.
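To gauge the first option, here is a minimal sketch of a comparison that behaves the way the comment in ApplyGamma describes: absolute error for expected values below 1 and relative error above. The function name equalWithMinExpected is hypothetical, and this is an assumption about the general shape of such a check, not OpenColorIO's EqualWithSafeRelError implementation.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper, NOT the OpenColorIO implementation.
// Divides the absolute difference by max(|expected|, minExpected), so with
// minExpected = 1 the check is absolute below 1 and relative above 1.
bool equalWithMinExpected(float actual, float expected, float eps, float minExpected)
{
    if (actual == expected)
    {
        return true; // also covers equal infinities
    }
    const float scale = std::max(std::fabs(expected), minExpected);
    return std::fabs(actual - expected) / scale <= eps;
}
```

Fed with the values from the log, the pair at index 10 (0.67474496364593506 and 0.67474484443664551) differs by about 1.19e-7, and the pair at index 26 (1.8418369293212891 and 1.8418365716934204) differs by about 1.94e-7 relative to the expected value. Both exceed the 1e-7 threshold. In this sketch they would pass with a threshold of 2e-7, but that says nothing about the other tests or about which threshold the project would consider acceptable.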