darktable icon indicating copy to clipboard operation
darktable copied to clipboard

Disable sqrtf vectorization for GCC 15

Open kmilos opened this issue 5 months ago • 11 comments

Alternative (more maintainable?) workaround for https://github.com/darktable-org/darktable/issues/18729

kmilos avatar Jun 21 '25 08:06 kmilos

I fully agree with @ralfbrown

jenshannoschwalm avatar Jun 21 '25 10:06 jenshannoschwalm

Sure. This PR is just to undo the previous workaround and have a better/cleaner starting point. As mentioned in the bug report, someone also needs to check what's hapenning w/ Clang+glibc as it also defines __GNUC__.

kmilos avatar Jun 21 '25 14:06 kmilos

@jenshannoschwalm @ralfbrown @kmilos Are you all ok for having this in 5.2.1 and move to the full fix for 5.4? Any other proposal?

TurboGit avatar Jun 23 '25 07:06 TurboGit

I would prefer @ralfbrown suggestion in #18729 leaving optimizing to whats available.

#if defined(_OPENMP) && !defined(_WIN32) && !defined(__GNUC__)
// GCC has declarations in standard headers if and only if the architecture and compile options support vectorization, so
// don't try to override that here - causes dynamic-link errors at startup.

jenshannoschwalm avatar Jun 23 '25 07:06 jenshannoschwalm

// GCC has declarations ...

Clang also defines __GNUC__

Again, the original bug seems to be glibc specific, not GCC specific. On top of that, there is musl, BSD libc, etc., and all those combine w/ GCC and Clang. I cannot be 100% confident what you propose is the "final" fix, as I don't have access to all of those platforms, nor the time to test all. If you and @ralfbrown are, please feel free to follow up w/ direct code suggestions, commits to a fork of my branch, or a subsequent PR (my preference after merging this one).

kmilos avatar Jun 23 '25 10:06 kmilos

FWIW, these are available in libmvec.so shipped w/ glibc on Fedora 42:

_ZGVdN8vv_atan2f
_ZGVcN8v_sinhf
_ZGVdN4v_atan
_ZGVbN4v_acosf
_ZGVcN8v_erfcf
_ZGVdN8v_sinf
_ZGVdN8vv_powf
_ZGVcN8v_expf
_ZGVeN8v_tan
_ZGVbN2v_exp
_ZGVeN16v_sinf
_ZGVdN4v_log
_ZGVcN8v_tanhf
_ZGVdN8v_log2f
_ZGVbN4v_erff
_ZGVcN8v_log10f
_ZGVbN2vvv_sincos
_ZGVdN8v_log1pf
_ZGVbN2vv_pow
_ZGVcN4vvv_sincos
_ZGVdN4v_asin
_ZGVeN8v_erf
_ZGVbN4v_cosf
_ZGVcN4vv_pow
_ZGVcN4v_atanh
_ZGVeN16v_expm1f
_ZGVeN16v_log2f
_ZGVeN16vvv_sincosf
_ZGVdN8v_erff
_ZGVeN8v_sin
_ZGVcN8v_atanf
_ZGVdN8v_cbrtf
_ZGVeN16v_erff
_ZGVdN8v_log10f
_ZGVdN4v_cos
_ZGVcN4v_expm1
_ZGVbN4v_expf
_ZGVdN8v_exp2f
_ZGVdN8v_cosf
_ZGVeN8vv_pow
_ZGVeN16v_cosf
_ZGVeN8vv_atan2
_ZGVcN4v_exp
_ZGVbN4vv_powf
_ZGVeN16v_cbrtf
_ZGVbN2v_atanh
_ZGVbN2v_log
GLIBC_2.22
_ZGVdN8v_expf
_ZGVbN4v_log2f
_ZGVeN16v_exp2f
_ZGVeN16v_expf
_ZGVbN2v_expm1
_ZGVbN2vv_atan2
_ZGVbN2v_log2
_ZGVeN8v_asinh
_ZGVeN16v_asinhf
_ZGVeN16vv_atan2f
_ZGVcN4v_log2
_ZGVbN4vv_atan2f
_ZGVdN4vv_pow
_ZGVbN4v_cbrtf
_ZGVbN2v_cos
_ZGVeN16v_atanhf
GLIBC_2.35
_ZGVeN8v_log2
_ZGVbN4v_exp2f
_ZGVeN8v_log10
_ZGVcN8vvv_sincosf
_ZGVcN8v_acoshf
_ZGVcN4v_log
_ZGVbN4v_exp10f
_ZGVeN8v_acosh
_ZGVeN16v_log1pf
_ZGVcN4vv_hypot
_ZGVcN4v_asinh
_ZGVcN8v_asinf
_ZGVeN8v_exp10
_ZGVdN8v_acoshf
_ZGVdN4v_log2
_ZGVcN4v_cos
_ZGVeN16v_log10f
_ZGVdN4v_tan
_ZGVdN4vv_hypot
_ZGVbN4v_expm1f
_ZGVbN2v_asinh
_ZGVcN4v_log10
_ZGVcN8v_coshf
_ZGVeN8v_log1p
_ZGVdN4v_erf
_ZGVdN8v_sinhf
_ZGVcN4v_acosh
_ZGVdN8v_erfcf
_ZGVbN2v_tanh
_ZGVcN8v_acosf
_ZGVdN4v_sin
_ZGVcN4v_tanh
_ZGVbN2v_log10
_ZGVdN8v_tanhf
_ZGVcN4v_exp10
_ZGVeN16v_sinhf
_ZGVeN16v_erfcf
_ZGVbN2v_acosh
_ZGVbN2v_tan
_ZGVbN4v_asinhf
_ZGVeN8v_tanh
_ZGVcN8v_logf
_ZGVdN4v_atanh
_ZGVeN16v_tanhf
_ZGVcN8vv_hypotf
_ZGVdN8v_atanf
_ZGVbN2v_erf
_ZGVbN2v_exp10
_ZGVdN4v_expm1
_ZGVbN4v_atanhf
_ZGVcN4v_log1p
_ZGVeN8v_exp
_ZGVdN8vv_hypotf
_ZGVbN4v_sinhf
_ZGVbN4v_erfcf
_ZGVbN2v_sin
_ZGVbN2v_acos
_ZGVeN16v_atanf
_ZGVbN2v_cbrt
_ZGVcN4v_acos
_ZGVbN4v_tanhf
_ZGVcN8vv_powf
_ZGVeN16v_acoshf
_ZGVcN4v_cbrt
_ZGVdN4v_tanh
_ZGVbN4v_logf
_ZGVbN2v_log1p
_ZGVbN4v_log1pf
_ZGVcN8v_log2f
_ZGVcN4v_tan
_ZGVcN8v_tanf
_ZGVeN8v_acos
_ZGVeN8v_cbrt
_ZGVdN8v_logf
_ZGVeN16vv_powf
_ZGVbN4v_atanf
_ZGVeN16v_logf
_ZGVcN4v_erf
_ZGVbN4v_log10f
_ZGVcN8v_exp10f
_ZGVcN8v_cbrtf
_ZGVbN2v_sinh
_ZGVcN4v_sin
_ZGVeN8v_log
_ZGVcN4vv_atan2
_ZGVcN8v_exp2f
_ZGVbN2v_exp2
_ZGVcN4v_sinh
_ZGVcN4v_exp2
_ZGVbN4v_tanf
_ZGVdN4v_acos
_ZGVbN2v_erfc
_ZGVdN8v_exp10f
_ZGVdN4v_cbrt
_ZGVeN8v_sinh
_ZGVdN4v_asinh
_ZGVeN16vv_hypotf
_ZGVdN4vv_atan2
_ZGVeN8vv_hypot
_ZGVcN4v_erfc
_ZGVeN8v_cos
_ZGVdN8v_asinf
_ZGVeN8v_exp2
_ZGVcN8v_expm1f
_ZGVbN4vv_hypotf
_ZGVdN8v_tanf
_ZGVbN2v_cosh
_ZGVdN4vvv_sincos
_ZGVeN16v_tanf
_ZGVbN2vv_hypot
_ZGVcN4v_cosh
_ZGVeN8v_erfc
_ZGVeN16v_asinf
_ZGVdN4v_log10
_ZGVdN8v_coshf
_ZGVbN2v_atan
_ZGVdN8v_expm1f
_ZGVdN4v_sinh
_ZGVeN8v_cosh
_ZGVcN4v_atan
_ZGVdN4v_acosh
_ZGVdN4v_exp2
_ZGVdN8v_acosf
_ZGVeN16v_coshf
_ZGVcN8v_sinf
_ZGVcN8v_asinhf
_ZGVdN8vvv_sincosf
_ZGVbN4v_asinf
_ZGVbN2v_asin
_ZGVdN4v_exp10
_ZGVbN4v_acoshf
_ZGVdN4v_exp
_ZGVeN8v_atan
_ZGVdN4v_erfc
_ZGVeN16v_acosf
_ZGVcN8v_atanhf
_ZGVcN4v_asin
_ZGVeN8vvv_sincos
_ZGVbN4vvv_sincosf
_ZGVdN8v_asinhf
_ZGVdN4v_cosh
_ZGVcN8v_erff
_ZGVeN8v_atanh
_ZGVbN4v_coshf
_ZGVcN8vv_atan2f
_ZGVdN4v_log1p
_ZGVeN8v_asin
_ZGVbN4v_sinf
_ZGVcN8v_cosf
_ZGVeN8v_expm1
_ZGVdN8v_atanhf
_ZGVeN16v_exp10f
_ZGVcN8v_log1pf

fabsf needs to be excluded as well...

kmilos avatar Jun 23 '25 10:06 kmilos

I am absolutely no compiler expert but from what i understood we should trust the compiler using available functions for vectorizing if possible. Not rying to enforce something not being there, otherwise lot's of secific test cases. In other words, gcc should not be treated different from clang. Is this understanding wrong?

jenshannoschwalm avatar Jun 23 '25 11:06 jenshannoschwalm

I am absolutely no compiler expert but from what i understood we should trust the compiler using available functions for vectorizing if possible.

I guess, because it should know what C lib/math lib it's built with?

gcc should not be treated different from clang. Is this understanding wrong?

I suspect they are not able to auto-vectorize everything identically, so that's why I say I'm not 100% sure.

kmilos avatar Jun 23 '25 11:06 kmilos

Not sure either but it's likely the very best guess?

jenshannoschwalm avatar Jun 23 '25 12:06 jenshannoschwalm

it's likely the very best guess?

For me quite the opposite - as we've learned even the same compiler vectorizes differently between its versions, and also depending on (unknown) options..

kmilos avatar Jun 23 '25 13:06 kmilos

For me quite the opposite - as we've learned even the same compiler vectorizes differently between its versions, and also depending on (unknown) options..

Which is precisely why we shouldn't try to force the use of vectorized stdlib functions.

ralfbrown avatar Jun 23 '25 13:06 ralfbrown

Dropping in favour of the alternative approach.

kmilos avatar Jul 18 '25 20:07 kmilos