HMM_RSquareRootF has different precision between SSE and non-SSE
As of this writing, here is what HMM_RSquareRootF looks like:
HMM_INLINE float HMM_RSquareRootF(float Float)
{
    float Result;

#ifdef HANDMADE_MATH__USE_SSE
    __m128 In = _mm_set_ss(Float);
    __m128 Out = _mm_rsqrt_ss(In);
    Result = _mm_cvtss_f32(Out);
#else
    Result = 1.0f/HMM_SquareRootF(Float);
#endif

    return(Result);
}
This means that SSE builds will use an approximation, but non-SSE builds will not.
What do we want to do to fix this? Making the SSE version more precise or making the non-SSE version faster would both be breaking changes. But I suspect that someone who's deliberately using an inverse square root function would expect it to use an approximation.
I'd go with making the non-SSE HMM_RSquareRootF faster (use an approximation). Most people are probably using SSE builds anyway, so the approximation is something they're probably used to.
People are generally used to SIMD functions having low precision, for example custom cos or sin functions that expect inputs to be mapped into the range -pi to pi before calling. So I don’t think it’s too big of a deal.
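For reference, here is a rough sketch of what an approximate non-SSE path could look like. It is not taken from Handmade Math: ApproxRSquareRootF is a hypothetical name, and the technique is the well-known bit-level initial guess followed by one Newton-Raphson step. It assumes 32-bit IEEE 754 floats and a positive, finite, normal input. Note that it would not match _mm_rsqrt_ss bit for bit, and with a single refinement step its worst-case relative error (roughly 0.2%) is somewhat larger than what the SSE instruction gives, so a second step or a different approach may be worth considering.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, NOT part of Handmade Math.
   Assumes 32-bit IEEE 754 floats and Float > 0, finite, and normal. */
static float ApproxRSquareRootF(float Float)
{
    uint32_t Bits;
    float Half = 0.5f * Float;
    float Result;

    /* Reinterpret the float's bits as an integer without violating aliasing rules. */
    memcpy(&Bits, &Float, sizeof(Bits));

    /* Initial guess from the classic magic-constant trick. */
    Bits = 0x5f3759dfu - (Bits >> 1);
    memcpy(&Result, &Bits, sizeof(Result));

    /* One Newton-Raphson iteration: y = y * (1.5 - 0.5 * x * y * y). */
    Result = Result * (1.5f - Half * Result * Result);

    return Result;
}
```

Something like this could sit in the #else branch so both builds return an approximation, though the two paths would still differ in their exact results.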