graphene
Attempt to consolidate SSE and ARM NEON SIMD code for GCC/clang and Visual Studio
Hi,
This attempts to clean up the code in graphene-simd4f.h and graphene-simd4x4f.h a bit by reducing the code duplication in the SSE and ARM NEON SIMD implementations that stems from the syntactic differences between Visual Studio and GCC/clang with regard to inlining, via:
- Defining macros that handle the inlining method supported by each compiler (__extension__ statement expressions for GCC/clang, a direct intrinsic call for Visual Studio), for calls that can be written as one-liners.
- In a similar fashion, defining the initialization of graphene_simd4f_t arrays from the 4 floats that we pass in, especially since we have required C99 support for a while and the supported Visual Studio compilers have the needed support for this.
- Removing unneeded repetitions in the code.
- Prefixing the MSVC-specific implementations with graphene_msvc_ instead of just a leading _ to make things clearer to people[1].
[1]: Sadly, I was not able to do the same cleanup for the SIMD code that is implemented in a function-like manner. I couldn't get the preprocessor happy for both Visual Studio and clang in one shot, ugh :|, so I had to leave that alone: the preprocessor does not allow a working #define inside a macro, and it does not allow a macro definition to be split across lines that are separated by #if/#ifdef blocks. So this is the best I could do for now. For instance:
(Unrelated parts are omitted for brevity. I'm writing this off the top of my head, so there might be some mistakes below.)
(graphene-macros.h)
#if defined (__GNUC__) || defined (__clang__)
...
#define GRAPHENE_FUNCCALL_2ARG_MACRO(ftype,fname,v0,v1) \
(__extension__ ({
#define GRAPHENE_FUNCCALL_2ARG_BEGIN(rtype,ftype,fname,t0,v0,t1,v1)
#define GRAPHENE_FUNCCALL_BODY(expr) expr;
#define GRAPHENE_FUNCCALL_RETURN(rtype,rvalue) (rtype) rvalue;
#define GRAPHENE_FUNCCALL_END \
}))
#elif defined (_MSC_VER)
...
#define GRAPHENE_FUNCCALL_2ARG_MACRO(ftype,fname,v0,v1) \
graphene_msvc_##ftype##_##fname (v0, v1)
#define GRAPHENE_FUNCCALL_2ARG_BEGIN(rtype,ftype,fname,t0,v0,t1,v1) \
static inline rtype \
graphene_msvc_##ftype##_##fname (t0 v0, t1 v1) \
{
#define GRAPHENE_FUNCCALL_BODY(expr) expr;
#define GRAPHENE_FUNCCALL_RETURN(rtype,rvalue) return rvalue;
#define GRAPHENE_FUNCCALL_END \
}
#else
...
(graphene-simd4f.h)
...
# define graphene_simd4f_get(s,i) \
GRAPHENE_FUNCCALL_2ARG_MACRO (simd4f, get, s, i) /* for this line, it's either with the trailing backslash for GCC/clang or without it for MSVC :(, otherwise all the other lines here work */ \
GRAPHENE_FUNCCALL_2ARG_BEGIN (float, simd4f, get, graphene_simd4f_t, s, int, i) \
GRAPHENE_FUNCCALL_BODY (graphene_simd4f_union_t __u = { (s) }) \
GRAPHENE_FUNCCALL_RETURN (float, __u.f[(i)]) \
GRAPHENE_FUNCCALL_END
...
I understand that this PR may well conflict with the changes in #251, so whichever of this or #251 goes in first, I will fix things up as needed as soon as possible.
With blessings, thank you!