Adam Stylinski comments

Results: 209 comments of Adam Stylinski

Make use of NEON alignment hints

One thing I'm sure of is that GCC and clang ignore the alignment hints, at least when compiling on Linux.

SSE2 support not enabled for Windows x64 builds, causing performance discrepancy

> You have several differences in flags there.
>
> I don't think -DWITH_NATIVE_INSTRUCTIONS=ON does what you think it does. It is the same as -mtune=native, so it binds the...

SSE2 support not enabled for Windows x64 builds, causing performance discrepancy

It could also be a bug in MinGW that triggers the AVX->SSE transition penalty frequently. I think I've seen that before, too.

SSE2 support not enabled for Windows x64 builds, causing performance discrepancy

If it is the transition penalty, a newer version of MinGW might have fixed it. If you're seeing that a lot, perf has a way to count how often...

SSE2 support not enabled for Windows x64 builds, causing performance discrepancy

Did you have a profiler available to maybe see which functions are on the hot path? I assume you built without the native option too, right? I don't think native...

SSE2 support not enabled for Windows x64 builds, causing performance discrepancy

The calling Delphi code is unlikely to be the issue, but it wouldn't surprise me if libspng were somehow slower when compiled as 64-bit. I don't know anything about the...

SSE2 support not enabled for Windows x64 builds, causing performance discrepancy

Looking at their source code, it uses C intrinsics so it's likely compiling the SIMD vectorized compression filters. However, it is compiling explicitly with SSE. GCC and MSVC should be...

SSE2 support not enabled for Windows x64 builds, causing performance discrepancy

_Another_ thing you could try is compiling libspng with -mavx2. This will still compile all the SSE intrinsics, but it will use VEX encoding, eliminating the need to call...
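To see the effect of the flag, here is a minimal sketch using SSE2 intrinsics. The function `add_rows` is a hypothetical helper written for illustration (it is not taken from libspng); it adds two byte rows with wrap-around, loosely in the spirit of a PNG defilter step. The source is the same under either flag; only the instruction encoding the compiler emits should differ.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>   /* SSE2 intrinsics */
#define HAVE_SSE2 1
#endif

/* Hypothetical helper (not from libspng): dst[i] = a[i] + b[i], modulo 256. */
static void add_rows(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
#ifdef HAVE_SSE2
    /* 16 bytes per iteration. Built with -msse2 this typically becomes a
     * legacy-encoded paddb; built with -mavx2 the same intrinsic typically
     * becomes the VEX-encoded vpaddb. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi8(va, vb));
    }
#endif
    /* Scalar tail, and the full fallback on targets without SSE2. */
    for (; i < n; i++)
        dst[i] = (uint8_t)(a[i] + b[i]);
}
```

One way to check, assuming GCC or clang and a file named `rows.c` (the filename is an assumption): build once with `-O2 -msse2 -c rows.c` and once with `-O2 -mavx2 -c rows.c`, then compare the output of `objdump -d rows.o`. Because the function is `static`, it needs a caller or the `static` removed for it to show up in the object file.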

SSE2 support not enabled for Windows x64 builds, causing performance discrepancy

That's fairly normal; x86 is used for both 32-bit and 64-bit code. It seems likely that the code is dispatching to the SSE* accelerated functions. Are you sure...
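For readers unfamiliar with what "dispatching" means here, the following is a rough sketch of runtime dispatch, not the library's actual mechanism. All names (`sum_fn`, `sum_scalar`, `sum_sse2`, `pick_sum`) are hypothetical. It assumes GCC or clang on x86, where `__builtin_cpu_supports` is available; on other targets it falls back to the scalar path.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <emmintrin.h>
#define X86_GNU 1
#endif

/* Hypothetical function-pointer type for a byte-sum routine. */
typedef uint32_t (*sum_fn)(const uint8_t *buf, size_t n);

static uint32_t sum_scalar(const uint8_t *buf, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += buf[i];
    return s;
}

#ifdef X86_GNU
/* The target attribute lets this one function use SSE2 even if the rest
 * of the file is built without -msse2 (relevant for 32-bit builds). */
__attribute__((target("sse2")))
static uint32_t sum_sse2(const uint8_t *buf, size_t n)
{
    const __m128i zero = _mm_setzero_si128();
    uint32_t s = 0;
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
        /* Sum of absolute differences against zero: two 64-bit lanes,
         * each holding the sum of 8 bytes. */
        __m128i sad = _mm_sad_epu8(v, zero);
        s += (uint32_t)_mm_cvtsi128_si32(sad);
        s += (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(sad, 8));
    }
    for (; i < n; i++)
        s += buf[i];
    return s;
}
#endif

/* Choose an implementation once, based on what the running CPU reports. */
static sum_fn pick_sum(void)
{
#ifdef X86_GNU
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))
        return sum_sse2;
#endif
    return sum_scalar;
}
```

A caller would store the result of `pick_sum()` once and call through the pointer afterwards. Which branch was taken is exactly the kind of question a profiler or debugger can answer about a running process.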

SSE2 support not enabled for Windows x64 builds, causing performance discrepancy

It's a bit difficult without visibility into the running process. Profiling tools of some sort would help immensely here. In particular, it'd tell us right off the bat which functions...