emscripten icon indicating copy to clipboard operation
emscripten copied to clipboard

emsdk/upstream/emscripten/cache/sysroot/include/compat/emmintrin.h is not lib/clang/19/include/**/emmintrin.h

Open gblikas opened this issue 1 year ago • 9 comments
trafficstars

I'm looking to narrow down a build issue. I am compiling opencv4 via

vcpkg install opencv4:wasm32-emscripten --editable

and have modified a bad header to #include <emmintrin.h> as it is a missed dependency. I did this because the documentation for Using SIMD with WebAssembly indicates that _mm_setr_epi64 should be available via #include <emmintrin.h>. Judging by my error's output:

/Users/gblikas/projects/vcpkg/buildtrees/opencv4/src/4.8.0-2bf495557d/modules/core/include/opencv2/core/hal/intrin_sse.hpp:252:15: error: use of undeclared identifier '_mm_setr_epi64'; did you mean '_mm_set_epi64'?
  252 |         val = _mm_setr_epi64((__m64)v0, (__m64)v1);
      |               ^~~~~~~~~~~~~~
      |               _mm_set_epi64
/Users/gblikas/projects/emsdk/upstream/emscripten/cache/sysroot/include/compat/emmintrin.h:1133:1: note: '_mm_set_epi64' declared here
 1133 | _mm_set_epi64(long long q1, long long q0)
      | ^

and

$ cat /Users/gblikas/projects/emsdk/upstream/emscripten/cache/sysroot/include/compat/emmintrin.h  | grep "_setr"
_mm_setr_pd(double __c0, double __c1)
_mm_setr_epi32(int i0, int i1, int i2, int i3)
_mm_setr_epi16(short w0, short w1, short w2, short w3, short w4, short w5, short w6, short w7)
_mm_setr_epi8(char b0, char b1, char b2, char b3, char b4, char b5, char b6, char b7, char b8, char b9, char b10, char b11, char b12, char b13, char b14, char b15)

it appears as if the included header emmintrin.h from **/upstream/emscripten/cache/sysroot/include/compat/emmintrin.h is not the same as any of the lib/clang/19/include/**/emmintrin.h headers.

I'm not sure what emscripten/cache/sysroot is, so this issue is most-likley user error. However,

$ grep -lR "_mm_setr_epi64" emsdk
emsdk/upstream/emscripten/test/sse/test_sse2.cpp
emsdk/upstream/emscripten/test/sse/benchmark_sse2.cpp
emsdk/upstream/lib/clang/19/include/emmintrin.h
emsdk/upstream/lib/clang/19/include/ppc_wrappers/emmintrin.h

does show that _mm_setr_epi64 should be found in emsdk/upstream/lib/clang/19/include/**/emmintrin.h. Is there a known work around for this?ng the

If this issue is better suited for the opencv repository, or vcpkg, I'll move the issue over there. However I believe this is an #include issue with emscripten-core.


Potentially Related issue

  • https://github.com/emscripten-core/emscripten/issues/9353

Version Info

$ emcc -v 
emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.62 (34c1aa36052b1882058f22aa1916437ba0872690)
clang version 19.0.0git (https:/github.com/llvm/llvm-project c00ada070207979f092be9046a02fcfff8b9f9ce)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /Users/gblikas/projects/emsdk/upstream/bin

gblikas avatar Jul 12 '24 00:07 gblikas

The compat/emmintrin.h in the emscripten sysroot looks like it uses wasm intrinsics which is likely better than the clang builtin one that is using __builtin_ia32 intrinsics.

I'm guess we simply have not updated it in a while and we need to add the missing _mm_setr_epi64 declaration.

@tlively do you know about the history of these two files?

sbc100 avatar Jul 12 '24 00:07 sbc100

It looks like the old version of this header (which was removed in #7924) did contain this function but when it was re-introduced in #11243 that function was not included.

CC @juj

sbc100 avatar Jul 12 '24 00:07 sbc100

The compat/emmintrin.h in the emscripten sysroot looks like it uses wasm intrinsics which is likely better than the clang builtin one that is using __builtin_ia32 intrinsics.

I'm guess we simply have not updated it in a while and we need to add the missing _mm_setr_epi64 declaration.

@tlively do you know about the history of these two files?

@sbc100 Thanks for the prompt response. If it's not too involved, I can push a PR.

gblikas avatar Jul 12 '24 01:07 gblikas

It looks like the old version of this header (which was removed in #7924) did contain this function but when it was re-introduced in #11243 that function was not included.

CC @juj

@sbc100 So, I got a working solution. I'm now formalizing the patch in emscripten-core, and will post a PR we can edit/tune in, in a moment.

Additionally, I'm looking at the definitions, https://github.com/emscripten-core/emscripten/blob/c8904662caf1c7e43e29f5881f6ce615360f9c68/system/include/compat/emmintrin.h#L1180-L1196 and wondering why they are preferring builtin types: char, short, int, long, long long. It seems to be that it's better to transition them to their int8_t, int16_t, int32_t, int64_t, and __m64 specific types, in order to provide more portable and readable code. @tlively or @juj do you have any suggestions on this?

I understand that wasm_simd128.h typedef's the builtin types, and perhaps the reason for leaving

static __inline__ __m128i __attribute__((__always_inline__, __nodebug__))
_mm_setr_epi64(__m64 q0, __m64 q1)

out was the fact that wasm_simd128.h doesn't actually contain a direct typedef for __m64?

gblikas avatar Jul 12 '24 22:07 gblikas

I would be fine with using the intX_t types.

__m64 is apparently defined in xmmintrin.h, so if it's missing, that would be the place to add it, not wasm_simd128.h.

tlively avatar Jul 13 '24 23:07 tlively

@tlively or @sbc100 do you have an idea why the conditional in https://github.com/emscripten-core/emscripten/pull/7924 (was the PR it was removed):

#ifndef __EMSCRIPTEN__ // MMX support is not available in Emscripten/SIMD.js.
static __inline__ __m128i __attribute__((__always_inline__, __nodebug__))
_mm_setr_epi64(__m64 q0, __m64 q1)
{
  return (__m128i){ (long long)q0, (long long)q1 };
}
#endif

was based on the __EMSCRIPTEN__ macro instead of something MMX/SEE specific; what's the history here wrt the comment?

gblikas avatar Jul 14 '24 19:07 gblikas

These headers are only ever included in Emscripten builds, so #ifndef __EMSCRIPTEN__ is functionally equivalent to #if 0. I guess this function was removed from the build because it could not be efficiently supported on top of SIMD.js back in the day, but I don't know why exactly.

tlively avatar Jul 15 '24 17:07 tlively

@tlively https://github.com/emscripten-core/emscripten/pull/22243 should be good then.

gblikas avatar Jul 25 '24 23:07 gblikas

It looks like the old version of this header (which was removed in #7924) did contain this function but when it was re-introduced in #11243 that function was not included.

CC @juj

Originally when I wrote 128-bit SSE,SSE2, ... support, I did not add any of the old legacy 64-bit MMX support, which is considered obsolete: https://www.phoronix.com/news/LLVM-Clang-MMX-Intrinsics-SSE2 and https://groups.google.com/g/llvm-dev/c/7FEDyVe2ipQ/m/PY4F6hElAgAJ?pli=1

The reason was that Wasm SIMD128 did not support any 64-bit intrinsics.

Note that there is a little bit of overlap with 128-bit SSE* and 64-bit MMX, as the SSE* instruction sets also added (for completeness) some 64-bit MMX registers instructions in them: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#ig_expand=5740&techs=SSE_ALL&text=__m64

so as result, not "faking" 64-bit intrinsics resulted in those SSE/SSE2/SSE3 functions being dropped out that had a __m64 in their signature.

I don't oppose adding support to 64-bit MMX functions, emulated by 128-bit Wasm SIMD128. That is the path that LLVM is taking as well, as mentioned in the above phoronix link.

juj avatar Aug 01 '24 11:08 juj

@gblikas I think you might be taking the wrong approach with all this. OpenCV has support for WebAssembly SIMD and does not require SSE or SSE2 mappings/emulation.

The problem is that when building OpenCV for Emscripten, CV_CPU_COMPILE_SSE2 is defined by default via the CMake logic which results in the incorrect header files being included.

The solution to your problem is to disable SSE by passing the following options to CMake:

-DCV_ENABLE_INTRINSICS=ON -DCPU_BASELINE= -DCPU_DISPATCH=

Hope this helps!

allsey87 avatar Aug 27 '24 14:08 allsey87

For anyone working close on OpenCV, it would actually be amazing to read a writeup of how the SSE path vs handwritten Wasm SIMD path vs a ARM Neon path compilation in Emscripten works out with performance. That is, since Emscripten supports all of those, users can choose which SIMD backend to target.

It would be a really nice investigation to see "here's how much of a performance difference there is if you use Wasm SIMD instead of SSE" in different OpenCV tasks.

juj avatar Aug 27 '24 14:08 juj

That is, since Emscripten supports all of those

Well, the SSE path for OpenCV is broken at the moment with some necessary functions missing (see https://github.com/emscripten-core/emscripten/pull/22243). I haven't tried the NEON path yet.

allsey87 avatar Aug 27 '24 14:08 allsey87

@gblikas I think you might be taking the wrong approach with all this. OpenCV has support for WebAssembly SIMD and does not require SSE or SSE2 mappings/emulation.

The problem is that when building OpenCV for Emscripten, CV_CPU_COMPILE_SSE2 is defined by default via the CMake logic which results in the incorrect header files being included.

The solution to your problem is to disable SSE by passing the following options to CMake:

-DCV_ENABLE_INTRINSICS=ON -DCPU_BASELINE= -DCPU_DISPATCH=

Hope this helps!

@allsey87 Totally right. Sorry for the long delay; after seeing your comment, I went through and modified the opencv4 portfile to support building Emscripten. In doing so, many of the flags don't work or result in broken builds related to what you mention:

-DCV_ENABLE_INTRINSICS=ON -DCPU_BASELINE= -DCPU_DISPATCH=

Also, there are some threading issues once you do build it - threads is pretty much broken.

gblikas avatar Oct 08 '24 20:10 gblikas

For anyone working close on OpenCV, it would actually be amazing to read a writeup of how the SSE path vs handwritten Wasm SIMD path vs a ARM Neon path compilation in Emscripten works out with performance. That is, since Emscripten supports all of those, users can choose which SIMD backend to target.

It would be a really nice investigation to see "here's how much of a performance difference there is if you use Wasm SIMD instead of SSE" in different OpenCV tasks.

@juj I'm not close to the project, but am taking a stab at it. I've just finished my internal deployment, and am looking to do comparative analysis. Might be slow, but I'll get there.

gblikas avatar Oct 08 '24 20:10 gblikas

That is, since Emscripten supports all of those

Well, the SSE path for OpenCV is broken at the moment with some necessary functions missing (see #22243). I haven't tried the NEON path yet.

@allsey87 I opended #22243 which might be irrelevant now... I also haven't heard anything back from OpenCV on how they actually use Emscripten internally... Their documentation says one thing, then the github runners another, then the internal Intel whatever you call it says another. See https://github.com/opencv/opencv/issues/25956.

gblikas avatar Oct 08 '24 20:10 gblikas

I'm going to close this comment until I hear back from the opencv team and finalize my own internal testing against Emscripten 2.0.13 (documented OpenCV deps) and Emscripten latest (what OpenCV looks to be running?...).

gblikas avatar Oct 08 '24 20:10 gblikas