binaryen icon indicating copy to clipboard operation
binaryen copied to clipboard

Add a prebuilt version for Apple Silicon (M1)

Open d3lm opened this issue 3 years ago • 25 comments

wasm-pack uses wasm-opt and it tries to download the binary for it when it gets executed. However, if you run this on an M1 it gives the following error:

Error: no prebuilt wasm-opt binaries are available for this platform: Unrecognized target!

It would be great if there was a prebuilt version for ARM so wasm-opt runs out of the box on an M1 Mac.

d3lm avatar Nov 16 '21 21:11 d3lm

cc @sbc100

kripken avatar Nov 17 '21 21:11 kripken

@alexcrichton is this something wasm-pack can do or should we try to do it here in the binaryen CI? Do you want to have go at this?

Even if we don't build an arm binary wasm-pack should be able to download and use the x86 version right? That might be the quickest way to get things working.

sbc100 avatar Nov 17 '21 22:11 sbc100

Ah I don't work on wasm-pack any more, so I don't know where this would best be done.

alexcrichton avatar Nov 18 '21 15:11 alexcrichton

@d3lm can you open a bug on wasm-pack and suggest they just the x86-64 binary when running on arm? (for now).

sbc100 avatar Nov 18 '21 15:11 sbc100

@sbc100 Yep I can do that. Thanks.

d3lm avatar Nov 19 '21 14:11 d3lm

But just to understand, the x86-64 does not work out of the box on arm right? You'd have to run this via rosetta no?

d3lm avatar Nov 19 '21 14:11 d3lm

My understanding is the M1 macs do support running x86-64 out of the box (i.e. they ship with whatever tooling they need to make this work).

sbc100 avatar Nov 19 '21 15:11 sbc100

My understanding is that x86-64 binaries can be executed transparently without any special commands and launch sequences.

sbc100 avatar Nov 19 '21 15:11 sbc100

Ah ok, and wasm-pack doesn't have a prebuilt x86-64?

d3lm avatar Nov 19 '21 19:11 d3lm

Binaryen does provide and x86-64 prebuilt of wasm-opt. I'm suggesting the wasm-opt look for this binary rather than the aarch64 (arm64) version (which doesn't not exist).

sbc100 avatar Nov 19 '21 20:11 sbc100

Aha! Got it. Thanks a lot.

d3lm avatar Nov 20 '21 08:11 d3lm

Ok, after some discussion I still think it would be best to have a osx_aarch64 release of binaryen because wasm-pack simply downloads it like this

Tool::WasmOpt => {
  Ok(format!(
    "https://github.com/WebAssembly/binaryen/releases/download/{vers}/binaryen-{vers}-{target}.tar.gz",
     vers = "version_90",
     target = target,
   ))
}

Why can't we add a build for macos-12 too? It's quite common and I think it would be right to ship a release for it as part of binaryen.

d3lm avatar Dec 03 '21 14:12 d3lm

So the binaryen filenames are of the form binaryen-version_100-x86_64-macos.tar.gz so why can't wasm-pack simply do if (target == aarch64-macos) target = x86_64-macos right before trying to download?

Of course, if you want to get the last bit of performance benefit from having a native arm64 build, and you would like to contribute the github action recipe for building it that would be most welcome. But I don't see why you can't use the x86_64 version for now.

As for targeting a certain/newer macos version, what would be point of that? I don't know of any benefit of targeting a more recent version, but maybe I'm missing something? I think we we want to be as compatible as can with all versions (within reason) which means setting a low minimum version when build, right?

sbc100 avatar Dec 03 '21 16:12 sbc100

Because that requires rosetta and it'd be great if we could avoid rosetta and have a native arm64 build. Or am I wrong?

d3lm avatar Dec 03 '21 16:12 d3lm

I see if I can contribute and submit an action for building it for arm64.

d3lm avatar Dec 03 '21 16:12 d3lm

I actually think that building for Apple Sillicon in a GitHub action is not really possible right now, because there's no environment that runs an M1. Tho, you could pass -DCMAKE_OSX_ARCHITECTURES=arm64 to cmake, but I am not sure if that would be enough. Do you have any experience with this?

d3lm avatar Dec 03 '21 17:12 d3lm

Because that requires rosetta and it'd be great if we could avoid rosetta and have a native arm64 build. Or am I wrong?

Why is avoiding rosetta an important goal here? Don't all M1 macs ship with rosetta builtin? I get that it would be nice to have.. but it doesn't seem particularly urgent or important?

Do you have an M1 mac that doesn't have rosetta installed?

Or are you worried about that overhead of running wasm-opt in emulation mode? (is your wasm-opt phase very slow?)

sbc100 avatar Dec 03 '21 19:12 sbc100

Yes, if we wanted to have github build these binaries I imagine it would be cross compile.. i.e. produce an arm64 build on x86_64 hardware.

sbc100 avatar Dec 03 '21 19:12 sbc100

Don't all M1 macs ship with rosetta builtin?

I think so yes.

Do you have an M1 mac that doesn't have rosetta installed?

No, because I think it's installed by default.

Or are you worried about that overhead of running wasm-opt in emulation mode?

Yes, but that may be unjustified worries, because I have not yet seen any performance issues.

For the time being, I submitted a PR to wasm-pack https://github.com/rustwasm/wasm-pack/pull/1088 that downloads the x86_64 build for aarch64. If issues pop up we can re-evaluate or try to set up a native arm64 build on x86_64 hardware.

d3lm avatar Dec 03 '21 19:12 d3lm

Great! That sounds a good place to be for now.

sbc100 avatar Dec 03 '21 20:12 sbc100

Another option here is to make a wasm build of wasm-opt. With node-pthreads support + exceptions support that should run pretty fast in recent node. Not as fast as a native M1 build I'm sure, but it might be faster than an x86_64 build emulated on M1... which makes this potentially interesting.

kripken avatar Dec 03 '21 21:12 kripken

@kripken I actually had the same idea! That would work too which would eliminate the interoperability issues across different systems.

d3lm avatar Dec 03 '21 21:12 d3lm

I did a little testing of that now using this diff:

diff --git a/CMakeLists.txt b/CMakeLists.txt
index e540b1f57..0dc9c204b 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -265,26 +265,32 @@ else()
     add_nondebug_compile_flag("-UNDEBUG")
   endif()
 endif()
 
 if(EMSCRIPTEN)
   # link with -O3 for metadce and other powerful optimizations. note that we
   # must use add_link_options so that this appears after CMake's default -O2
   add_link_options("-O3")
   add_link_flag("-s SINGLE_FILE")
   add_link_flag("-s ALLOW_MEMORY_GROWTH=1")
-  add_compile_flag("-s DISABLE_EXCEPTION_CATCHING=0")
-  add_link_flag("-s DISABLE_EXCEPTION_CATCHING=0")
+  add_compile_flag("-fwasm-exceptions")
+  add_link_flag("-fwasm-exceptions")
+  add_compile_flag("-pthread")
+  add_link_flag("-pthread")
+  add_link_flag("-s PROXY_TO_PTHREAD")
+  add_link_flag("-s EXIT_RUNTIME")
+  add_link_flag("-Wno-pthreads-mem-growth")
   # make the tools immediately usable on Node.js
   add_link_flag("-s NODERAWFS")
   # in opt builds, LTO helps so much (>20%) it's worth slow compile times
-  add_nondebug_compile_flag("-flto")
+  #add_nondebug_compile_flag("-flto")
 endif()
 
 # clang doesn't print colored diagnostics when invoked from Ninja
 if(UNIX AND CMAKE_GENERATOR STREQUAL "Ninja")
   if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
     add_compile_flag("-fdiagnostics-color=always")
   elseif(CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
     add_compile_flag("-fcolor-diagnostics")
   endif()
 endif()
diff --git a/src/support/threads.cpp b/src/support/threads.cpp
index ab9de4175..7aec720ba 100644
--- a/src/support/threads.cpp
+++ b/src/support/threads.cpp
@@ -132,24 +132,25 @@ void ThreadPool::initialize(size_t num) {
       threads.clear();
       return;
     }
   }
   DEBUG_POOL("initialize() waiting\n");
   condition.wait(lock, [this]() { return areThreadsReady(); });
   DEBUG_POOL("initialize() is done\n");
 }
 
 size_t ThreadPool::getNumCores() {
-#ifdef __EMSCRIPTEN__
+#if defined(__EMSCRIPTEN__) && !defined(__EMSCRIPTEN_PTHREADS__)
   return 1;
 #else
   size_t num = std::max(1U, std::thread::hardware_concurrency());
   if (getenv("BINARYEN_CORES")) {
     num = std::stoi(getenv("BINARYEN_CORES"));
   }
   return num;
 #endif
 }
 
 ThreadPool* ThreadPool::get() {
   DEBUG_POOL("::get()\n");
   // lock on the creation

Everything works as expected when I optimize some large files, but it is surprisingly slow. It is using multiple cores, and in fact uses them more than a native build. Native has this:

66.95user 0.67system 0:14.33elapsed 471%CPU (0avgtext+0avgdata 608316maxresident)k

and node 16.5 has this:

768.36user 237.38system 2:37.22elapsed 639%CPU (0avgtext+0avgdata 762028maxresident)k

I'm not sure what's going wrong here. Perhaps the slower wasm atomics are hurting us since they are all sequentially consistent...? Anyhow, sadly using wasm isn't a easy solution for this issue.

kripken avatar Dec 07 '21 21:12 kripken

Thanks so much for investigating a WASM version of wasm-opt @kripken. Appreciate it a lot!

It's a bummer that it's so much slower 😞 but I can also imagine that Atomics are slowing it down a lot. I can't imagine that using pthreads (which is mapped to web workers or worker threads) would be a problem here.

I'd love to loop @tschneidereit in as well. Maybe you have an idea why it would be significantly slower than native?

I suppose we have to wait for an M1 environment for GitHub actions or build an aarch64 with a cross compiler. WDYT @kripken?

d3lm avatar Dec 08 '21 12:12 d3lm

If we have a recent enough SDK on our existing mac builder, It should be very straightforward to cross-compile for aarch64.

dschuff avatar Dec 16 '21 00:12 dschuff