habitat-sim
[Web] Add flag to enable SIMD instructions in WASM
Motivation and Context
- Adds a flag that builds the Web build of Habitat with SIMD enabled. This should speed up physics and other linear-algebra-heavy code that can be vectorized, hopefully ~quite significantly~. The browser must support SIMD in WASM, though, as modern versions of Chrome do.
- To enable it, pass the --simd flag to build_js.sh.
- Also fixes a small bug with pre-commit that prevented ShellCheck from running on build.sh and build_js.sh.
How Has This Been Tested
- Locally and with CI.
Types of changes
- [X] Docs change / refactoring / dependency upgrade
- [X] New feature (non-breaking change which adds functionality)
Checklist
- [X] My code follows the code style of this project.
- [X] I have read the CONTRIBUTING document.
- [X] I have completed my CLA (see CONTRIBUTING)
- [X] I have added tests to cover my changes.
- [X] All new and existing tests passed.
> quite significantly
(With my skeptical hat on.) Do you have some numbers to back this? I'm interested in how much this helps in a codebase of this size.
@mosra Probably not on Magnum, but without this flag WASM doesn't even enable SSE2 instructions for Bullet, so it should give a speedup there.
Regardless, point taken; I'm retracting my claim slightly.
It's not as simple as "enabling SSE2" since WASM has to work on ARM as well -- and that's why I'm skeptical, because different platforms have different instructions and what could directly map to a SSE instruction might have to be emulated on NEON and vice versa.
But in any case I really want to know how this helps, seriously :) Did you try it out? I know from certain projects that hand-coded WASM SIMD can be four to six times faster than scalar code, but I have no idea about autovectorization, especially when combined with everything else we're running here. Is it 1%? 10%? 2x faster?
I tried this on my WebXR hand demo benchmark, which drops a lot of objects in a big pile and tries to step physics 60 times per second (that is, 16.67 ms per stepWorld() call). I repeated the benchmark 3 times with and without the --simd flag, and here are the results:
without --simd: 78.76ms 72.60ms 82.87ms
with --simd: 79.75ms 84.07ms 83.48ms
These numbers are the average ms between stepWorld() calls. Note that it's trying to achieve 16.67ms per stepWorld() call but cannot keep up. So it doesn't seem like this SIMD optimization has helped much in this case.
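For a rough summary of the numbers above, here is a minimal Python sketch that averages the three runs per configuration and compares them against the 60 Hz budget. The variable names are illustrative and not part of habitat-sim; the timings are copied from the results above, and three runs per configuration are too few to draw a firm statistical conclusion.

```python
from statistics import mean

# Average ms between stepWorld() calls, copied from the results above.
without_simd = [78.76, 72.60, 82.87]
with_simd = [79.75, 84.07, 83.48]

# Time budget per step when targeting 60 physics steps per second.
target_ms = 1000 / 60

mean_without = mean(without_simd)
mean_with = mean(with_simd)

# Positive value means the --simd build was slower on average.
relative_change = (mean_with - mean_without) / mean_without

print(f"target per step: {target_ms:.2f} ms")
print(f"without --simd:  {mean_without:.2f} ms")
print(f"with --simd:     {mean_with:.2f} ms")
print(f"relative change: {relative_change:+.1%}")
```

On these numbers the means are about 78.08 ms without --simd and 82.43 ms with it, a difference of roughly +5.6%. That is smaller than the roughly 10 ms spread between the runs without --simd, so it is probably best read as noise rather than a slowdown.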