[FR] WASM as a portable IR for apps
Description
I wanted to share some work that we've been investigating for a while, and hopefully gather some feedback. This is all very preliminary (we don't even have a JNI hello world running yet), but I always prefer to share our plans as early as possible :)
As you all know, apps have to run on a wide range of devices. Some of those devices will remain in the ecosystem for a very long time, and that means apps need to continue working on old SoCs during that time. That support window is even longer because some of those old SoCs will continue being used in new devices for quite some time. Because of those two factors, apps often need to maintain support for each SoC for around a decade.
Over the last 15 years we've added a few new ABIs -- either for completely new architectures like arm64, or architecture evolutions such as armv5 to armv7 to armv7 with Neon -- and each one took a lot of work from both OS and app developers. Adding 64-bit ABIs doubled the amount of testing work for apps (and the APK size, since this predated app bundles, and split APKs were cumbersome), and because of that many apps did not support 64-bit ABIs for quite some time. With 32-bit Arm, Android's original ABI was armv5, and even after we added armv7, most apps did not ship the newer ABI (despite the huge performance gains it offered) until the NDK itself stopped supporting armv5. The armv7 ABI took nearly a decade to gain Neon support because of one old SoC.
The Play Store targets devices along ABI lines, so taking advantage of new architecture features (such as new instructions within arm64) requires app developers to reimplement library selection on their own and to bloat the app bundle (or worse, the APK) with libraries for each ABI variant they want to support. Previously, we solved this problem by adding new ABIs (as in the arm32 example above), but because new ABIs are so disruptive, the improvements they bring are usually obsolete by the time adopting them is worthwhile for apps.
Apps would benefit from a new 64-bit Arm baseline today, but testing is one of the largest costs for app developers, and every new variant (because each one is inherently backwards-incompatible) grows the test matrix. Developers have to choose between making better use of new hardware and keeping their costs down, and the latter is almost always the better choice.
Compiling NDK code to a portable IR that can be lazily compiled for the device it will run on would allow those of us on the app side of this problem to build and test our code once, and leave the architecture-specific targeting to someone else.
We are not talking about WASM sandboxing, just the portable IR. Sandboxing is something we would consider offering as an option later on, but a behavior- and performance-preserving portable IR is the priority here.
Why?
This would let us do some pretty cool things. Right now, NDK code is compiled for the lowest common denominator SoC for each ABI, and that minimum is only very rarely raised (remember, even an "evolution" of an ABI requires a new ABI). If WASM is uploaded to Play rather than native binaries, Play can target the exact SoC before delivering to the device, which should improve performance for workloads that are able to take advantage of those optional pieces of hardware (this should be most pronounced on numerically heavy code like vector math).
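To make the "numerically heavy code" point concrete, here's an illustrative example (a generic loop, not taken from any real app) of where per-SoC codegen pays off: compiled for today's arm64 baseline it auto-vectorizes with plain NEON, while a compiler that knows the exact SoC could use newer extensions such as the dot-product instructions or SVE for the same source.

```cpp
// Illustrative sketch only: a plain reduction loop whose generated code can
// differ a lot depending on which SIMD features the compiler may assume.
// Targeting the arm64 baseline it vectorizes with basic NEON multiplies and
// widening adds; targeting an SoC with the dot-product extension (or SVE)
// lets the compiler emit much denser code for the exact same source.
#include <stddef.h>
#include <stdint.h>

int32_t dot(const int8_t* a, const int8_t* b, size_t n) {
    int32_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        sum += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
    }
    return sum;
}
```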
Having the app's full WASM rather than ELF binaries in Play would also mean that Play could automatically apply LTO (and PGO?) for the entire app.
Goals
- Reduce the cost of support for evolving or adding Android ABIs.
- Compile WASM to native libraries in the cloud to save device resources. Play has the WASM code and knows which device it's delivering to, so it can pre-build the app for that device and cache the result for other devices with the same SoC (see the sketch after this list).
- Continue supporting local compilation. If nothing else, you'll need this during development, so it will always be there 🙂 This also will be needed for you to ship your apps to destinations other than Play. These will need to be the same tools used by Play to ensure bug reproducibility.
- Avoid bothering app developers with the details of this migration. If you're using one of our supported build systems (ndk-build or our CMake toolchain file), typical apps will not need to do anything to take advantage of this. If you're using another build system, the maintainer of that build system will need to do the work to support this.
- Allow LLVM to compile native code for each device's specific SoC to take advantage of available hardware rather than compiling for the lowest common denominator of the ABI.
- Enable full app LTO and PGO automatically via Play.
- Ease testing for the NDK in the long term. The NDK wouldn't need to test every backend, since getting each backend right would be the wasm compiler's job. Only needing to test one ABI would let the NDK rely solely on cloud devices and emulators for testing.
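A minimal sketch of the cloud-compilation goal above (hypothetical names throughout; nothing here corresponds to real Play infrastructure): the store compiles each uploaded module once per SoC and reuses the result for every other device with the same SoC.

```cpp
// Purely illustrative: compile-once-per-(module, SoC) caching on the server
// side. compile_wasm_for_soc and digest are hypothetical stand-ins for the
// real WASM -> native compiler and a stable content hash.
#include <stdint.h>
#include <map>
#include <string>
#include <utility>
#include <vector>

using WasmModule = std::vector<uint8_t>;
using NativeLib = std::vector<uint8_t>;   // compiled .so bytes (placeholder)

NativeLib compile_wasm_for_soc(const WasmModule& wasm, const std::string& soc_id);
std::string digest(const WasmModule& wasm);

// Cache keyed by (module digest, SoC id): the first device with a given SoC
// pays the compile cost; later devices with the same SoC get a cache hit.
std::map<std::pair<std::string, std::string>, NativeLib> g_cache;

const NativeLib& lib_for_device(const WasmModule& wasm, const std::string& soc_id) {
    auto key = std::make_pair(digest(wasm), soc_id);
    auto it = g_cache.find(key);
    if (it == g_cache.end()) {
        it = g_cache.emplace(key, compile_wasm_for_soc(wasm, soc_id)).first;
    }
    return it->second;
}
```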
Non-goals
1. Sandboxing. Optional sandboxing is something we might investigate in the future as a result of this work, but it is counter to the goal of a portable IR for our existing use cases. Avoiding sandboxing should make this transition seamless for most apps.
2. Migration of existing binaries. If your app has an unfortunate dependency on a binary that no one can rebuild (😱, but it happens), this won't solve that problem. It won't make it any worse either, since those binaries already wouldn't work on new ABIs.
3. Portability between 32-bit and 64-bit. WASM does have an IR for each, so you'll need to build and upload both. You're probably already doing that for the existing architectures today though, so that won't be any new work.
Concerns
There are some pretty large concerns that we need to address here:
1. Will it perform adequately? Performance-critical code is one of the NDK's major use cases, so we need to win back, in other ways, any performance lost to the extra level of translation. We're hoping the potential performance gains mentioned above will offset (and perhaps even outweigh) those losses.
2. Are JITs affected? Not necessarily, because those apps can continue building their own JITs to target new ABIs as they always have. We could potentially lower the cost for apps with JITs, though, by providing a library that does WASM -> native JITing; apps would then only need to maintain their WASM backend rather than every Android ABI (see the sketch after this list).
3. On-device compilation for apps that were not delivered by the Play Store. We need to support other app stores and things like apkmirror, even if they do not pre-compile the app the way Play would. A probable solution is a service on the device that can do the WASM -> native compilation. This is a bit tricky because we need to solve the problem for devices as old as Lollipop, so the solution needs to exist outside the OS.
4. On-device compilation for apps transferred between phones via SD card. Those compiled apps might use instructions not available on the new phone. We already have this problem today, but it would become much more common if we're targeting exact SoCs instead of a lowest common denominator. This is possibly solvable by including the WASM in the app bundles and APKs, and then using the same on-device compilation method discussed in 3.
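On the JIT point (concern 2), here's an entirely hypothetical sketch of the shape such a helper library could take; none of these names exist today, it's only meant to show that an app's JIT would emit WASM and hand it to a platform-provided compiler instead of emitting arm64 or x86-64 itself.

```cpp
// Hypothetical API sketch only: no such library exists today.
#include <stddef.h>
#include <stdint.h>

// Opaque handle to a chunk of compiled, executable code.
struct WasmJitModule;

// The app's JIT produces WASM bytes and asks the library to compile them
// in-process for whatever the current device is.
WasmJitModule* wasm_jit_compile(const uint8_t* wasm, size_t len);

// Resolve an exported function from the compiled module.
void* wasm_jit_lookup(WasmJitModule* module, const char* export_name);

// Free the compiled code when the app's JIT discards it.
void wasm_jit_release(WasmJitModule* module);

// Usage (again, hypothetical): the app maintains one WASM backend instead of
// one backend per Android ABI.
//   WasmJitModule* m = wasm_jit_compile(bytes, len);
//   auto entry = reinterpret_cast<int (*)(int)>(wasm_jit_lookup(m, "entry"));
```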
Please let us know if you have questions/objections/suggestions, but remember that "I'm worried about performance" is something we're already well aware of :) We won't be moving forward with this if we can't address the performance concerns.
Timeline
Ha, not yet. It's still too early to even guess :) Definitely no need to worry about any migration soon (FWIW, we plan on this being transparent for anyone using our official build systems anyway). We would like to start landing prototypes in our master branch as soon as we can if anyone is interested in trying this, but even that is probably several months away in the best case.
Phenomenal!
Are there any ideas on how FFI between wasm and native functions will work? AFAIK there are significant impedance mismatches between the two (e.g. wasm only has 32/64-bit integral types)
Apps would benefit from a new 64-bit Arm baseline today, but testing is one of the largest costs for app developers, and every new variant (because each one is inherently backwards-incompatible) grows the test matrix. Developers have to choose between making better use of new hardware and keeping their costs down, and the latter is almost always the better choice.
Doesn't this make the test matrix for app developers worse? You go from having to worry about a new NDK learning how to optimize out your undefined behavior to having to worry about every single chipset getting different compiler output.
Portability between 32-bit and 64-bit. WASM does have an IR for each, so you'll need to build and upload both. You're probably already doing that for the existing architectures today though, so that won't be any new work.
I'm a bit surprised that you're not just dropping 32-bit entirely.
Continue supporting local compilation. If nothing else, you'll need this during development, so it will always be there 🙂 This also will be needed for you to ship your apps to destinations other than Play. These will need to be the same tools used by Play to ensure bug reproducibility.
Ease testing for the NDK in the long term. The NDK wouldn't need to test every backend, since getting each backend right would be the wasm compiler's job. Only needing to test one ABI would let the NDK rely solely on cloud devices and emulators for testing.
These two seem in conflict (unless you mean from the standpoint of "it's someone else's problem" :-)
Are there any ideas on how FFI between wasm and native functions will work? AFAIK there are significant impedance mismatches between the two (e.g. wasm only has 32/64-bit integral types)
This is actually where @matthias-blume and @dimitry- are spending their time atm. Making sure this is as cheap as possible is pretty essential (think graphics code that needs to frequently call into libvulkan.so). It sounds like it's quite doable. I can't answer the bit about integer types, but one of them probably can.
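To illustrate the kind of impedance mismatch being worked through (this is only a sketch, not the actual design): in wasm32 a "pointer" is just an i32 offset into the module's linear memory, so a call into a real native library needs a small trampoline that turns that offset into a host address and forwards the rest of the arguments.

```cpp
// Illustrative sketch only (not the actual design). In wasm32, pointers are
// i32 offsets into the module's linear memory, so a generated trampoline has
// to rebase them onto real host addresses before calling a native library.
#include <stdint.h>

// Assumed host-side state for one instantiated module (hypothetical name).
extern uint8_t* g_linear_memory_base;  // start of the module's linear memory

// Hypothetical native function, standing in for e.g. a libvulkan.so call.
extern "C" int native_fill_buffer(void* dst, uint64_t size);

// Trampoline visible to the wasm side: it receives (i32 offset, i64 size),
// converts the offset to a host pointer, and forwards the call. Keeping this
// hop as cheap as possible is the work described above.
extern "C" int32_t wasm_fill_buffer(uint32_t dst_offset, uint64_t size) {
    void* host_ptr = g_linear_memory_base + dst_offset;
    return native_fill_buffer(host_ptr, size);
}
```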
Doesn't this make the test matrix for app developers worse? You go from having to worry about a new NDK learning how to optimize out your undefined behavior to having to worry about every single chipset getting different compiler output.
Kind of. If you test your WASM app on one phone and it's miscompiled for another, that's our bug, not yours. If your testing goal is to defend against device-specific bugs, you're already doing that kind of per-device testing today, because CTS is far from perfect.
Perhaps I'm being overly optimistic, but in the long run I expect the NDK's own testing (that is, the bits that test CMake, ndk-build, LLVM, etc) to be able to drop the ABI dimension from the test matrix and just delegate that testing to the WASM compiler.
These two seem in conflict (unless you mean from the standpoint of "it's someone else's problem" :-)
The WASM -> native compiler would have all that testing. The NDK itself (the integration point for a lot of other tools) would not need to re-test that behavior. This is very similar to how Clang is handled today: we don't need to run a full C/C++ compiler test suite because Clang already did that. It doesn't remove the testing burden, but it pushes it upstream, where issues can be found and fixed more easily anyway.
I'd definitely like to see a hello world example of this, because I still can't quite grasp the concept.
Does this mean that a single (supposedly device-agnostic) libfoo.wasm shipped in a traditional APK could be recompiled to a native libfoo.so, either by the Play Store or on-device, in the future?
Thanks for sharing!
Why did you decide to choose WASM instead of LLVM IR? Why is this additional translation step necessary (native code -> LLVM IR -> WASM -> LLVM IR -> binary)? It looks like we will just lose some metadata during this process. And LLVM IR is pretty stable across minor updates.
You are not planning to eventually deprecate native binaries and force all developers to use this new distribution method, right?
Why did you decide to choose WASM instead of LLVM IR? Why is this additional translation step necessary (native code -> LLVM IR -> WASM -> LLVM IR -> binary)? It looks like we will just lose some metadata during this process. And LLVM IR is pretty stable across minor updates.
LLVM IR is not even intended to be stable across releases, nor is it machine-agnostic, so that seems to break most of the fundamental goals of having a portable IR. It also would not provide the ability for developers to target older releases (it's only forwards compatible).
Does this mean that a single (supposedly device-agnostic) libfoo.wasm shipped in a traditional APK could be recompiled to a native libfoo.so, either by the Play Store or on-device, in the future?
Precisely.
You are not planning to eventually deprecate native binaries and force all developers to use this new distribution method, right?
No plans, though (assuming this works to our bar of "no regressions") I'm curious what problems you think that would cause. If it's portable to any device and maintains (or improves!) performance, why would you want to opt out of that?
LLVM IR is not even intended to be stable across releases, nor is it machine-agnostic
In case you want an example, Renderscript made this mistake, and the price it paid was maintaining its own support for an ancient LLVM IR version throughout its lifetime.
Wasn't this one of the original ideas behind Java? Don't ship native binaries; instead, invent a bytecode that gets recompiled to native code on the box (either JIT or at install time). Has it already been long enough that everybody forgot and it's back as a new idea? (It's still superior to shipping source and compiling it on the box because of intellectual property and build dependencies, just as the argument went in 1996, shortly before javascript started displacing java applets, correct?)
Shrug. Good luck...
Wasn't this one of the original ideas behind Java?
wasm is actually much closer to CLR than Java bytecode... there are no objects or classes, just a flat chunk of storage and cpu-like loads and stores. see https://webassembly.github.io/spec/core/binary/instructions.html for details.
in that sense it is more like LLVM IR mentioned above, except without the fatal flaw of LLVM IR's lack of stability. (aiui wasm was originally intended to add object stuff on top, but it turned out that there was actually a lot of pent-up demand for something way simpler than that. you can recognize this heritage from the way that the oddest parts of wasm are all about bytecode verification, because sandboxing -- which wouldn't be relevant for us -- was a fundamental goal of what they were aiming at.)
another better comparison for wasm is with https://developer.chrome.com/docs/native-client/nacl-and-pnacl/, with the most important difference there being that wasm isn't tied to chrome in the same way they were.
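to make the "flat chunk of storage and cpu-like loads and stores" point concrete, here's a trivial function and, roughly (hand-written rather than real compiler output), the wasm it lowers to:

```cpp
// A trivial load through a pointer...
#include <stdint.h>

int32_t deref(const int32_t* p) { return *p; }

// ...lowers, for wasm32 and roughly speaking, to a function that takes an
// i32 "pointer" (an offset into the module's flat linear memory) and does a
// plain i32 load. No objects or classes anywhere:
//
//   (func (param i32) (result i32)
//     local.get 0
//     i32.load)
```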
Shrug. Good luck...
yeah, it's not clear whether this is actually possible. hence the experiment :-)
Eh, somebody reinvents this every few years. For the kernel guys it was bpf. Pascal p-code was 1973. When I independently reinvented the concept in college (shortly before encountering Java) I called it "platform independent binary executable code".
Heck, dynamic translation of native machine code has been a thing for decades: pentium's basically a risc processor with a hardware x86 translation frontend, Transmeta's Crusoe chip did the same thing in software (the "code morphing" layer), and QEMU dynamically translates everything to everything ala http://landley.net/qemu/2008-01-29.html#Feb_1,2008-_TCG
I'm just bemused that android started with Java, then allowed native code (presumably for performance), and of course when people started shipping native code they kept shipping the oldest one you'd support even when that's not most efficient. This proposal would add another bytecode layer to regenerate the native code from (so the bytecode would be what goes stale over time, not the native code), and that bytecode developing version skew over time isn't a concern because... it hasn't yet? I'm not sure if you're solving a problem or just moving it. (Would you be removing support for the old binaries, or adding a new mechanism alongside and then supporting more codepaths?)
As I said: good luck.
of course when people started shipping native code they kept shipping the oldest one you'd support even when that's not most efficient. This proposal would add another bytecode layer to regenerate the native code from (so the bytecode would be what goes stale over time, not the native code), and that bytecode developing version skew over time isn't a concern because... it hasn't yet?
I don't think these are the same problem. The WASM -> native compiler can maintain support for old WASM IR versions. Apps ship the lowest-common-denominator binaries because it's quite difficult to do otherwise. You either have to exclude a segment of your users by targeting newer SoCs, or build dozens (hundreds?) of variants of the libraries and upload all of them. Both options are worse for most developers and their users than just shipping something compatible with the lowest-common-denominator. If they upload WASM, Play can fix that problem.
In this future will the WASM -> native compiler maybe drop support for ancient IR versions used by apps that haven't been updated in ages? Possibly, but that's still no worse than the status quo, and the apps that haven't been abandoned can benefit from this.
Hello, as a Wasm founder I am very excited to see this being considered as a future direction for Android. Generally, WebAssembly has matured and expanded from its web roots and non-web use cases are of particular interest both in the CG and in the wider community. Android being a very wide-reaching platform could be an excellent opportunity for Wasm to bring its inherent advantages (solid specification, toolchains from many languages, and high-performance runtimes) to another high-impact environment.
@DanAlbert if you have time I'd be happy to chat offline to hear your ideas of how this might work.
Any ideas how one could get an estimate of the performance difference between NDK and WASM Android apps?
Ideas, sure: run your benchmarks.
You're very, very early though. We're only recently getting close to having enough of the initial steps done to be able to run any interesting benchmarks. You don't need to worry about this for quite some time. We have a lot of work to go on basic functionality and our own performance analysis before it's worth anyone else worrying about it.
an estimate of the performance difference between NDK and WASM
There seems to be existing analysis suggesting Wasm execution time to be 1.4-2.5x native, but I'd expect that to be worse for NDK code, which is likely to be atypically heavy in explicit SIMD or other ISA-specific content (e.g. Neon).
I noted Wasm doesn't yet support scalable vector types present in Armv9 and RISC-V, and RISC-V continues to consider different matrix extensions. This brings me to my main question: are there any other IR contenders here?
There's a similar discussion in the smart contracts space, though I suspect the Wasm compilation time concern goes away with Play-based builds.
@sharvey-img As context, note that the "Not So Fast" paper you link to measures wasm execution speed in two browser VMs, and it finds the main causes of slowdown are related to the tradeoffs in those specific runtimes:
"an offline compiler like Clang can spend considerably more time to generate better code, whereas WebAssembly compilers [in browsers] must be fast enough to run online"
Wasm can be much faster if, instead, it is compiled by something as powerful as clang, which is possible for Android. Measured that way, wasm has only 14% overhead.
The second major cause of overhead they found in "Not So Fast" was due to sandboxing. That overhead is also not relevant here because wasm's sandboxing is not the goal (and can be removed; though I hope part of it will be efficient enough that it might make sense to keep). So we can probably expect even less than 14% overhead here.
But good point about SIMD: Wasm's support for it is definitely limited because of portability atm, and there will be benchmarks that suffer because of it. It remains an open question how much that can be improved or worked around.
That performance analysis (and any other you'll find that wasn't done by us) is probably comparing sandboxed wasm performance to native. We expect that to be far slower, and that's not what we're doing. See non-goal 1.
I was researching WASM a bit and can't see why WASM was chosen as a target. I would even argue that Dalvik would be a better target than WASM, since every Android phone already has a pretty good optimizer for AOT Dalvik-to-ELF compilation (dex2oat).
I don't see anything in the WebAssembly instruction reference that makes it especially useful as an IR language, and the same could be achieved with a subset of Dalvik instructions.
Why would you want to write a new WASM-to-machine-code compiler instead of contributing to or using dex2oat?