
Enhancing the Canonical ABI with Finer-Grained, Performant Type Layouts for IoT and Performance-Sensitive Environments

Open snowyu opened this issue 4 weeks ago • 6 comments

I acknowledge and appreciate the significant progress that the Component Model (CM) and its Canonical ABI have made in resolving cross-language interoperability challenges within the WebAssembly ecosystem. The current design mandates a single (ptr, len) layout for variable-length types such as strings and lists (UTF-8 by default for strings), which effectively simplifies toolchain automation.

However, this abstraction comes at the cost of performance and memory efficiency, particularly in resource-constrained environments such as IoT devices and in high-performance computing (HPC) scenarios. I believe that if Wasm's goal is to become the universal platform spanning browsers, servers, and embedded devices, the specification needs to better balance abstraction with execution efficiency.

My core proposal is to introduce, in the Component Model specification, type variants and layout directives that let developers select optimized memory layouts for specific use cases while still adhering to standardized contracts, so that sophisticated functionality can be built from composable, efficient, lower-level interfaces.

Current Challenges

  1. Memory Copying Overhead: The current specification necessitates memory copying and UTF-8 validation when aggregate data crosses component boundaries. This overhead is often prohibitive for memory-constrained IoT devices or latency-sensitive applications.
  2. "One-Size-Fits-All" ABI: The lack of optimized type options (e.g., inline storage for short strings) prevents developers from making necessary performance trade-offs.
  3. Implementation Complexity Shift: The design transfers performance burdens to runtimes and adapters, increasing their complexity and potentially causing toolchain delays, such as those observed in the C++ ecosystem's implementation of Preview 2 interfaces.

Proposed Solution: A More Flexible Type System

I propose extending WIT (the Wasm Interface Type language) and the Canonical ABI to include the following mechanisms:

1. Standardized Data Layout Variants

I argue against enforcing a single (ptr, len) layout. Instead, I suggest providing a standardized mechanism to allow developers to choose the most efficient layout via metadata in their interface definitions:

  • canonical-string (Default): The current (ptr, len) UTF-8 layout (for general interoperability).
  • inline-byte-string: A Pascal-style (u8 length, content) layout (ideal for short strings, avoiding heap allocations).
  • inline-word-string: A (u16 length, content) layout (for medium-length strings, leveraging word alignment).
  • null-terminated: A C-style null-terminated layout (for efficient interoperability with existing C/C++ libraries).
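
A minimal sketch of what the four layouts above could look like in bytes. This is an illustration only: the function names are hypothetical, none of these variants exist in the Canonical ABI today, and little-endian length prefixes and UTF-8 content are assumptions made for the example.

```python
import struct


def canonical_string(memory: bytearray, ptr: int, text: str) -> tuple:
    """Write UTF-8 bytes at `ptr`; the value that crosses is the (ptr, len) pair."""
    data = text.encode("utf-8")
    if ptr + len(data) > len(memory):
        raise ValueError("string does not fit in memory")
    memory[ptr:ptr + len(data)] = data
    return ptr, len(data)


def inline_byte_string(text: str) -> bytes:
    """Pascal-style: one u8 length byte, then the content (at most 255 bytes)."""
    data = text.encode("utf-8")
    if len(data) > 0xFF:
        raise ValueError("content does not fit a u8 length prefix")
    return struct.pack("<B", len(data)) + data


def inline_word_string(text: str) -> bytes:
    """A u16 length (little-endian, assumed), then the content (at most 65535 bytes)."""
    data = text.encode("utf-8")
    if len(data) > 0xFFFF:
        raise ValueError("content does not fit a u16 length prefix")
    return struct.pack("<H", len(data)) + data


def null_terminated(text: str) -> bytes:
    """C-style: content followed by a single NUL byte; embedded NULs are rejected."""
    data = text.encode("utf-8")
    if b"\x00" in data:
        raise ValueError("embedded NUL cannot be represented")
    return data + b"\x00"
```

The inline variants trade a hard length limit for a self-contained value, while the null-terminated variant cannot carry embedded NUL bytes; both restrictions would have to be part of any standardized contract.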

2. Official Cross-Language Pack/Unpack SDKs

To maintain consistency and prevent ecosystem fragmentation, I urge the specification group or the Bytecode Alliance to provide rigorously tested, official SDKs for generating bindings that comply with these layout variants.

3. Adherence to the Principle of Composition

This approach aligns with the philosophy of composing robust functionality from simple, well-defined primitives. By standardizing these more granular types, I believe developers gain the flexibility to optimize for specific hardware requirements while remaining securely within the Wasm framework.

Expected Benefits

  • True Cross-Platform Reach: Enables viable performance across all target environments, from cloud servers to edge/IoT devices.
  • Performance Optimization: Eliminates unnecessary memory allocations and copies for use cases where inlining is more efficient.
  • Healthier Ecosystem: Provides necessary controls for developers to better balance the benefits of abstraction with real-world performance requirements.

I believe that incorporating this flexibility into the Component Model specification will be a crucial step in realizing WebAssembly's full potential as the universal computing platform.

snowyu, Dec 11 '25 01:12

Thanks for the consideration and filing an issue. Separating out the "lifting" vs. "lowering" directions of the Canonical ABI:

On the lifting side:

  • If you have a structure that allocates its chars inline, you can always pass a pointer to that inline storage, thereby covering the "inline" use cases you listed
  • Because we currently support not just string-encoding=utf8 but also utf16 and latin1+utf16, I think that also covers the byte and word use cases
  • null-terminated strings can be supported in guest code by doing a strlen() first; whether we need to optimize this case further probably requires some more real-world data
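
The first and third points can be sketched as follows, with a `bytearray` standing in for the guest's linear memory. The helper names and the struct layout (a u8 length at offset 0 followed by inline chars) are assumptions made for illustration, not part of any specification.

```python
def strlen(memory, ptr: int) -> int:
    """Count bytes from `ptr` up to, but not including, the first NUL byte."""
    end = memory.index(0, ptr)  # raises ValueError if there is no terminator
    return end - ptr


def c_string_as_ptr_len(memory, ptr: int) -> tuple:
    """Turn a null-terminated string into the (ptr, len) pair used when lifting."""
    return ptr, strlen(memory, ptr)


def inline_field_as_ptr_len(memory, struct_ptr: int) -> tuple:
    """Assumed struct layout: u8 length at offset 0, chars stored inline from offset 1.

    No copy is made; the pair simply points into the struct's own storage.
    """
    length = memory[struct_ptr]
    return struct_ptr + 1, length
```

In both cases the guest only computes a pointer and a length over bytes it already owns, so no extra allocation is needed on this side.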

On the lowering side, the same string-encoding arguments apply, but the eager calls to realloc do make it much harder and slower to do inline storage. However, the lazy value lowering change (which we're considering adding in the short term, right after P3 ships) should make inline storage allocation quite feasible. So perhaps let's let lazy lowering happen and then reevaluate to see if there's a need for anything else.

lukewagner, Dec 11 '25 19:12

Thank you for the prompt response and for considering my feedback.

I believe our perspectives might not be entirely aligned on the desired outcome. My core proposal is not just about adding string encoding variations, but about standardizing a richer set of generic, low-level list/array ABI primitives that utilize different length prefixes (e.g., u8, u16 length fields). These primitives should be applicable to any byte array data, not strictly limited to text encoding.

My primary goal is to support the highly performance-sensitive scenario of efficiently passing immutable data as function arguments in embedded/IoT contexts, fundamentally avoiding unnecessary heap allocations and memory copies mandated by the current single ABI layout.

To address the points raised in the response:

  1. Regarding (ptr, len) covering inline storage:
    Yes, an external caller (Host) can represent inline data as a (ptr, len) pair when calling into a Wasm module. However, my concern is primarily with the lowering direction (Wasm module returning data to the Host) and the overall overhead of the current model. I am advocating for the ability to specify, in the interface definition, a compact layout that is guaranteed to be inline, requires no pointer dereferencing, and avoids a separate heap allocation (cabi_realloc).
  2. Regarding utf16/latin1 covering variants:
    These are encoding conventions, not memory layouts that specify length-prefix size. I am requesting a standardized layout variant, such as a binary array prefixed with a u8 length ([u8]: len-u8, content). This should be usable for arbitrary binary data, independent of specific text encodings.
  3. Regarding Lazy Value Lowering:
    The "Lazy Value Lowering" proposal sounds promising for addressing performance bottlenecks on the lowering side. However, a critical question remains: Will Lazy Lowering guarantee the complete avoidance of heap allocations (realloc) in memory-constrained environments like small IoT devices?
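
To make the request in point 2 concrete, here is a rough sketch of a generic length-prefixed byte array with a selectable prefix width. Nothing here is an existing Canonical ABI feature; `pack_prefixed`, `unpack_prefixed`, and the little-endian prefix encoding are assumptions chosen for the example.

```python
import struct

# Prefix width in bytes -> struct format (little-endian, assumed).
PREFIX_FORMATS = {1: "<B", 2: "<H", 4: "<I"}


def pack_prefixed(payload: bytes, prefix_size: int) -> bytes:
    """Encode arbitrary bytes as [length prefix][content]."""
    limit = (1 << (8 * prefix_size)) - 1
    if len(payload) > limit:
        raise ValueError("payload too long for this prefix width")
    return struct.pack(PREFIX_FORMATS[prefix_size], len(payload)) + payload


def unpack_prefixed(memory, offset: int, prefix_size: int) -> tuple:
    """Decode one value at `offset`; return (payload, offset just past the value)."""
    (length,) = struct.unpack_from(PREFIX_FORMATS[prefix_size], memory, offset)
    start = offset + prefix_size
    end = start + length
    if end > len(memory):
        raise ValueError("length prefix runs past the end of memory")
    return bytes(memory[start:end]), end
```

Because the payload is opaque bytes, the same helpers work for binary data and for text in any encoding; the reader still has to bounds-check the prefix against the memory it was handed.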

My proposal aims to provide a set of standardized, low-level building blocks that are closer to hardware requirements. By combining these highly efficient layouts, developers can build higher-level components that truly meet the diverse performance needs of the entire spectrum of computing platforms, from servers to constrained IoT devices.

snowyu, Dec 12 '25 09:12

> Will Lazy Lowering guarantee the complete avoidance of heap allocations (realloc) in memory-constrained environments like small IoT devices?

Yep! It gives full control to the module, allowing it to choose when and where to lower list/string elements.

lukewagner, Dec 12 '25 15:12

Thank you for confirming that Lazy Lowering allows for the avoidance of realloc. This clearly indicates the working group’s commitment to addressing performance concerns.

However, I believe there is still a critical missing link between the high-level Component Model abstraction and the low-level Canonical ABI implementation: a standardized "data layout control" layer.

Currently, achieving high performance seems to rely on developers manually implementing complex logic within specific runtimes (e.g., Wasmtime), since, as you noted, lazy lowering gives the module full control over "when and where to lower list/string elements." This approach presents several issues:

  1. Implementation Fragmentation: High-performance implementations risk becoming highly specific to certain runtimes or language bindings, rather than being a standardized, cross-platform solution.
  2. Lack of Interoperability Guarantees: If I optimize my module for a runtime that supports Lazy Lowering effectively, I have no guarantee that another Wasm runtime (e.g., a browser engine or a different IoT runtime) will achieve the same efficient layout or behavior.

I propose we fill this gap by moving control over "when and where to lower elements" from ambiguous "runtime control" to standardized WIT (Wasm Interface Type) definitions.

snowyu, Dec 14 '25 06:12

The Component Model would specify lazy lowering to the same precision as the rest of WIT and Canonical ABI so there should be none of the ambiguity or runtime-divergent behaviors that you're concerned about. In particular, we would update #378 to include Preview 3 + Lazy Lowering and this would precisely spell out the core function imports that a core module would call to lower values into linear memory addresses of the core module's choosing, and that's exactly "when and where" the elements get lowered. So let's evaluate how things look when this is built and see if any concrete concerns remain.
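
As a purely illustrative toy model of the difference being discussed (the actual lazy-lowering built-ins are not specified in this thread, so every name below is hypothetical), eager lowering has the host call the guest's realloc to obtain a destination, whereas lazy lowering lets the guest name the destination address itself:

```python
class Guest:
    """Toy guest: linear memory plus a bump allocator standing in for realloc."""

    def __init__(self, size: int):
        self.memory = bytearray(size)
        self.heap_top = size // 2  # in this toy, the heap is the upper half
        self.realloc_calls = 0

    def realloc(self, nbytes: int) -> int:
        self.realloc_calls += 1
        ptr = self.heap_top
        if ptr + nbytes > len(self.memory):
            raise MemoryError("toy heap exhausted")
        self.heap_top += nbytes
        return ptr


def eager_lower(guest: Guest, data: bytes) -> tuple:
    """Eager model: the destination comes from the guest's realloc."""
    ptr = guest.realloc(len(data))
    guest.memory[ptr:ptr + len(data)] = data
    return ptr, len(data)


class LazyValue:
    """Hypothetical handle to a value that has not been lowered yet."""

    def __init__(self, data: bytes):
        self._data = bytes(data)

    def length(self) -> int:
        return len(self._data)

    def lower_into(self, guest: Guest, ptr: int) -> None:
        """Lazy model: the guest picks `ptr`, e.g. a static or stack buffer."""
        end = ptr + len(self._data)
        if end > len(guest.memory):
            raise ValueError("destination out of bounds")
        guest.memory[ptr:end] = self._data
```

In the lazy model a guest could first query `length()`, then lower into a fixed buffer it already owns, so the toy `realloc` is never called; whether the real design has this shape is exactly what the updated specification would need to spell out.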

lukewagner, Dec 15 '25 16:12

Bridging the Gap: Wasm as a Local Library (DLL) vs. a Remote Service

I realize that our "disconnect" stems from a fundamental difference in how we view WebAssembly's role.

The current Component Model design seems to treat Wasm modules as isolated microservices that communicate via a complex runtime protocol (like a local RPC). While this works for the cloud, it completely overlooks the Native Interop use case—where Wasm is used as a local shared library (like a DLL or .so).

  1. The Overhead of "Negotiation":
    In a native host environment (e.g., C++ calling a Wasm library), the overhead of dynamic "core function imports" to negotiate memory locations is unnecessary. When I want to share a high-performance library across languages (e.g., from AssemblyScript to Go), I need a Static ABI contract, not a dynamic handshake.
  2. Accessibility for Lightweight Languages:
    By refusing to standardize static memory layouts for high-level types, the specification is effectively excluding languages like AssemblyScript or Zig from full participation. These languages aim for minimal overhead. Forcing them to implement a complex state machine for "Lazy Lowering" destroys their advantage and makes "True Binary Sharing" impossible without heavy custom SDKs.
  3. Local Calling doesn't need "Remote" Logic:
    Security is handled by the Wasm sandbox boundaries. We don't need to bloat the ABI layer with complex negotiation logic that belongs in a network protocol, not a function call.

If the Component Model is to be a truly Universal Binary Interface, it must support a "Static Profile". This would allow a library written in any language to simply say: "My string layout is [u32 len][content], and I will place it here." This is the only way to achieve the simple, efficient library-sharing that the community actually needs for native, local use cases.

snowyu, Dec 16 '25 12:12

To further my point, I believe a truly robust architecture should follow the principle of Single Responsibility and KISS. The current Component Model is attempting to bake "Security," "Transport," and "ABI" into a single, inseparable layer, which leads to the "spaghetti code" complexity we see today.

I propose a Layered Approach that respects the diverse needs of Wasm:

  1. The Static ABI Layer (The Foundation):
    This layer should define simple, predictable memory layouts (e.g., static string/array structures). For Native Interop and Firmware Logic, this is all that is needed. It provides the "Fast Path" for local calls with zero runtime negotiation overhead.
  2. The Transport/Marshalling Layer (Optional):
    If the data needs to cross different architectures (handling endianness or 32/64-bit differences) or requires streaming, this layer can be added. It handles the pack/unpack logic as a separate concern.
  3. The Security/Protocol Layer (Optional):
    Complex handshaking, ownership management, and sandbox-to-sandbox "Lazy Lowering" should reside here. This is essential for Cloud Plugins and untrusted code, but it shouldn't be a mandatory tax on all Wasm users.

By forcing everyone into the most complex "Security Layer" by default, we are making Wasm inefficient for the very devices (IoT/MCU) where it has the most potential. We need to decouple the "Memory Layout Contract" from the "Runtime Security Protocol." Only then can a library written in AssemblyScript be easily shared with Go or C without an army of custom SDKs.

snowyu, Dec 17 '25 09:12