design
Compact import section format
The binary format of the import section is a vector of elements, each containing a module name, an item name, and the kind and type of the imported item. This means that a module that imports one thousand items from some module named "env" would repeat the string "env" one thousand times in its import section. As the number of imports goes up, this repetition becomes extremely wasteful, even if the module name is extremely short or even empty. With JS string builtins, the number of imports a module might reasonably have is about to increase dramatically, so it would be nice to be able to avoid this duplication as much as possible.
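To put rough numbers on the repetition, here is a minimal sketch that encodes an import section in the existing format and counts the bytes spent on module names. The module contents (1000 function imports named "f0" to "f999", all of type index 0) are made up for illustration.

```python
def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def name(s: str) -> bytes:
    """Encode a name: byte length, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return uleb128(len(data)) + data


def func_import(module: str, field: str, type_index: int) -> bytes:
    """One import entry: module name, item name, 0x00 (func) + type index."""
    return name(module) + name(field) + b"\x00" + uleb128(type_index)


def import_section(imports) -> bytes:
    """Section id 2, byte size, then a vector of import entries."""
    body = uleb128(len(imports)) + b"".join(func_import(*i) for i in imports)
    return b"\x02" + uleb128(len(body)) + body


# Hypothetical module: 1000 function imports from "env", all of type 0.
imports = [("env", f"f{i}", 0) for i in range(1000)]
section = import_section(imports)
module_name_bytes = sum(len(name(m)) for m, _, _ in imports)
print(len(section), module_name_bytes)
```

Each copy of "env" costs 4 bytes (1 length byte plus 3 bytes of text), so the module names alone take 4000 bytes here. An empty module name would still cost 1 length byte per import.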
It would be possible to design an alternative import section binary format that avoided this duplication, either by grouping multiple imports from a single module together, or by using indices to refer to strings in a table, or by some other mechanism.
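As a rough illustration of the grouping option, the sketch below compares the existing flat layout with a hypothetical grouped layout: a vector of groups, each holding one module name followed by a vector of (item name, descriptor) pairs. This layout is an assumption made for the example, not a proposed encoding. It only groups consecutive imports from the same module, so the import order (and therefore the index spaces) is unchanged.

```python
from itertools import groupby


def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def name(s: str) -> bytes:
    data = s.encode("utf-8")
    return uleb128(len(data)) + data


def flat_body(imports) -> bytes:
    """Existing layout: vec(module name, item name, func descriptor)."""
    out = bytearray(uleb128(len(imports)))
    for module, field, type_index in imports:
        out += name(module) + name(field) + b"\x00" + uleb128(type_index)
    return bytes(out)


def grouped_body(imports) -> bytes:
    """Hypothetical layout: vec(module name, vec(item name, func descriptor))."""
    # groupby only merges consecutive entries, which preserves import order.
    groups = [(m, list(items)) for m, items in groupby(imports, key=lambda i: i[0])]
    out = bytearray(uleb128(len(groups)))
    for module, items in groups:
        out += name(module) + uleb128(len(items))
        for _, field, type_index in items:
            out += name(field) + b"\x00" + uleb128(type_index)
    return bytes(out)


# Hypothetical module: 1000 function imports from "env", all of type 0.
imports = [("env", f"f{i}", 0) for i in range(1000)]
flat, grouped = flat_body(imports), grouped_body(imports)
print(len(flat), len(grouped), len(flat) - len(grouped))
```

For this input the grouped form writes "env" once instead of 1000 times. A string table would get a similar saving while also handling imports that interleave modules, at the cost of an extra index per name.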
Potentially we could go even farther. For example, we could provide a succinct encoding for a sequence of n imports from a module, all of the same type, and with increasing decimal indices as names (i.e. "0", "1", "2", etc.). This would allow modules to declare an arbitrary number of imports in just a handful of bytes, at the cost of the design being hyper-specialized to that one particular pattern of imports.
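A minimal sketch of what such a run encoding could look like, assuming a made-up layout of module name, count, and a single shared descriptor. The function names and the layout are hypothetical. The descriptor bytes used are the existing encoding of an immutable externref global import (0x03, 0x6F, 0x00).

```python
def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def name(s: str) -> bytes:
    data = s.encode("utf-8")
    return uleb128(len(data)) + data


# Existing descriptor bytes: global import, externref, immutable.
GLOBAL_EXTERNREF = b"\x03\x6f\x00"


def encode_indexed_run(module: str, count: int, desc: bytes) -> bytes:
    """Hypothetical compact form: module name, count, one shared descriptor."""
    return name(module) + uleb128(count) + desc


def expand_indexed_run(module: str, count: int, desc: bytes):
    """Expand the run into the imports it stands for: "0", "1", "2", ..."""
    return [(module, str(i), desc) for i in range(count)]


run = encode_indexed_run("env", 1000, GLOBAL_EXTERNREF)
expanded = expand_indexed_run("env", 1000, GLOBAL_EXTERNREF)
flat_size = sum(len(name(m)) + len(name(f)) + len(d) for m, f, d in expanded)
print(len(run), flat_size)
```

The run stays a handful of bytes no matter how large the count gets, which is the appeal. The cost is visible in the code too: it cannot express anything other than consecutive decimal names with one shared type.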
Would there be appetite for specifying a more compact alternate encoding of the import section? Obviously we would continue supporting the existing encoding, but we could allocate a new section ID or another unused bit to differentiate the compact and existing formats.
FWIW, I've also wanted to define a new strings section that allowed common strings to be factored out and referred to by stringidx anywhere an inline string literal can be used today. (We could do this uniformly and backwards-compatibly by (ab)using the existing binary encoding of string literals, which requires valid UTF-8, to allow a stringidx to be encoded as an invalid UTF-8 byte sequence.) That being said, it sounds like you might want to do even fancier things than just factoring out common strings, so maybe that requires doing something import-section-specific.
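To make the invalid-UTF-8 trick concrete, here is a sketch under one assumed escape: a name whose payload starts with the byte 0xFF (which never appears in valid UTF-8), followed by a LEB128 index into a strings table. The choice of 0xFF and the function names are assumptions for illustration only; the comment above does not pick a specific byte sequence.

```python
def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_uleb128(data: bytes, pos: int = 0):
    """Decode unsigned LEB128 at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


MARKER = 0xFF  # assumed escape byte; 0xFF is never valid in UTF-8


def encode_name(s: str) -> bytes:
    """Ordinary inline name, exactly as in the existing format."""
    data = s.encode("utf-8")
    return uleb128(len(data)) + data


def encode_string_ref(stringidx: int) -> bytes:
    """Hypothetical reference: a 'name' whose payload is MARKER + index."""
    payload = bytes([MARKER]) + uleb128(stringidx)
    return uleb128(len(payload)) + payload


def decode_name(data: bytes, pos: int, strings):
    """Read one name, resolving references against the strings table."""
    length, pos = decode_uleb128(data, pos)
    payload = data[pos:pos + length]
    pos += length
    if payload[:1] == bytes([MARKER]):
        stringidx, _ = decode_uleb128(payload, 1)
        return strings[stringidx], pos
    return payload.decode("utf-8"), pos


strings = ["env"]  # contents of the hypothetical strings section
data = encode_string_ref(0) + encode_name("memory")
module, pos = decode_name(data, 0, strings)
field, pos = decode_name(data, pos, strings)
print(module, field)
```

Because the payload is not valid UTF-8, a decoder that only knows the existing format rejects the reference instead of misreading it as a name, which is what keeps the scheme backwards-compatible.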
anywhere an inline string literal can be used today
Off the top of my head, this would be the import section, the export section, custom section names, and possibly the contents of custom sections such as the name section. Are there other locations I'm missing?
I think this could make sense if it showed some good size decreases. With respect to the linked js-string-builtins issue, that discussion isn't finished yet and we may settle on something that doesn't require a huge number of imports. So the hyper-specialization around array indices might not be necessary.
One related issue to the binary size of imports is also just the slowness of performing 'read the imports'. For the string constant use-case for imports, a lot of time is spent performing the specified two fully generic property lookups (one for modName, the other for fieldName).
We've discussed ways internally to optimize this by avoiding the repeated modName lookup through some hashing, but we weren't sure if that's technically allowed due to property lookup being observable through proxies/etc. A compact import section might make that optimization more feasible.
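The lookup cost can be modeled outside of JS. The sketch below is a Python stand-in, not engine code: CountingNamespace plays the role of an import object whose property reads are observable (as they would be through a proxy), and the two reader functions are hypothetical models of reading a flat versus a grouped import list.

```python
class CountingNamespace(dict):
    """Mapping that counts reads, standing in for an observable import object."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.lookups = 0

    def __getitem__(self, key):
        self.lookups += 1
        return super().__getitem__(key)


def read_imports_flat(import_object, imports):
    """Flat list: one module lookup and one field lookup per import."""
    return [import_object[module][field] for module, field in imports]


def read_imports_grouped(import_object, groups):
    """Grouped list: one module lookup per group, one field lookup per import."""
    values = []
    for module, fields in groups:
        namespace = import_object[module]
        values.extend(namespace[field] for field in fields)
    return values


env = {"a": 1, "b": 2, "c": 3}

flat_object = CountingNamespace(env=env)
flat_values = read_imports_flat(flat_object, [("env", "a"), ("env", "b"), ("env", "c")])

grouped_object = CountingNamespace(env=env)
grouped_values = read_imports_grouped(grouped_object, [("env", ["a", "b", "c"])])

print(flat_object.lookups, grouped_object.lookups)
```

With a flat list, skipping the repeated module lookup changes the observable count, which is the concern raised above. If the format itself grouped imports, a single lookup per group could simply be the specified behavior.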
Are there other locations I'm missing?
Yep! If we get back to adding module-linking to core wasm, then instance definitions (which supply import arguments by name) would be another case.
Reading this comment, I wondered if perhaps we could allow string constants to initialize an elem section of an externref table, in which case that could be another use of string indices.
Generally speaking I like this idea, and it seems very straightforward. Back in ancient history, one reason we invested a little bit (but not too heavily) in module size was that we expected duplication of this type to be well-compressible; both by gz/brotli today and possibly even more in the future with some hypothetical improved wasm-specific compression scheme (but let's ignore that for now since it hasn't materialized and isn't on anyone's radar right now). When considering the bang-for-buck for this idea (and for that matter, things like the binary size implications of different ideas for string builtins), we should probably also make sure we're primarily considering compressed size rather than uncompressed size. Having said that of course, @eqrion raised a good point above, and there are still reasons that uncompressed size matters; e.g. speed and memory requirements of tools that process uncompressed files, and even memory used by JS engines if they need to keep any module bytes around to implement the JS module introspection APIs.
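One way to check the compressed-size question is to run both layouts through a general-purpose compressor. The sketch below uses zlib from the Python standard library as a stand-in for gz; the import list (1000 function imports from "env") and the grouped layout (one module name followed by a vector of item entries) are both assumptions made up for the measurement.

```python
import zlib


def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def name(s: str) -> bytes:
    data = s.encode("utf-8")
    return uleb128(len(data)) + data


fields = [f"f{i}" for i in range(1000)]
func_type_0 = b"\x00\x00"  # func import descriptor with type index 0

# Existing layout: the module name is repeated in every entry.
flat = uleb128(len(fields)) + b"".join(
    name("env") + name(f) + func_type_0 for f in fields
)

# Hypothetical grouped layout: one group, module name written once.
grouped = uleb128(1) + name("env") + uleb128(len(fields)) + b"".join(
    name(f) + func_type_0 for f in fields
)

for label, data in (("flat", flat), ("grouped", grouped)):
    compressed = zlib.compress(data, 9)
    print(label, len(data), len(compressed))
```

Comparing the two compressed sizes, rather than the raw ones, gives a better picture of the transfer-size benefit. Real modules have far less regular item names than this synthetic list, so the measurement is worth repeating on actual binaries.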
Would it be safe to say that compact import sections would allow string constants in the js-builtins proposal to use the same wasm:... module string as all the other JS built-ins without hurting binary size, addressing the concerns folks had over (import "'" "my string constant" (global externref)) being a bit too cute?
Yes, that's the main reason I think this would be interesting. I plan on presenting a proposal for this at the 7-02 CG meeting.
In the meantime, before this exists (if it ever does exist) we'd still need an alternative for the js-string use-case, but that's being discussed there.
It addresses the concern for me.
(While I do think a wasm: or js: namespace would be "better" than ' aesthetically, I don't personally consider the issue to be a blocker for the js builtins proposal, and would never change a vote based on that.)
Regardless, I am in favor of developing a new, compact import section encoding either way.