String duplication in import section
In the current binary format, if there are many imports from the same module, the module name appears in the binary N times (once per import). In emscripten today this means the string "env" appears many times, and as we start to use DLLs more, these strings are likely to become longer and more frequent.
Rather than specifying each import as module + field, perhaps we could group imports by module. So we would have something like imports = (<module_name> <field_count> <field_name>*)*.
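To get a feel for the savings, here is a rough sketch comparing the flat layout with the grouped one. It assumes names are encoded the usual way (LEB128 length followed by UTF-8 bytes) and that the grouped section starts with a group count. That count is my assumption; the proposal above doesn't say. The import descriptors (kind byte, type index) are left out because they are the same in both layouts. `encode_flat` and `encode_grouped` are hypothetical helpers, not part of any toolchain.

```python
from itertools import groupby


def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def name(s: str) -> bytes:
    """Encode a string as a LEB128 length followed by its UTF-8 bytes."""
    data = s.encode("utf-8")
    return uleb128(len(data)) + data


def encode_flat(imports: list[tuple[str, str]]) -> bytes:
    """Current layout: every entry repeats its module name."""
    body = uleb128(len(imports))
    for module, field in imports:
        body += name(module) + name(field)
    return body


def encode_grouped(imports: list[tuple[str, str]]) -> bytes:
    """Proposed layout: (<module_name> <field_count> <field_name>*)*.

    groupby only merges adjacent entries, so import order (and therefore
    the import index space) is preserved.
    """
    groups = [(module, [field for _, field in run])
              for module, run in groupby(imports, key=lambda imp: imp[0])]
    body = uleb128(len(groups))  # assumed group count prefix
    for module, fields in groups:
        body += name(module) + uleb128(len(fields))
        for field in fields:
            body += name(field)
    return body


imports = [("env", f"f{i}") for i in range(100)]
print(len(encode_flat(imports)), len(encode_grouped(imports)))  # prints 791 396
```

Because only adjacent imports are merged, a toolchain would need to emit imports sorted by module to get the full benefit. Interleaved modules still encode correctly, just as more groups.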
Or we could use a string table, but that seems like a much larger change.
I'm not sure how we'd do something like this without breaking the current format or adding a new import section. It may actually be easier to include a string table section, because we could use its presence to signify that all strings should be indexes into the table.
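A minimal sketch of the string table idea, assuming a hypothetical layout where a table section lists each distinct string once and every import entry becomes a pair of LEB128 indexes (module, field). None of these names or layouts come from the spec; they're just for illustration.

```python
def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def name(s: str) -> bytes:
    """Encode a string as a LEB128 length followed by its UTF-8 bytes."""
    data = s.encode("utf-8")
    return uleb128(len(data)) + data


def build_string_table(imports: list[tuple[str, str]]) -> dict[str, int]:
    """Give each distinct string an index, in first-seen order."""
    table: dict[str, int] = {}
    for module, field in imports:
        for s in (module, field):
            table.setdefault(s, len(table))
    return table


def encode_with_table(imports: list[tuple[str, str]]) -> tuple[bytes, bytes]:
    """Hypothetical layout: string table bytes, then imports as index pairs."""
    table = build_string_table(imports)
    strings = uleb128(len(table))
    for s in table:  # dicts keep insertion order, matching the assigned indexes
        strings += name(s)
    entries = uleb128(len(imports))
    for module, field in imports:
        entries += uleb128(table[module]) + uleb128(table[field])
    return strings, entries


imports = [("env", f"f{i}") for i in range(100)]
strings, entries = encode_with_table(imports)
print(len(strings), len(entries))  # prints 395 201
```

For this particular input (one module, all field names unique) the total comes out larger than simple grouping, because every entry still pays for two indexes. A table does better when the same strings recur, for example a name that is both imported and exported, if all strings become indexes.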
Would it be possible to have a default module name (like env or host) that can be implicit (both in the text and binary formats)? So, for example, you could omit the "env" from (import "env" "foo" (...)).
This could be in addition to a string table section.
I guess you could achieve something like that by picking the empty string as your default?
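One detail on the empty-string approach: with the current encoding an empty module name is still a length byte (0) per import, so it removes the characters but not the per-import entry. A rough sketch, assuming standard LEB128-prefixed names. `module_name_bytes` is a hypothetical helper.

```python
def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def name(s: str) -> bytes:
    """Encode a string as a LEB128 length followed by its UTF-8 bytes."""
    data = s.encode("utf-8")
    return uleb128(len(data)) + data


def module_name_bytes(module: str, import_count: int) -> int:
    """Bytes spent on module names when every import repeats `module`."""
    return import_count * len(name(module))


print(module_name_bytes("env", 100), module_name_bytes("", 100))  # prints 400 100
```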
@sbc100 - I was thinking that you could write (import "foo" (...)) and it would compile to the same thing as (import "env" "foo" (...)), so env would be the default module name for everyone. It's not much of a saving, though.
The use of "env" is specific to emscripten; I don't think we would want to introduce that as some magic value in the spec. It's already leaked a bit into LLVM, which is unfortunate.
@sbc100 - I see, but in the browser, many modules only import from the host, which is not really a module, so having a conventional module name for the host makes some sense, and then using it as a default would follow from that. It could just be named "host" or "$" or something, if "env" is problematic.
To be fair, this issue is about string duplication, and the string table suggestion would address that well. A conventional default module name for the host is not especially relevant to that, even if it could be implied by omission in the binary format.