String duplication in import section

Open sbc100 opened this issue 7 years ago • 8 comments

In the current binary format, a module name is written once for every import from that module, so N imports from the same module repeat the name N times. In Emscripten today this means the string "env" appears many times, and as we start to use DLLs more, these strings will likely become longer and more frequent.

Rather than specifying each import as module + field, perhaps we could group imports by module. So we would have something like imports = (<module_name> <field_count> <field_name>*)*.

Or we could add a string table, but that seems like a much larger change.
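To put rough numbers on the grouping idea, here is a minimal Python sketch that compares the bytes spent on import names in the current flat layout against the hypothetical grouped layout described above. It assumes names are encoded as in the current binary format (an unsigned LEB128 length followed by UTF-8 bytes), ignores the import descriptors, and assumes imports may be regrouped by module without regard to index order. The function names and the sample import list are made up for illustration.

```python
def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def name(s: str) -> bytes:
    """Encode a name as a length-prefixed UTF-8 byte string."""
    data = s.encode("utf-8")
    return uleb128(len(data)) + data


def flat_names_size(imports):
    """Name bytes when every import repeats its module name (current layout)."""
    return sum(len(name(m)) + len(name(f)) for m, f in imports)


def grouped_names_size(imports):
    """Name bytes for the hypothetical layout
    (<module_name> <field_count> <field_name>*)*."""
    groups = {}
    for m, f in imports:
        groups.setdefault(m, []).append(f)
    total = 0
    for m, fields in groups.items():
        total += len(name(m)) + len(uleb128(len(fields)))
        total += sum(len(name(f)) for f in fields)
    return total


# Illustrative import list, not taken from a real module.
imports = [
    ("env", "memory"),
    ("env", "table"),
    ("env", "puts"),
    ("libfoo.so", "foo_init"),
]

print(flat_names_size(imports))     # 49
print(grouped_names_size(imports))  # 43
```

In this toy case the grouped layout saves two repeated copies of "env" (8 bytes) and pays 2 bytes for the per-module field counts. The saving grows with the length of the module name and the number of imports per module.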

sbc100 avatar Aug 07 '18 17:08 sbc100

I'm not sure how we'd do something like this without breaking the current format or adding a new import section. It may actually be easier to include a string table section, because we could use its presence to signify that all strings should be indexes into the table.
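As a rough illustration of the string table alternative, the Python sketch below estimates name bytes when each unique string is stored once and every import refers to it by index. The table layout here (a count followed by length-prefixed strings, with unsigned LEB128 indexes at each use) is an assumption made for this example, not a proposed or existing section format, and section headers and import descriptors are not counted.

```python
def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def name(s: str) -> bytes:
    """Encode a name as a length-prefixed UTF-8 byte string."""
    data = s.encode("utf-8")
    return uleb128(len(data)) + data


def flat_names_size(imports):
    """Name bytes when every import spells out both names (current layout)."""
    return sum(len(name(m)) + len(name(f)) for m, f in imports)


def string_table_size(imports):
    """Name bytes with a hypothetical string table plus index references."""
    index = {}      # string -> position in the table
    ref_bytes = 0   # bytes spent on indexes in the import entries
    for m, f in imports:
        for s in (m, f):
            if s not in index:
                index[s] = len(index)
            ref_bytes += len(uleb128(index[s]))
    table_bytes = len(uleb128(len(index))) + sum(len(name(s)) for s in index)
    return table_bytes + ref_bytes


# Illustrative inputs, not taken from real modules.
small = [("env", "memory"), ("env", "table"), ("env", "puts"),
         ("libfoo.so", "foo_init")]
large = [("env", f"f{i}") for i in range(100)]

print(flat_names_size(small), string_table_size(small))  # 49 50
print(flat_names_size(large), string_table_size(large))  # 790 595
```

Under these assumptions the table costs slightly more than the flat layout when there is little repetition, because every name use also pays for an index, and only wins once a module name is repeated often enough.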

binji avatar Aug 07 '18 20:08 binji

Would it be possible to have a default module name (like env or host) that can be implicit (both in the text and binary formats)? So, for example, you could omit the "env" from (import "env" "foo" (...)).

This could be in addition to a string table section.

7ombie avatar Apr 25 '21 06:04 7ombie

I guess you could achieve something like that by picking the empty string as your default?

sbc100 avatar Apr 25 '21 07:04 sbc100

@sbc100 - I was thinking that you could write (import "foo" (...)) and it would compile to the same thing as (import "env" "foo" (...)), so env would be the default module name for everyone. It's not much of a saving, though.

7ombie avatar Apr 25 '21 18:04 7ombie

The use of "env" is specific to Emscripten; I don't think we would want to introduce it as a magic value in the spec. It's already leaked a bit into LLVM, which is unfortunate.

sbc100 avatar Apr 25 '21 19:04 sbc100

@sbc100 - I see, but in the browser, many modules only import from the host, which is not really a module, so having a conventional module name for the host makes some sense, and then using it as a default would follow from that. It could just be named "host" or "$" or something, if "env" is problematic.

To be fair, this issue is about string duplication, and the string table suggestion would address that well. A conventional default module name for the host is not especially relevant to it, even if the name could be implied by omission in the binary format.

7ombie avatar Apr 25 '21 21:04 7ombie