tool-conventions
tool-conventions copied to clipboard
which memory dynamic-linking libraries should use for host functions?
https://github.com/WebAssembly/WASI/blob/main/legacy/application-abi.md says:
Regardless of the kind, all modules accessing WASI APIs also export a linear memory with the name memory. Data pointers in WASI API calls are relative to this memory's index space.
i guess it's the convention used by many of host functions, not only WASI.
however, wasi-libc's libc.so doesn't export "memory".
actually, wasm-ld rejects -shared -export-memory. https://github.com/llvm/llvm-project/blob/06a808c4f4014edfeb2517bddd0bcb63eb4a260b/lld/wasm/Driver.cpp#L658
am i missing something?
a. wasm-ld should be fixed.
b. application-abi.md doesn't apply to a module which is a shared library. the memory exported from the "main" module should always be used. i feel this a bit strange because the module calling wasi (eg. libc.so) doesn't know about the "main" module.
The memory for a given program must be defined in exactly one module, from which it is exported. All the other modules then import that memory.
Currently the way this works is the main module always defines and exports the memory and all other modules (i.e. all shared libraries import the memory). In this case libc.so should be importing its memory.
do you mean b. is correct?
well, i agree that the memory exported by the main module and the memory imported by libc.so are normally same. is it something a runtime linker should actively validate?
I suppose it could/should validate that there is exactly one export called memory. Two different modules in the same program both exporting the linear memory would be an error I believe.
A while back we tried to rename memory to __linear_memory to be more descriptive (and match __indirect_function_table with two leading underscores) but for now we are still using just memory for the linear memory.
https://github.com/WebAssembly/WASI/blob/main/legacy/application-abi.md was written before WASI had any concept of dynamic linking, so easy program is just one module. I will likely need to be updated if/when we want to support dynamic linking.
even with dynamic-linking, the main module still should export memory and __indirect_table, right?
currently it seems impossible for pie exeuctable because wasm-ld rejects -pie --export-table.
https://github.com/llvm/llvm-project/blob/04e7eaf91f3e9b6830feb2e08bb5b0ddb14c3174/lld/wasm/Driver.cpp#L632
do you consider it a bug?
At yes, that does look odd. In emscripten for some reason we defined the table on the JS side and import it, but your are correct, the main module should really be exporting it.
I took another look at emscripten and even attempted to convert it to using main-module-exported memory: https://github.com/emscripten-core/emscripten/pull/22332.
However, don't think this can work today, at least as long as the main module is being built as -pie. This is because in -pie mode, like -shared, the memory and table offsets used for the static data and element segments are based on the imported __table_base and __memory_base globals. Since these are not known to the linker its not possible to great an initial table or memory that is big enough for the static data (since it would need to do __table_base + segments size and/or __memory_base + semgnet size to determine this value).
In theory it could work if we pass --initial-table=xx to the linker, but in practice we want the initial table to be just big enough to for elements in the main module, and we don't know how man table elements there are until after the linker has run.
The solution here I think is to avoid building the main module with PIC (https://github.com/emscripten-core/emscripten/issues/12682).. since then the linker knowns exactly how big to make the initial table and memory.
I do think we could remove the restriction in https://github.com/llvm/llvm-project/blob/04e7eaf91f3e9b6830feb2e08bb5b0ddb14c3174/lld/wasm/Driver.cpp#L632 but that still leaves the problem of what value to use for --initial-table. The current default of "use the size of the segments" doesn't work since it doesn't take into account __table_base.
I took another look at emscripten and even attempted to convert it to using main-module-exported memory: https://github.com/emscripten-core/emscripten/pull/22332.
However, don't think this can work today, at least as long as the main module is being built as -pie. This is because in -pie mode, like -shared, the memory and table offsets used for the static data and element segments are based on the imported __table_base and __memory_base globals. Since these are not known to the linker its not possible to great an initial table or memory that is big enough for the static data (since it would need to do __table_base + segments size and/or __memory_base + semgnet size to determine this value).
i'm not familiar enough with emscripten to grasp the diff quickly. from your description, i guess you are trying to make it stop importing, right?
what's wrong with making it both imported and exported? (as we did for wasi-threads) at least for memory, i implemented it that way in toywasm. i haven't noticed any problems with it so far.
The memory for a given program must be defined in exactly one module, from which it is exported. All the other modules then import that memory.
Currently the way this works is the main module always defines and exports the memory and all other modules (i.e. all shared libraries import the memory). In this case libc.so should be importing its memory.
isn't it more consistent to make shared libraries export the imported memory as well? i feel it's better to have less special cases for shared libraries.
what's wrong with making it both imported and exported? (as we did for wasi-threads)
Nothing stops you from doing this but it seems redundant. As the embedder, if I am providing the memory and table as imports on the wasm file, why would I then want them to be re-expoted? I also have them, right? Its just add unnecessary exports.
At least for emscripten, I can't see a reason to both import and export.
It seems like a reasonable solution here would be to say "A module must either import or export something called memory which will be the memory that wasi API pointers refer to".
For PIE/dynamic linking its up the embedder to create the memory before loading the module, and to set __memory_base and __table_base accordingly.
As the embedder, if I am providing the memory and table as imports on the wasm file, why would I then want them to be re-expoted?
"avoid being different from statically linked modules" is a good enough reason, IMO.
The memory for a given program must be defined in exactly one module, from which it is exported. All the other modules then import that memory. Currently the way this works is the main module always defines and exports the memory and all other modules (i.e. all shared libraries import the memory). In this case libc.so should be importing its memory.
isn't it more consistent to make shared libraries export the imported memory as well? i feel it's better to have less special cases for shared libraries.
eg. otherwise caller.get_export("memory") style host function implementations will be unhappy.
As the embedder, if I am providing the memory and table as imports on the wasm file, why would I then want them to be re-expoted?
"avoid being different from statically linked modules" is a good enough reason, IMO.
I'm not sure I agree. Adding a little complexity to the host seem better than ask all modules to redundantly export something to the host (if nothing else this add some unnecessary size to every single module).
@sbc100 i tried to document it in https://github.com/WebAssembly/tool-conventions/pull/234. please take a look if/when you have time. thank you.