binaryen
Creating relocatable binaries via AST
It doesn't appear that binaryen's AST has a way to include reloc sections. What's everyone doing today with binaryen? (not using LLVM) Didn't see any discussions around a desire to add such support, so curious if perhaps I'm confused.
I guess this probably works in some limited situations as a temporary solution: wasm2wat | wat2wasm --relocatable
😆
To be clear, if it is something that makes sense in binaryen I would try to add it when I get the time. Not asking others to do the work for me 🕺
Edit: of course I haven't had the time lol
This should be added eventually, yeah, would be great if you want to do it. The relocations stuff is just getting stable now in LLVM, lld, etc., so it made sense to wait for that. But now is a good time.
We probably don't want to reimplement full linking in binaryen, but it would probably be useful to support loading/saving relocations. What use case did you have in mind?
@kripken Cool. Use case: using binaryen as a backend to build a wasm AST, emitting each unit to disk as a relocatable binary, so we can then link them all together with any deps. Since this is pretty standard stuff, I was worried that I'm missing something about binaryen, since it feels like most who use binaryen's AST backend would end up wanting this at some point? Want to make sure I'm not misunderstanding something.
For what it's worth I have some support for reading reloc and linking sections in a tool I'm working on to generate emscripten-readable metadata (currently called lld-metadata, which will change before it lands) on the o2wasm branch. If you want to use that in any way, it's here: https://github.com/WebAssembly/binaryen/blob/o2wasm/src/tools/lld-metadata.cpp
@jayphelps I do think we want to support exactly the use case that you describe: allowing Binaryen to emit object files that are linkable with lld (and ideally, compatible with LLVM-generated binaries). That means writing relocations. I also agree with @kripken that we don't want to implement a linker; we should just use lld and I think it also should be able to handle the use cases that the current wasm-merge tool does. I also think we might want to support running wasm-to-wasm transforms and optimizations (i.e. the opt tool) on wasm object files; that would mean reading the relocations, and possibly making the optimizer aware of any conventions that would affect the ability to optimize.
Hi, I'm wondering if there is any progress in this direction?
Relocatable wasm binaries are really helpful. If we want to build a compiler for an advanced programming language, there will be a large standard library or compiler-rt, and it's common to write those libs in C/C++ and link them with the user code to generate the final executable file.
If binaryen lacks the feature to emit relocatable object files, then it seems all the common libs need to be written using binaryen IR, which is a huge amount of work.
@xujuntwt95329 Can you not run binaryen after linking the user code with the standard library files? That is what e.g. Emscripten does today: it links all the object files and libraries with wasm-ld and then optimizes the result. That's simpler and also there are LTO-like optimization opportunities that way.
@kripken Emscripten uses LLVM as the codegen to generate the object files, with binaryen working as an optimizer. However, we want to use binaryen directly as the codegen because we want to use the GC feature (currently it seems GC opcodes are not available in LLVM).
@xujuntwt95329 Ah, yes, so the issue is that LLVM doesn't support GC opcodes, so you can't run wasm-ld to link files that use GC? That is a current limitation.
Adding relocatable (wasm object file) support in Binaryen is one option to help there. But Binaryen would need to also be able to link the way wasm-ld does. That's a huge amount of work. Instead, I think a better option is to add a simple linker in Binaryen, wasm-merge. We had such a tool and removed it, but we could restore it. It would link files without relocation - what it does is connect imports to exports and combine two files into one. That's much simpler than wasm-ld, but I think it's enough, like this:
- Build C++ parts using LLVM. This emits object/relocatable files.
- Link C++ parts using wasm-ld to get a wasm A. This is a final wasm file.
- Build GC parts using Binaryen to get a wasm B. This is a final wasm file.
- Run wasm-merge to link A and B together. This is also a final wasm file.
- Run wasm-opt on that single final merged file (this will do inlining and other opts across C++ and GC code, DCE, etc.).
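To make "connect imports to exports and combine two files into one" concrete, here is a toy sketch in Python. It is not the wasm-merge implementation and does not use any Binaryen API: ToyModule, merge, and the name-prefixing scheme are all hypothetical, and a function body is reduced to the list of names it calls.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModule:
    """Hypothetical stand-in for a final wasm module (not a Binaryen type)."""
    name: str
    functions: dict = field(default_factory=dict)  # internal name -> list of callee names
    imports: dict = field(default_factory=dict)    # internal name -> (module name, export name)
    exports: dict = field(default_factory=dict)    # export name -> internal function name


def merge(a, b):
    """Combine two toy modules, wiring imports of one to exports of the other."""
    merged = ToyModule(name=f"{a.name}+{b.name}")
    resolved = {}  # qualified import name -> qualified function that satisfies it

    for mod, other in ((a, b), (b, a)):
        for local, (src_module, src_export) in mod.imports.items():
            qualified = f"{mod.name}.{local}"
            if src_module == other.name and src_export in other.exports:
                resolved[qualified] = f"{other.name}.{other.exports[src_export]}"
            else:
                # Still unsatisfied after merging, so it stays an import.
                merged.imports[qualified] = (src_module, src_export)

    for mod in (a, b):
        for fname, callees in mod.functions.items():
            # Prefix names with the module name to avoid collisions, then
            # redirect calls that went through a resolved import.
            qualified = [f"{mod.name}.{c}" for c in callees]
            merged.functions[f"{mod.name}.{fname}"] = [resolved.get(c, c) for c in qualified]
        for ename, target in mod.exports.items():
            # On an export name clash the later module wins in this sketch.
            merged.exports[ename] = f"{mod.name}.{target}"

    return merged


# A plays the role of the C++ side, B the GC side.
A = ToyModule("A", functions={"malloc": []}, exports={"malloc": "malloc"})
B = ToyModule(
    "B",
    functions={"main": ["alloc", "log"]},
    imports={"alloc": ("A", "malloc"), "log": ("env", "log")},
    exports={"main": "main"},
)
merged = merge(A, B)
```

In this model B's call to its alloc import becomes a direct call to A.malloc, which is the kind of edge a later wasm-opt run could inline across, while the env.log import remains an import of the merged module. No addresses or indices inside either module are rewritten based on relocation records, which is why this approach is much simpler than what wasm-ld does.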
Hi @kripken, thanks for your reply!
Ah, yes, so the issue is that LLVM doesn't support GC opcodes, so you can't run wasm-ld to link files that use GC? That is a current limitation.
Yes, this is the current limitation.
And for the solution you proposed, yes, it is an interesting approach, it should work for some scenarios, but the limitation is the lack of relocation.
- If we treat all values as GC objects and don't use linear memory, then it seems relocation is not required;
- However, if we want to use GC objects and linear memory together, then relocation is required (for global variables that are statically placed in linear memory, and also for the data section).
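A toy illustration of the second point, as a sketch only: each compilation unit lays out its static data as if it started at address 0, so something has to patch the addresses once the units are placed side by side. ToyObject and link are hypothetical names, and the model ignores alignment, symbol tables, and the LEB-encoded immediates that the real object file format deals with.

```python
from dataclasses import dataclass


@dataclass
class ToyObject:
    """Hypothetical relocatable unit (not the real wasm object file format)."""
    data: bytes   # static data, laid out as if this unit started at address 0
    code: list    # list of (opcode, immediate) pairs
    relocs: list  # indices into `code` whose immediate is a data address


def link(objects):
    """Concatenate data segments and patch every relocated address."""
    base = 0
    out_data = b""
    out_code = []
    for obj in objects:
        for i, (op, imm) in enumerate(obj.code):
            if i in obj.relocs:
                imm += base  # shift by where this unit's data ended up
            out_code.append((op, imm))
        out_data += obj.data
        base += len(obj.data)
    return out_data, out_code


# Both units refer to addresses inside their own data, starting from 0.
unit1 = ToyObject(b"hello\x00", [("i32.const", 0), ("i32.load8_u", 0)], relocs=[0])
unit2 = ToyObject(b"hi\x00", [("i32.const", 1), ("i32.load8_u", 0)], relocs=[0])

data, code = link([unit1, unit2])
```

Without the relocs list the linker could not tell that the i32.const 1 in unit2 is an address (which must become 7) rather than an ordinary integer (which must stay 1). That information is what the reloc sections carry, and it is what a plain merge of final modules does not have.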
Currently the WebAssembly Object File Linking convention has already become a de facto standard, and LLVM's wasm-ld fully supports it, so IMO, if binaryen can generate these sections, we can benefit from the powerful wasm-ld tool, and then we can build more powerful tools based on binaryen.
Do your GC files actually need to touch linear memory directly? I wonder if they could call an API instead - an imported function that does a load from linear memory. That would become fast after linking + inlining. But you'd need to add similar workarounds for globals, etc.
If you do want to really mix relocations and GC in arbitrary ways then, yes, the only real solution is to either add GC support in wasm-ld or relocation support in Binaryen:
- I'm not sure how much GC support would be needed in wasm-ld - hopefully it would not need to actually read or understand the GC contents? cc @sbc100
- Adding relocation support in Binaryen would take some work, but hopefully not much. We already use relocations for many things, like call has a name for the function and not an index, global.get has a name for the global, etc. What we are missing is stuff like relocations on immediates (like a load or store offset - we'd need to allow a name there and not just a constant) and on const nodes, etc.
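As a rough sketch of the second bullet (a toy model in Python, not Binaryen's IR or binary writer): if the IR keeps a name wherever an index or address will eventually go, a writer can either resolve the name on the spot or emit a placeholder plus a relocation record. The emit function, the tuple-based instructions, and the record layout are all hypothetical.

```python
def emit(functions, instrs, data_symbols=None, relocatable=False):
    """Lower name-based toy instructions to numeric immediates.

    functions:    ordered list of function names (position = index)
    instrs:       list of (opcode, operand); a `call` operand is a function
                  name, an `i32.load` operand is a constant offset or a
                  symbolic name (the part the comment says is missing today)
    data_symbols: name -> final address, only needed when not relocatable
    """
    data_symbols = data_symbols or {}
    func_index = {name: i for i, name in enumerate(functions)}
    out, relocs = [], []
    for pos, (op, operand) in enumerate(instrs):
        if op == "call":
            out.append((op, func_index[operand]))
            if relocatable:
                # The index is only valid inside this unit; note who it refers to.
                relocs.append((pos, "function", operand))
        elif op == "i32.load" and isinstance(operand, str):
            if relocatable:
                out.append((op, 0))  # placeholder for the linker to patch
                relocs.append((pos, "memory", operand))
            else:
                out.append((op, data_symbols[operand]))
        else:
            out.append((op, operand))
    return out, relocs


functions = ["main", "helper"]
instrs = [("call", "helper"), ("i32.load", "counter"), ("i32.load", 8)]

final_code, final_relocs = emit(functions, instrs, {"counter": 1024})
object_code, object_relocs = emit(functions, instrs, relocatable=True)
```

The call case needs no new IR support in this model because the name is already there. The symbolic i32.load offset is the piece that requires allowing a name where only a constant is accepted today.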
Do your GC files actually need to touch linear memory directly?
Well, it is not decided yet. We are evaluating the capabilities of binaryen and LLVM, and the functionality provided by the tools will largely influence our choice of how to use GC and linear memory. So this is also why I'm asking: relocation is not a must, but if we support it, then there may be fewer limitations and more possibilities to use binaryen as a compiler codegen.
At this moment, I think we should start from the solution you mentioned before, since it seems relocation support will not be ready in the short term, but we are really looking forward to this feature as well as DWARF debug information support, which is also a very important feature for compilers.
By "solution you mentioned before" do you mean the one using wasm-merge?
If so, let's wait to hear from @sbc100 on the above question about wasm-ld. If wasm-ld can link GC then we don't need wasm-merge. If it can't, then we do need wasm-merge, and then the question would be if there is someone with time & interest to work on it (it's not a lot of work, and I could get to it eventually, but I'm not sure when).
Maybe @ashleynh would be interested in working on wasm-merge due to its multi-memory component?
wasm-ld has no support for GC today. The work of adding such support would mostly be the introduction of new symbol types and new relocation types I believe... but my understanding of the GC proposal is fairly limited.
I guess we would want to add support in the MC layer for writing wasm GC programs in assembly (if only so we can write wasm-ld tests), which I imagine is a good chunk of work too.
By "solution you mentioned before" do you mean the one using wasm-merge?
Yes, and before wasm-merge is ready, I think I can simply extract all the functions from one module through binaryen's existing API and insert them into the target module.