Creating relocatable binaries via AST

Open jayphelps opened this issue 7 years ago • 17 comments

It doesn't appear that binaryen's AST has a way to include reloc sections. What's everyone doing today with binaryen? (not using LLVM) Didn't see any discussions around a desire to add such support, so curious if perhaps I'm confused.

I guess this probably works in some limited situations as a temporary solution: wasm2wat | wat2wasm --relocatable 😆

jayphelps avatar Nov 28 '17 04:11 jayphelps

To be clear, if it is something that makes sense in binaryen I would try to add it when I get the time. Not asking others to do the work for me 🕺

Edit: of course I haven't had the time lol

jayphelps avatar Nov 28 '17 04:11 jayphelps

This should be added eventually, yeah, would be great if you want to do it. The relocations stuff is just getting stable now in LLVM, lld, etc., so it made sense to wait for that. But now is a good time.

We probably don't want to reimplement full linking in binaryen, but it would probably be useful to support loading/saving relocations. What use case did you have in mind?

kripken avatar Nov 28 '17 18:11 kripken

@kripken Cool. Use case: using binaryen as a backend to build a wasm AST, emitting each unit to disk as a relocatable binary, so we can then link them all together with any deps. Since this is pretty standard stuff, I was worried that I'm missing something about binaryen, since it feels like most who use binaryen's AST backend would end up wanting this at some point? Want to make sure I'm not misunderstanding something.

jayphelps avatar Nov 28 '17 18:11 jayphelps

For what it's worth I have some support for reading reloc and linking sections in a tool I'm working on to generate emscripten-readable metadata (currently called lld-metadata, which will change before it lands) on the o2wasm branch. If you want to use that in any way, it's here: https://github.com/WebAssembly/binaryen/blob/o2wasm/src/tools/lld-metadata.cpp

jgravelle-google avatar Nov 28 '17 21:11 jgravelle-google

@jayphelps I do think we want to support exactly the use case that you describe: allowing Binaryen to emit object files that are linkable with lld (and ideally, compatible with LLVM-generated binaries). That means writing relocations. I also agree with @kripken that we don't want to implement a linker; we should just use lld and I think it also should be able to handle the use cases that the current wasm-merge tool does. I also think we might want to support running wasm-to-wasm transforms and optimizations (i.e. the opt tool) on wasm object files; that would mean reading the relocations, and possibly making the optimizer aware of any conventions that would affect the ability to optimize.

dschuff avatar Nov 28 '17 21:11 dschuff

Hi, I'm wondering whether there has been any progress in this direction?

Relocatable wasm binaries are really helpful. If we want to build a compiler for an advanced programming language, there will be a large standard library or compiler-rt, and it's common to write those libraries in C/C++ and link them with the user code to generate the final executable file.

If binaryen lacks the ability to emit relocatable object files, then it seems all the common libraries need to be written in binaryen IR, which is a huge amount of work.

xujuntwt95329 avatar Aug 07 '22 08:08 xujuntwt95329

@xujuntwt95329 Can you not run binaryen after linking the user code with the standard library files? That is what e.g. Emscripten does today: it links all the object files and libraries with wasm-ld and then optimizes the result. That's simpler and also there are LTO-like optimization opportunities that way.

kripken avatar Aug 08 '22 16:08 kripken

@kripken Emscripten uses LLVM as the codegen to generate the object files, and binaryen works as an optimizer. However, we want to use binaryen directly as the codegen because we want to use the GC feature (currently it seems GC opcodes are not available in LLVM).

xujuntwt95329 avatar Aug 08 '22 23:08 xujuntwt95329

@xujuntwt95329 Ah, yes, so the issue is that LLVM doesn't support GC opcodes, so you can't run wasm-ld to link files that use GC? That is a current limitation.

Adding relocatable (wasm object file) support in Binaryen is one option to help there. But Binaryen would need to also be able to link the way wasm-ld does. That's a huge amount of work. Instead, I think a better option is to add a simple linker in Binaryen, wasm-merge. We had such a tool and removed it, but we could restore it. It would link files without relocation - what it does is connect imports to exports and combine two files into one. That's much simpler than wasm-ld, but I think it's enough, like this:

  • Build C++ parts using LLVM. This emits object/relocatable files.
  • Link C++ parts using wasm-ld to get a wasm A. This is a final wasm file.
  • Build GC parts using Binaryen to get a wasm B. This is a final wasm file.
  • Run wasm-merge to link A and B together. This is also a final wasm file.
  • Run wasm-opt on that single final merged file (this will do inlining and other opts across C++ and GC code, DCE, etc.).
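
To make the "connect imports to exports" step concrete, here is a minimal Python sketch of the idea. It is a toy model and not Binaryen code: `ToyModule`, `merge`, and all module and function names are hypothetical, and a real module carries far more state (types, memories, tables, globals).

```python
from dataclasses import dataclass, field


@dataclass
class ToyModule:
    """Hypothetical stand-in for a final (non-relocatable) wasm module."""
    name: str
    exports: dict = field(default_factory=dict)    # export name -> internal function name
    imports: list = field(default_factory=list)    # (module name, field name) pairs
    functions: dict = field(default_factory=dict)  # internal function name -> opaque body


def merge(a, b):
    """Combine two toy modules, resolving each one's imports against the other's exports.

    Returns the merged module plus a map from each resolved import to the
    (prefixed) internal function it now refers to.
    """
    merged = ToyModule(name=f"{a.name}+{b.name}")
    resolved = {}
    for mod, other in ((a, b), (b, a)):
        # Prefix internal names with the module name so the two sets cannot collide.
        for fname, body in mod.functions.items():
            merged.functions[f"{mod.name}.{fname}"] = body
        # Assumption: export names do not clash between the two modules.
        for ename, fname in mod.exports.items():
            merged.exports[ename] = f"{mod.name}.{fname}"
        for imod, iname in mod.imports:
            if imod == other.name and iname in other.exports:
                # The import is satisfied by the other module: it becomes a direct reference.
                resolved[(imod, iname)] = f"{other.name}.{other.exports[iname]}"
            else:
                # Anything else stays an import of the merged module.
                merged.imports.append((imod, iname))
    return merged, resolved


libc = ToyModule("libc", exports={"malloc": "f0"}, functions={"f0": "<body>"})
app = ToyModule(
    "app",
    exports={"main": "m"},
    imports=[("libc", "malloc"), ("env", "print")],
    functions={"m": "<body>"},
)
merged, resolved = merge(libc, app)
print(resolved)
print(merged.imports)
```

No relocation records are involved in this sketch: linking happens purely by matching an import's (module, field) pair against the other module's exports, which is why it is so much simpler than what wasm-ld does.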

kripken avatar Aug 09 '22 16:08 kripken

Hi @kripken, thanks for your reply!

Ah, yes, so the issue is that LLVM doesn't support GC opcodes, so you can't run wasm-ld to link files that use GC? That is a current limitation.

Yes, this is the current limitation.

As for the solution you proposed: it is an interesting approach and should work for some scenarios, but its limitation is the lack of relocation.

  • If we treat all values as GC objects and don't use linear memory, then it seems relocation is not required;
  • However, if we want to use GC objects and linear memory together, then relocation is required (for global variables that are statically placed in linear memory, and also for the data section).
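
A small Python sketch of why the second case needs relocation records. Everything here is illustrative: `place_segments`, `apply_relocs`, the 16-byte alignment, and the sizes are assumptions for the example, not anything taken from binaryen or wasm-ld.

```python
def place_segments(segment_sizes, align=16):
    """Assign each module's static data a base address, one module after another."""
    bases, cursor = [], 0
    for size in segment_sizes:
        cursor = (cursor + align - 1) // align * align  # round up to the alignment
        bases.append(cursor)
        cursor += size
    return bases


def apply_relocs(consts, reloc_indices, base):
    """Shift only the constants that a relocation record marks as addresses."""
    return [c + base if i in reloc_indices else c for i, c in enumerate(consts)]


# Two modules, each compiled as if its static data started at address 0.
bases = place_segments([20, 8])

# Module B contains two constants with the value 4: the first is the address
# of a static variable, the second is just the integer 4.
b_consts = [4, 4]
b_relocs = {0}  # only constant #0 is an address

print(bases)
print(apply_relocs(b_consts, b_relocs, bases[1]))
```

The two constants are indistinguishable by value, so without the relocation record a linker cannot know which one must move when module B's data is placed after module A's.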

WebAssembly Object File Linking has already become a de facto standard, and LLVM's wasm-ld fully supports it. So IMO, if binaryen could generate these sections, we could benefit from the powerful wasm-ld tool and then build more powerful tools based on binaryen.

xujuntwt95329 avatar Aug 09 '22 17:08 xujuntwt95329

Do your GC files actually need to touch linear memory directly? I wonder if they could call an API instead - an imported function that does a load from linear memory. That would become fast after linking + inlining. But you'd need to add similar workarounds for globals, etc.

If you do want to really mix relocations and GC in arbitrary ways then, yes, the only real solution is to either add GC support in wasm-ld or relocation support in Binaryen:

  • I'm not sure how much GC support would be needed in wasm-ld - hopefully it would not need to actually read or understand the GC contents? cc @sbc100
  • Adding relocation support in Binaryen would take some work, but hopefully not much. We already use relocations for many things, like call has a name for the function and not an index, global.get has a name for the global, etc. What we are missing is stuff like relocations on immediates (like a load or store offset - we'd need to allow a name there and not just a constant) and on const nodes, etc.
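
A rough Python sketch of the distinction in the second bullet. This is a toy IR, not Binaryen's actual classes: `Call`, `Load`, `emit`, and the relocation kind labels `"function"` and `"memory_addr"` are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Call:
    target: str  # a function name rather than an index


@dataclass
class Load:
    # An int is a plain constant offset; a str is a symbol name, i.e. the
    # "allow a name there and not just a constant" extension discussed above.
    offset: Union[int, str]


def emit(body, function_index):
    """Lower toy instructions to (opcode, immediate) pairs plus relocation records."""
    code, relocs = [], []
    for pos, inst in enumerate(body):
        if isinstance(inst, Call):
            # The name is resolved to an index only at emit time, and the
            # record lets a linker renumber it later.
            code.append(("call", function_index[inst.target]))
            relocs.append((pos, "function", inst.target))
        elif isinstance(inst, Load):
            if isinstance(inst.offset, str):
                code.append(("load", 0))  # placeholder for the linker to patch
                relocs.append((pos, "memory_addr", inst.offset))
            else:
                code.append(("load", inst.offset))  # plain constant, no record
    return code, relocs


code, relocs = emit([Call("foo"), Load(8), Load("my_global")], {"foo": 3})
print(code)
print(relocs)
```

In this sketch the call already carries enough information to produce a relocation, while the load only does so once its offset is allowed to be symbolic.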

kripken avatar Aug 09 '22 19:08 kripken

Do your GC files actually need to touch linear memory directly?

Well, it is not decided yet. We are evaluating the capabilities of binaryen and LLVM, and the functionality provided by the tools will largely influence our choice of how to use GC and linear memory. That is also why I'm asking: relocation is not a must, but if it were supported, there would be fewer limitations and more possibilities for using binaryen as a compiler codegen.

At this moment, I think we should start from the solution you mentioned before, since relocation support does not seem likely to be ready in the short term. But we are really looking forward to this feature, as well as DWARF debug information support, which is also very important for compilers.

xujuntwt95329 avatar Aug 10 '22 12:08 xujuntwt95329

By "solution you mentioned before" do you mean the one using wasm-merge?

If so, let's wait to hear from @sbc100 on the above question about wasm-ld. If wasm-ld can link GC then we don't need wasm-merge. If it can't, then we do need wasm-merge, and then the question would be whether there is someone with time and interest to work on it (it's not a lot of work, and I could get to it eventually, but I'm not sure when).

kripken avatar Aug 10 '22 16:08 kripken

Maybe @ashleynh would be interested in working on wasm-merge due to its multi-memory component?

tlively avatar Aug 11 '22 00:08 tlively

wasm-ld has no support for GC today. The work of adding such support would mostly be the introduction of new symbol types and new relocation types, I believe... but my understanding of the GC proposal is fairly limited.

sbc100 avatar Aug 11 '22 17:08 sbc100

I guess we would want to add support in the MC layer for writing wasm GC programs in assembly (if only so we can write wasm-ld tests), which I imagine is a good chunk of work too.

sbc100 avatar Aug 11 '22 17:08 sbc100

By "solution you mentioned before" do you mean the one using wasm-merge?

Yes, and before wasm-merge is ready, I think I can simply extract all the functions from one module through binaryen's existing API and insert them into the target module.

xujuntwt95329 avatar Aug 11 '22 17:08 xujuntwt95329