
wasm2c: Optionally support #embed

Open SoniEx2 opened this issue 2 years ago • 7 comments

Instead of merely emitting data segments as array initializers, it would be neat if we could (optionally) use #embed too.

Too bad offset(...) isn't standard, so we'll need to emit separate files for each data segment.

SoniEx2 avatar Nov 11 '23 12:11 SoniEx2

Wow, #embed looks awesome. First time I've seen it.

Using it for data segments seems possible, but it would also mean that wasm2c would no longer generate just a single C file but a collection of files. Maybe as an option? What do you think would be the advantage of this option over doing the embedding like we do today?

sbc100 avatar Nov 11 '23 17:11 sbc100

(Doesn't #embed also emit array initializers?)

sbc100 avatar Nov 11 '23 17:11 sbc100

We note that wasm2c already outputs more than one file: a .h and a .c. We do think it should be an option, because C23 is, well, we don't think it's even been published yet? So yeah, it's not exactly widely supported - yet.

We believe ThePhD's blog post has relevant benchmarks: https://thephd.dev/implementing-embed-c-and-c++

We've never personally hit data segments bigger than 48KiB when playing with wasm, but we're almost certain real-world use-cases do. ThePhD's benchmark used a 4MB file, which doesn't seem unreasonable to us: after all, wasm2c is primarily used to take C/C++, compile it to wasm, and then compile it to C again, as part of RLBox; you can have a C program using #embed plus additional static initializers, compile it to wasm, and get fairly sizeable data segments that way.

SoniEx2 avatar Nov 11 '23 18:11 SoniEx2

Oh I see, so #embed doesn't just generate array initializers like we currently do? It can use compiler-specific builtins to go faster under the hood? So the advantage would be compile-time improvements for large data segments?

sbc100 avatar Nov 11 '23 18:11 sbc100

> Oh I see, so #embed doesn't just generate array initializers like we currently do? It can use compiler-specific builtins to go faster under the hood? So the advantage would be compile-time improvements for large data segments?

Yes, it's basically designed to allow it to be implemented as "just shove the !@#$%^&* bytes into the !@#$%^&* executable already!" instead of anything like using array initializers. Raw byte concatenation, effectively. "The compiler uses fwrite and then creates an appropriate symbol for the spot which was written-to."

workingjubilee avatar Nov 11 '23 20:11 workingjubilee

So does that mean it's not really a pure pre-processor feature? i.e. if you run the pre-processor it doesn't produce the array init expressions that I was imagining? Or is it that if you run a compiler like clang, which does both pre-processing and compilation, it's expected to take a shortcut and avoid the array init? I guess the latter.

sbc100 avatar Nov 12 '23 05:11 sbc100

It is, strictly speaking, the latter. If you dump out the preprocessed file, it will contain the array initializer you are expecting. If you compile it the way a typical C programmer does, however, then yes: very few C compilers pass up the chance to exploit the fact that they are both preprocessor and compiler.

workingjubilee avatar Nov 12 '23 07:11 workingjubilee