
wasm2c: Optionally support #embed

Open SoniEx2 opened this issue 2 years ago • 7 comments

Instead of merely emitting data segments as array initializers, it would be neat if we could (optionally) use #embed too.

Too bad offset(...) isn't standard, so we'll need to emit separate files for each data segment.

SoniEx2 avatar Nov 11 '23 12:11 SoniEx2

Wow, #embed looks awesome. First time I've seen it.

Using it for data segments seems possible, but it would also mean that wasm2c would no longer generate just a single C file but a collection of files. Maybe as an option? What do you think would be the advantage of this option over doing the embedding like we do today?

sbc100 avatar Nov 11 '23 17:11 sbc100

(Doesn't #embed also emit array initializers?)

sbc100 avatar Nov 11 '23 17:11 sbc100

We note that wasm2c already outputs more than one file: a .h and a .c. We do think it should be an option, because C23 is, well, we don't think it's even been published yet? So yeah, it's not exactly widely supported - yet.

We believe ThePhD's blog post has relevant benchmarks: https://thephd.dev/implementing-embed-c-and-c++

We've never personally hit data segments bigger than 48KiB when playing with wasm, but we're almost certain real-world use-cases do. ThePhD's benchmark used a 4MB file, which doesn't seem unreasonable to us: after all, wasm2c is primarily used to take C/C++, compile it to wasm, and then compile it to C again, as part of RLBox; you can have a C program using #embed plus additional static initializers, compile it to wasm, and get fairly sizeable data segments that way.

SoniEx2 avatar Nov 11 '23 18:11 SoniEx2

Oh I see, so #embed doesn't just generate array initializers like we currently do? It can use compiler-specific builtins to go faster under the hood? So the advantage would be compile-time improvements for large data segments?

sbc100 avatar Nov 11 '23 18:11 sbc100

> Oh I see, so #embed doesn't just generate array initializers like we currently do? It can use compiler-specific builtins to go faster under the hood? So the advantage would be compile-time improvements for large data segments?

Yes, it's basically designed to allow it to be implemented as "just shove the !@#$%^&* bytes into the !@#$%^&* executable already!" instead of anything like using array initializers. Raw byte concatenation, effectively. "The compiler uses fwrite and then creates an appropriate symbol for the spot which was written-to."

workingjubilee avatar Nov 11 '23 20:11 workingjubilee

So does that mean it's not really a pure pre-processor feature? i.e. if you run the pre-processor it doesn't produce the array init expressions that I was imagining? Or is it that if you run a compiler like clang, which does both pre-processing and compilation, it's expected to take a shortcut and avoid the array init? I guess the latter.

sbc100 avatar Nov 12 '23 05:11 sbc100

It is, strictly speaking, the latter. If you dump out the preprocessed file, it will contain the array initializer you are expecting. If you compile it the way a typical C programmer does, however, then yes: very few C compilers pass up the chance to exploit the fact that they are both preprocessor and compiler.

workingjubilee avatar Nov 12 '23 07:11 workingjubilee