emscripten
WASM binary size with -sMAIN_MODULE 7 to 9 times heavier
Based on issue #23683.
Version of emscripten/emsdk:
emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 4.0.5 (53b38d0c6f9fce1b62c55a8012bc6477f7a42711)
clang version 21.0.0git (https:/github.com/llvm/llvm-project 553da9634dc4bae215e6c850d2de3186d09f9da5)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /root/emsdk/upstream/bin
I hit a similar problem when including a heavy 46 MB file instead of using a char array. I will first rerun the sample from the previous issue so the two results can be compared.
Former sample program from previous issue:
#include <stdio.h>
char test[1024*1024*50];
int main(void)
{
    puts(test);
    return 0;
}
Compiler command line and results:
emcc -o test.wasm -O3 -sMAIN_MODULE=0 -sTOTAL_MEMORY=200MB test.c
-rw-r--r-- 1 root root 171 Oct 3 06:18 test.c
-rwxr-xr-x 1 root root 2.0K Oct 3 06:18 test.wasm
emcc -o test.wasm -O3 -sMAIN_MODULE=1 -sTOTAL_MEMORY=200MB test.c
-rw-r--r-- 1 root root 171 Oct 3 06:18 test.c
-rwxr-xr-x 1 root root 1.6M Oct 3 06:19 test.wasm
emcc -o test.wasm -O3 -sMAIN_MODULE=2 -sTOTAL_MEMORY=200MB test.c
-rw-r--r-- 1 root root 171 Oct 3 06:18 test.c
-rwxr-xr-x 1 root root 8.3K Oct 3 06:20 test.wasm
Here -sMAIN_MODULE=2 reduces the size significantly compared with -sMAIN_MODULE=1, even though the result is still about 4 times larger than with static linking. This is expected.
However, my actual case is different:
New sample program:
#include <stdio.h>
#include "data_file.c"
int main(void)
{
    puts((const char *)php_magic_database);
    return 0;
}
data_file.c is the libmagic database (php_magic_database) from the PHP fileinfo extension: https://github.com/php/php-src/blob/PHP-8.3.25/ext/fileinfo/data_file.c. The file is 46 MB.
Compiler command line and results:
emcc -o test.wasm -O3 -sMAIN_MODULE=0 -sTOTAL_MEMORY=200MB test.c
-rw-r--r-- 1 root root 46M Oct 3 06:35 data_file.c
-rw-r--r-- 1 root root 168 Oct 3 06:37 test.c
-rwxr-xr-x 1 root root 1.2M Oct 3 06:37 test.wasm
emcc -o test.wasm -O3 -sMAIN_MODULE=1 -sTOTAL_MEMORY=200MB test.c
-rw-r--r-- 1 root root 46M Oct 3 06:35 data_file.c
-rw-r--r-- 1 root root 168 Oct 3 06:37 test.c
-rwxr-xr-x 1 root root 9.2M Oct 3 06:37 test.wasm
emcc -o test.wasm -O3 -sMAIN_MODULE=2 -sTOTAL_MEMORY=200MB test.c
-rw-r--r-- 1 root root 46M Oct 3 06:35 data_file.c
-rw-r--r-- 1 root root 168 Oct 3 06:37 test.c
-rwxr-xr-x 1 root root 7.6M Oct 3 06:41 test.wasm
The first experiment gave about 2 KB, 1.6 MB, and 8 KB; this one gives 1.2 MB, 9.2 MB, and 7.6 MB. Is there a way to get close to 1.2 MB? If not, is it possible to statically link that specific file, since I know it won't be used by other modules?
I need to build with MAIN_MODULE because SIDE_MODULEs will be loaded at runtime. But I would also like heavy data such as data_file.c to be shrunk significantly, since it is only used in my MAIN_MODULE.
Do you know a way to achieve this?
@sbc100 @kripken Is this normal behavior, with no way to shrink data as in the static build once MAIN_MODULE is set, or should I investigate further?
I'm currently working on a change that will make the main module no longer relocatable: https://github.com/emscripten-core/emscripten/pull/25522.
If the code size difference you are seeing is coming from the relocation functions then this will hopefully remove a lot of this overhead.
Can you see where the main differences between the 3 builds above are coming from? Is it the data section or the code section? If it's the code section, which function is it? In particular, is it coming from the linker-generated relocation code?
@sbc100 The difference comes from the data section. In the MAIN_MODULE=0 version the data section is a list of small, shrunk segments:
(data $1 (i32.const 238056) "\fe\03\00 ") (at most 250 KB long)
(data $2 (i32.const 238073) "@\00 ")
(data $3 (i32.const 238088) "\0c\00\00 ")
In the MAIN_MODULE=2 version there is only one unshrunk segment:
(data $0 (global.get $gimport$0) (26.3 MB long)
So it is not related to functions. Emptying the php_magic_database variable in data_file.c drastically decreases the size of the wasm file.
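For anyone who wants to check the "data section or code section" question without reading a full text dump, the per-section sizes can be summed straight from the binary. The following is only a minimal sketch, not an Emscripten tool: the names wasm_section_totals, read_u32_leb and the WASM_SECTIONS_CLI macro are made up for this example. It assumes nothing beyond the core WebAssembly binary layout (an 8-byte header, then sections stored as an id byte, a LEB128 size, and the payload). wasm-objdump -h from WABT reports the same information.

```c
/* Sketch: sum the payload size of each section kind in a WebAssembly binary.
   All names below are hypothetical helpers for this example. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define WASM_SECTION_KINDS 13 /* section ids 0..12 of the core format */

const char *const wasm_section_names[WASM_SECTION_KINDS] = {
    "custom", "type", "import", "function", "table", "memory", "global",
    "export", "start", "element", "code", "data", "datacount"
};

/* Decode an unsigned LEB128 u32 at buf[*pos] and advance *pos.
   Returns 0 on success, -1 on truncated or over-long input. */
int read_u32_leb(const unsigned char *buf, size_t len, size_t *pos, uint32_t *out)
{
    uint32_t value = 0;
    for (unsigned shift = 0; shift < 35; shift += 7) {
        if (*pos >= len)
            return -1;
        unsigned char byte = buf[(*pos)++];
        value |= (uint32_t)(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {
            *out = value;
            return 0;
        }
    }
    return -1; /* a u32 never needs more than 5 bytes */
}

/* Walk the section headers and add each payload size to totals[id].
   Section ids outside 0..12 are skipped without being counted.
   Returns 0 on success, -1 if the header or a section size is invalid. */
int wasm_section_totals(const unsigned char *buf, size_t len,
                        uint64_t totals[WASM_SECTION_KINDS])
{
    static const unsigned char header[8] = { 0x00, 'a', 's', 'm', 1, 0, 0, 0 };
    assert(buf != NULL && totals != NULL);
    memset(totals, 0, WASM_SECTION_KINDS * sizeof totals[0]);
    if (len < sizeof header || memcmp(buf, header, sizeof header) != 0)
        return -1;
    size_t pos = sizeof header;
    while (pos < len) {
        unsigned char id = buf[pos++];
        uint32_t size;
        if (read_u32_leb(buf, len, &pos, &size) != 0 || size > len - pos)
            return -1;
        if (id < WASM_SECTION_KINDS)
            totals[id] += size;
        pos += size; /* jump over the payload to the next section header */
    }
    return 0;
}

#ifdef WASM_SECTIONS_CLI /* build with -DWASM_SECTIONS_CLI to get a small CLI */
int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s file.wasm\n", argv[0]);
        return 2;
    }
    FILE *f = fopen(argv[1], "rb");
    if (!f) {
        perror(argv[1]);
        return 1;
    }
    fseek(f, 0, SEEK_END);
    long n = ftell(f);
    rewind(f);
    unsigned char *buf = (n > 0) ? malloc((size_t)n) : NULL;
    if (!buf || fread(buf, 1, (size_t)n, f) != (size_t)n) {
        fprintf(stderr, "%s: cannot read file\n", argv[1]);
        fclose(f);
        free(buf);
        return 1;
    }
    fclose(f);
    uint64_t totals[WASM_SECTION_KINDS];
    int rc = wasm_section_totals(buf, (size_t)n, totals);
    free(buf);
    if (rc != 0) {
        fprintf(stderr, "%s: not a valid wasm binary\n", argv[1]);
        return 1;
    }
    for (int id = 0; id < WASM_SECTION_KINDS; id++)
        if (totals[id] != 0)
            printf("%-10s %llu bytes\n", wasm_section_names[id],
                   (unsigned long long)totals[id]);
    return 0;
}
#endif
```

Built with a native compiler and -DWASM_SECTIONS_CLI, then run against each test.wasm, this prints one line per non-empty section kind. Comparing the "data" and "code" lines across the three builds shows which of the two accounts for the growth.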
@sbc100 @kripken Is this the expected behavior? That data can’t be statically shrunk when MAIN_MODULE is set or should I look into it further?
I'd have to look into the specifics, which I have not done yet, but in general I would hope that MAIN_MODULE=2 could do as good a job as MAIN_MODULE=0 at eliminating unused data. I'm guessing that the symbol pointing to the data is being kept alive for some reason; maybe it's being exported?
You could try adding -Wl,--trace-symbol=php_magic_database to your link command which will tell you why / when that particular symbol is included by the linker.
@sbc100 I tried your pull request and it worked like a charm! The wasm test file is now 1.5 MB instead of 9.9 MB. So both MAIN_MODULE=0 and MAIN_MODULE=2 now eliminate unused data, as they should.
Wow, that's great news. That shows that https://github.com/emscripten-core/emscripten/pull/25522 is working as intended!