emscripten
Embed wasm by encoding binary data as UTF-8 code points.
Quick and dirty test to see what the CI says.
Possible alternative to #21426.
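For readers unfamiliar with the idea, here is a minimal sketch of what "encoding binary data as UTF-8 code points" could look like. It is an illustration only, not the code from this PR: the function names are hypothetical, and it assumes the simplest possible mapping, where byte value N becomes code point U+00NN. A real embedding inside a JavaScript string would presumably also need to escape characters such as quotes, backslashes and line terminators, which is not shown here.

```python
def encode_bytes_as_utf8_text(data: bytes) -> bytes:
    """Hypothetical helper: map byte value N to code point N, then serialize as UTF-8."""
    text = data.decode("latin-1")  # latin-1 maps each byte to the code point of the same value
    return text.encode("utf-8")    # code points below 128 take 1 byte, 128..255 take 2 bytes


def decode_utf8_text_to_bytes(encoded: bytes) -> bytes:
    """Hypothetical inverse: read the UTF-8 text back and recover the original bytes."""
    text = encoded.decode("utf-8")
    return text.encode("latin-1")


# The 8-byte wasm header ("\0asm" plus version 1), followed by three boundary values.
sample = bytes([0x00, 0x61, 0x73, 0x6D, 0x01, 0x00, 0x00, 0x00, 0x7F, 0x80, 0xFF])
encoded = encode_bytes_as_utf8_text(sample)

assert decode_utf8_text_to_bytes(encoded) == sample
print(len(sample), len(encoded))  # prints "11 13": only 0x80 and 0xFF expand to two bytes
```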
UTF-8? I like the idea, but doesn't UTF-8 take 2 bytes for code points above 127? If I have really bad luck, will the file size end up big?
Since the contents of the wasm file will contain wasm opcodes, the byte distribution is going to be very deterministic. So I would expect a large Unity Boat Attack .wasm file to be as representative of a typical opcode distribution as any other Wasm file.
Also, since the hypothesis was that gzip/brotli compress byte-aligned data better than non-byte-aligned data, presumably it won't matter as much that some of the input bytes expand to two output bytes. (I think they could expand to e.g. four output bytes as well, and gzip/brotli would likely pick up on such patterns.)
Though like I mentioned in the other thread, if the file size on disk before the web server gz/br compression is important, then the expansion to two bytes will show up there.
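To put rough numbers on the uncompressed size question, here is a back-of-the-envelope sketch. It assumes the simple byte-to-code-point mapping (bytes below 128 cost one byte, bytes 128..255 cost two) and ignores any escaping a JavaScript string literal would need, so it is an estimate and not a measurement of this PR. Under those assumptions, if a fraction f of the input bytes is 128 or above, the output is n(1 + f) bytes versus about 4n/3 for base64, so the UTF-8 form is smaller on disk only while f stays below roughly 1/3. The function names are hypothetical.

```python
import base64


def utf8_embedded_size(data: bytes) -> int:
    """Estimated on-disk size: one byte per input byte, plus one extra per byte >= 0x80."""
    high = sum(1 for b in data if b >= 0x80)
    return len(data) + high


def base64_size(data: bytes) -> int:
    """Size of the same data when base64-encoded (with padding)."""
    return len(base64.b64encode(data))


all_low = bytes(range(128)) * 3        # 384 bytes, none >= 0x80 (best case)
all_high = bytes(range(128, 256)) * 3  # 384 bytes, all >= 0x80 (worst case)
uniform = bytes(range(256)) * 3        # 768 bytes, half >= 0x80

for name, data in [("all_low", all_low), ("all_high", all_high), ("uniform", uniform)]:
    print(name, len(data), utf8_embedded_size(data), base64_size(data))
# all_low  384  384  512
# all_high 384  768  512
# uniform  768 1152 1024
```

Whether a real .wasm file lands above or below the break-even point depends on its actual byte distribution, which this sketch does not try to predict.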
Looks like other tests pass, but CircleCI gets a failure in core2:
Call parameter type does not match function signature!
%a.var = alloca ptr addrspace(10), align 1, addrspace(1)
ptr call void @llvm.lifetime.start.p0(i64 1, ptr addrspace(1) %a.var) #9
Call parameter type does not match function signature!
%a.var = alloca ptr addrspace(10), align 1, addrspace(1)
ptr call void @llvm.lifetime.end.p0(i64 1, ptr addrspace(1) %a.var) #9
in function __original_main
fatal error: error in backend: Broken function found, compilation aborted!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments: /root/emsdk/upstream/bin/clang -target wasm32-unknown-emscripten -fignore-exceptions -fPIC -fvisibility=default -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --sysroot=/root/cache/sysroot -DEMSCRIPTEN -Werror=implicit-function-declaration -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -Werror -fsanitize=address -Wno-unused-command-line-argument -mreference-types /root/project/test/core/test_externref_emjs.c -c -o /tmp/emtest_w5r1ff8r/emscripten_temp_tftngyhr/test_externref_emjs_0.o
1. <eof> parser at end of file
2. Code generation
3. Running pass 'Function Pass Manager' on module '/root/project/test/core/test_externref_emjs.c'.
4. Running pass 'Module Verifier' on function '@__original_main'
#0 0x00007fa15de63d38 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1dc0d38)
#1 0x00007fa15de6189e llvm::sys::RunSignalHandlers() (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1dbe89e)
#2 0x00007fa15de630bf llvm::sys::CleanupOnSignal(unsigned long) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1dc00bf)
#3 0x00007fa15dda0c07 (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) CrashRecoveryContext.cpp:0:0
#4 0x00007fa15dda0b9f llvm::CrashRecoveryContext::HandleExit(int) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1cfdb9f)
#5 0x00007fa15de5e5e7 llvm::sys::Process::Exit(int, bool) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1dbb5e7)
#6 0x0000555b23422473 (/root/emsdk/upstream/bin/clang+0x15473)
#7 0x00007fa15ddb2efc llvm::report_fatal_error(llvm::Twine const&, bool) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1d0fefc)
#8 0x00007fa15ddb2de6 (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1d0fde6)
#9 0x00007fa15e0bd210 void llvm::VerifierSupport::WriteTs<llvm::Instruction*, llvm::MDNode const*>(llvm::Instruction* const&, llvm::MDNode const* const&) Verifier.cpp:0:0
#10 0x00007fa15e0123b6 llvm::FPPassManager::runOnFunction(llvm::Function&) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1f6f3b6)
#11 0x00007fa15e01a992 llvm::FPPassManager::runOnModule(llvm::Module&) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1f77992)
#12 0x00007fa15e012f29 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1f6ff29)
#13 0x00007fa1639b40b5 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::__2::unique_ptr<llvm::raw_pwrite_stream, std::__2::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x2e270b5)
#14 0x00007fa163de3356 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x3256356)
#15 0x00007fa1625717d9 clang::ParseAST(clang::Sema&, bool, bool) (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x19e47d9)
#16 0x00007fa1645ddc0f clang::FrontendAction::Execute() (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x3a50c0f)
#17 0x00007fa164553dad clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x39c6dad)
#18 0x00007fa1646684af clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x3adb4af)
#19 0x0000555b234211d4 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/root/emsdk/upstream/bin/clang+0x141d4)
#20 0x0000555b2341ea08 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#21 0x00007fa1641f2b39 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::__2::optional<llvm::StringRef>>, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>>*, bool*) const::$_0>(long) Job.cpp:0:0
#22 0x00007fa15dda0b36 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1cfdb36)
#23 0x00007fa1641f2392 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::__2::optional<llvm::StringRef>>, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>>*, bool*) const (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x3665392)
#24 0x00007fa1641b4671 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x3627671)
#25 0x00007fa1641b4c5e clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::__2::pair<int, clang::driver::Command const*>>&, bool) const (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x3627c5e)
#26 0x00007fa1641d330d clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::__2::pair<int, clang::driver::Command const*>>&) (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x364630d)
#27 0x0000555b2341de43 clang_main(int, char**, llvm::ToolContext const&) (/root/emsdk/upstream/bin/clang+0x10e43)
#28 0x0000555b2342c737 main (/root/emsdk/upstream/bin/clang+0x1f737)
#29 0x00007fa15b71db97 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b97)
#30 0x0000555b2341ab2a _start (/root/emsdk/upstream/bin/clang+0xdb2a)
clang: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 19.0.0git (https:/github.com/llvm/llvm-project 33e312137b065ba330b187f56ddd60df70927241)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /root/emsdk/upstream/bin
clang: note: diagnostic msg:
********************
PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang: note: diagnostic msg: /tmp/test_externref_emjs-5d510e.c
clang: note: diagnostic msg: /tmp/test_externref_emjs-5d510e.sh
clang: note: diagnostic msg:
********************
emcc: error: '/root/emsdk/upstream/bin/clang -target wasm32-unknown-emscripten -fignore-exceptions -fPIC -fvisibility=default -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --sysroot=/root/cache/sysroot -DEMSCRIPTEN -Werror=implicit-function-declaration -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -Werror -fsanitize=address -Wno-unused-command-line-argument -mreference-types /root/project/test/core/test_externref_emjs.c -c -o /tmp/emtest_w5r1ff8r/emscripten_temp_tftngyhr/test_externref_emjs_0.o' failed (returned 1)
None
None
test_externref_emjs_dynlink (test_core.asan) ... FAIL
======================================================================
FAIL [0.001s]: test_externref_emjs_dynlink (test_core.asan)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/lib/python3.6/unittest/case.py", line 59, in testPartExecutor
yield
File "/usr/lib/python3.6/unittest/case.py", line 605, in run
testMethod()
File "/root/project/test/common.py", line 630, in resulting_test
return func(self, *args)
File "/root/project/test/common.py", line 264, in decorated
return func(self, *args, **kwargs)
File "/root/project/test/common.py", line 129, in decorated
func(self, *args, **kwargs)
File "/root/project/test/test_core.py", line 9570, in test_externref_emjs
self.do_core_test('test_externref_emjs.c')
File "/root/project/test/test_core.py", line 367, in do_core_test
self.do_run_in_out_file_test(Path('core', testname), **kwargs)
File "/root/project/test/common.py", line 1609, in do_run_in_out_file_test
output = self._build_and_run(srcfile, expected, **kwargs)
File "/root/project/test/common.py", line 1631, in _build_and_run
output_basename=output_basename)
File "/root/project/test/common.py", line 1093, in build
self.run_process(cmd, stderr=self.stderr_redirect if not DEBUG else None)
File "/root/project/test/common.py", line 1430, in run_process
self.fail(f'subprocess exited with non-zero return code({e.returncode}): `{shared.shlex_join(cmd)}`')
File "/usr/lib/python3.6/unittest/case.py", line 670, in fail
raise self.failureException(msg)
AssertionError: subprocess exited with non-zero return code(1): `/root/project/emcc /root/project/test/core/test_externref_emjs.c -o test_externref_emjs.js -sNO_DEFAULT_TO_CXX -sALLOW_MEMORY_GROWTH -sMAIN_MODULE=2 -Wclosure -Werror -Wno-limited-postlink-optimizations -fsanitize=address --profiling -Wno-unused-command-line-argument -mreference-types`
----------------------------------------------------------------------
I am unable to reproduce that failure locally on my Windows system.
At first glance this failure looks like something that should not be caused by this PR(?), since this PR only affects later stages of the pipeline, but I am not 100% sure yet.
In any case, since that is the only failure, this PR should now be in good enough shape for a first review pass.
In particular, everything is now validated to work. The notable thing here is that this scheme requires <meta charset='utf-8'> in the .html file, but our shells have had that for a long while, and I think all well-behaved .html files do too (it would be extremely rare to want any encoding other than UTF-8).
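As an illustration of why the charset declaration matters, here is a small sketch of what would go wrong if the same file bytes were interpreted in a different encoding. It assumes the simple byte-to-code-point mapping and uses windows-1252 as an example of a non-UTF-8 fallback; the `recover` helper is hypothetical and only mimics a decoder that reads one byte per character.

```python
# Original binary payload: "\0asm" followed by one byte above 127.
payload = bytes([0x00, 0x61, 0x73, 0x6D, 0xE9])

# What ends up in the file: each byte as a code point, serialized as UTF-8.
on_disk = payload.decode("latin-1").encode("utf-8")


def recover(file_bytes: bytes, page_charset: str) -> bytes:
    """Hypothetical decoder: interpret the file in page_charset, take one byte per character."""
    text = file_bytes.decode(page_charset)
    return bytes(ord(ch) & 0xFF for ch in text)


# With the right charset the payload round-trips.
assert recover(on_disk, "utf-8") == payload

# With the wrong charset, the two-byte sequence C3 A9 is read as two characters,
# so the recovered data is one byte longer and no longer matches.
wrong = recover(on_disk, "cp1252")
assert wrong != payload
print(len(payload), len(wrong))  # prints "5 6"
```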
This PR still has the feature behind an extra setting. Would we want to remove that setting and always have this on? Or maybe an extra setting would be good, so people have a way to opt out in case odd situations arise at first (such as https://github.com/sublimehq/sublime_text/issues/6320 or https://github.com/google/closure-compiler/issues/4159).
If you've found 2 bugs with tools/IDEs within the first day, then we should expect there to be more. It seems quite risky.
Yeah, having an opt-out setting for a while would be good from that perspective.
Is this working? Can I try it?
Yeah, looks like the CI now passes (there were some intermittent CI failures last week), so this should be good to land.
Updated this PR to latest.
Added a new -sSINGLE_FILE code size test in binary-encoded mode. The improvement from base64-encoded SINGLE_FILE to this binary-encoded SINGLE_FILE in that code size test is:
size of a.html == 17586, expected 18839, delta=-1253 (-6.65%)
size of a.html.gz == 10152, expected 10973, delta=-821 (-7.48%)
Total output size=17586 bytes, expected total size=18839, delta=-1253 (-6.65%)
Total output size gzipped=10152 bytes, expected total size gzipped=10973, delta=-821 (-7.48%)
Hey amazing, overall generated code size was improved by 1253 bytes!
So the new test shows that the size improvement carries through post-gzip.
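The percentages in that output can be checked by hand: the delta is the new size minus the expected (base64) size, and the percentage is that delta relative to the expected size. A short sketch of the arithmetic, with a hypothetical `size_delta` helper that is not part of the test suite:

```python
from typing import Tuple


def size_delta(actual: int, expected: int) -> Tuple[int, float]:
    """Return (absolute delta in bytes, delta as a percentage of the expected size)."""
    delta = actual - expected
    return delta, 100.0 * delta / expected


for label, actual, expected in [("a.html", 17586, 18839), ("a.html.gz", 10152, 10973)]:
    delta, pct = size_delta(actual, expected)
    print(f"{label}: delta={delta} ({pct:.2f}%)")
# a.html: delta=-1253 (-6.65%)
# a.html.gz: delta=-821 (-7.48%)
```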
It's been a while since we discussed this. I still think this would be a good feature to land, be enabled by default, and allow an opt-out that users can have to revert to old base64 form.
CI looks good and green now. This is good for another round of review.
Closing old stale PR.