Embed wasm by encoding binary data as UTF-8 code points.

Open juj opened this issue 1 year ago • 9 comments

Quick and dirty test to see what the CI says.

Possible alternative to #21426.

juj, Mar 06 '24 09:03

UTF-8? I like the idea, but doesn't UTF-8 take 2 bytes for code points > 127? If I'm really unlucky, will the file size be big?

msqr1, Mar 06 '24 16:03

> UTF-8? I like the idea, but doesn't UTF-8 take 2 bytes for code points > 127? If I'm really unlucky, will the file size be big?

Since the contents of the wasm file will contain wasm opcodes, the byte distribution is going to be very deterministic. So I would expect a large Unity Boat Attack .wasm file to be as representative of a typical opcode distribution as any other Wasm file.

Also since the hypothesis was that gzip/brotli compress byte-aligned data better than non-byte aligned data, then presumably it won't matter as much that some of the input bytes expand out to two output bytes. (I think they could expand out to e.g. four output bytes as well, and gzip/brotli would likely pick up on such patterns)

Though like I mentioned in the other thread, if the file size on disk before the web server gz/br compression is important, then the expansion to two bytes will show up there.
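To make the size question concrete, here is a minimal Python sketch of the trade-off being discussed. It is not the code from this PR: the function names are hypothetical, and it ignores any escaping that a real JS or HTML embedding would presumably need for characters such as quotes, backslashes, or newlines. It only shows that bytes above 127 cost two bytes each on disk, and lets you compare that against base64 before and after gzip.

```python
import base64
import gzip


def embed_as_code_points(data: bytes) -> bytes:
    """Treat each byte value 0..255 as a code point and UTF-8 encode the result."""
    # latin-1 maps byte value N to code point U+00NN, one to one.
    return data.decode("latin-1").encode("utf-8")


def recover_bytes(embedded: bytes) -> bytes:
    """Inverse of embed_as_code_points (hypothetical helper)."""
    return embedded.decode("utf-8").encode("latin-1")


def size_report(data: bytes) -> dict:
    """Return on-disk and gzipped sizes for both embedding schemes."""
    b64 = base64.b64encode(data)
    utf8 = embed_as_code_points(data)
    return {
        "input": len(data),
        "base64": len(b64),
        "utf8": len(utf8),
        "base64_gz": len(gzip.compress(b64)),
        "utf8_gz": len(gzip.compress(utf8)),
    }


# Hypothetical sample: every byte value once, so exactly half are > 127.
sample = bytes(range(256))
report = size_report(sample)
print(report)
```

For this artificial sample the on-disk UTF-8 form is larger than base64 (384 vs 344 bytes), which is the worst-case concern raised above. Running `size_report` on a real .wasm file would show how its actual byte distribution behaves.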

juj, Mar 06 '24 19:03

Looks like other tests pass, but CircleCI gets a failure in core2:

Call parameter type does not match function signature!
  %a.var = alloca ptr addrspace(10), align 1, addrspace(1)
 ptr  call void @llvm.lifetime.start.p0(i64 1, ptr addrspace(1) %a.var) #9
Call parameter type does not match function signature!
  %a.var = alloca ptr addrspace(10), align 1, addrspace(1)
 ptr  call void @llvm.lifetime.end.p0(i64 1, ptr addrspace(1) %a.var) #9
in function __original_main
fatal error: error in backend: Broken function found, compilation aborted!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /root/emsdk/upstream/bin/clang -target wasm32-unknown-emscripten -fignore-exceptions -fPIC -fvisibility=default -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --sysroot=/root/cache/sysroot -DEMSCRIPTEN -Werror=implicit-function-declaration -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -Werror -fsanitize=address -Wno-unused-command-line-argument -mreference-types /root/project/test/core/test_externref_emjs.c -c -o /tmp/emtest_w5r1ff8r/emscripten_temp_tftngyhr/test_externref_emjs_0.o
1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module '/root/project/test/core/test_externref_emjs.c'.
4.	Running pass 'Module Verifier' on function '@__original_main'
 #0 0x00007fa15de63d38 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1dc0d38)
 #1 0x00007fa15de6189e llvm::sys::RunSignalHandlers() (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1dbe89e)
 #2 0x00007fa15de630bf llvm::sys::CleanupOnSignal(unsigned long) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1dc00bf)
 #3 0x00007fa15dda0c07 (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) CrashRecoveryContext.cpp:0:0
 #4 0x00007fa15dda0b9f llvm::CrashRecoveryContext::HandleExit(int) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1cfdb9f)
 #5 0x00007fa15de5e5e7 llvm::sys::Process::Exit(int, bool) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1dbb5e7)
 #6 0x0000555b23422473 (/root/emsdk/upstream/bin/clang+0x15473)
 #7 0x00007fa15ddb2efc llvm::report_fatal_error(llvm::Twine const&, bool) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1d0fefc)
 #8 0x00007fa15ddb2de6 (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1d0fde6)
 #9 0x00007fa15e0bd210 void llvm::VerifierSupport::WriteTs<llvm::Instruction*, llvm::MDNode const*>(llvm::Instruction* const&, llvm::MDNode const* const&) Verifier.cpp:0:0
#10 0x00007fa15e0123b6 llvm::FPPassManager::runOnFunction(llvm::Function&) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1f6f3b6)
#11 0x00007fa15e01a992 llvm::FPPassManager::runOnModule(llvm::Module&) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1f77992)
#12 0x00007fa15e012f29 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1f6ff29)
#13 0x00007fa1639b40b5 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::__2::unique_ptr<llvm::raw_pwrite_stream, std::__2::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x2e270b5)
#14 0x00007fa163de3356 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x3256356)
#15 0x00007fa1625717d9 clang::ParseAST(clang::Sema&, bool, bool) (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x19e47d9)
#16 0x00007fa1645ddc0f clang::FrontendAction::Execute() (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x3a50c0f)
#17 0x00007fa164553dad clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x39c6dad)
#18 0x00007fa1646684af clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x3adb4af)
#19 0x0000555b234211d4 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/root/emsdk/upstream/bin/clang+0x141d4)
#20 0x0000555b2341ea08 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#21 0x00007fa1641f2b39 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::__2::optional<llvm::StringRef>>, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>>*, bool*) const::$_0>(long) Job.cpp:0:0
#22 0x00007fa15dda0b36 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1cfdb36)
#23 0x00007fa1641f2392 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::__2::optional<llvm::StringRef>>, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>>*, bool*) const (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x3665392)
#24 0x00007fa1641b4671 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x3627671)
#25 0x00007fa1641b4c5e clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::__2::pair<int, clang::driver::Command const*>>&, bool) const (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x3627c5e)
#26 0x00007fa1641d330d clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::__2::pair<int, clang::driver::Command const*>>&) (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x364630d)
#27 0x0000555b2341de43 clang_main(int, char**, llvm::ToolContext const&) (/root/emsdk/upstream/bin/clang+0x10e43)
#28 0x0000555b2342c737 main (/root/emsdk/upstream/bin/clang+0x1f737)
#29 0x00007fa15b71db97 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b97)
#30 0x0000555b2341ab2a _start (/root/emsdk/upstream/bin/clang+0xdb2a)
clang: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 19.0.0git (https:/github.com/llvm/llvm-project 33e312137b065ba330b187f56ddd60df70927241)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /root/emsdk/upstream/bin
clang: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang: note: diagnostic msg: /tmp/test_externref_emjs-5d510e.c
clang: note: diagnostic msg: /tmp/test_externref_emjs-5d510e.sh
clang: note: diagnostic msg: 

********************
emcc: error: '/root/emsdk/upstream/bin/clang -target wasm32-unknown-emscripten -fignore-exceptions -fPIC -fvisibility=default -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --sysroot=/root/cache/sysroot -DEMSCRIPTEN -Werror=implicit-function-declaration -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -Werror -fsanitize=address -Wno-unused-command-line-argument -mreference-types /root/project/test/core/test_externref_emjs.c -c -o /tmp/emtest_w5r1ff8r/emscripten_temp_tftngyhr/test_externref_emjs_0.o' failed (returned 1)
None
None
test_externref_emjs_dynlink (test_core.asan) ... FAIL

======================================================================
FAIL [0.001s]: test_externref_emjs_dynlink (test_core.asan)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.6/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/lib/python3.6/unittest/case.py", line 605, in run
    testMethod()
  File "/root/project/test/common.py", line 630, in resulting_test
    return func(self, *args)
  File "/root/project/test/common.py", line 264, in decorated
    return func(self, *args, **kwargs)
  File "/root/project/test/common.py", line 129, in decorated
    func(self, *args, **kwargs)
  File "/root/project/test/test_core.py", line 9570, in test_externref_emjs
    self.do_core_test('test_externref_emjs.c')
  File "/root/project/test/test_core.py", line 367, in do_core_test
    self.do_run_in_out_file_test(Path('core', testname), **kwargs)
  File "/root/project/test/common.py", line 1609, in do_run_in_out_file_test
    output = self._build_and_run(srcfile, expected, **kwargs)
  File "/root/project/test/common.py", line 1631, in _build_and_run
    output_basename=output_basename)
  File "/root/project/test/common.py", line 1093, in build
    self.run_process(cmd, stderr=self.stderr_redirect if not DEBUG else None)
  File "/root/project/test/common.py", line 1430, in run_process
    self.fail(f'subprocess exited with non-zero return code({e.returncode}): `{shared.shlex_join(cmd)}`')
  File "/usr/lib/python3.6/unittest/case.py", line 670, in fail
    raise self.failureException(msg)
AssertionError: subprocess exited with non-zero return code(1): `/root/project/emcc /root/project/test/core/test_externref_emjs.c -o test_externref_emjs.js -sNO_DEFAULT_TO_CXX -sALLOW_MEMORY_GROWTH -sMAIN_MODULE=2 -Wclosure -Werror -Wno-limited-postlink-optimizations -fsanitize=address --profiling -Wno-unused-command-line-argument -mreference-types`

----------------------------------------------------------------------

I am unable to reproduce that failure locally on my Windows system.

At first glance this failure does not look like something this PR should cause(?), since this PR only affects stages later in the pipeline, but I am not 100% sure yet.

In any case, since that is the only failure, this PR should now be in good enough shape for a first-pass review.

In particular, everything is now validated to work. The notable thing here is that this scheme requires <meta charset='utf-8'> in the .html file, but our shells have had that for a long while, and I think all well-behaved .html files do too (it would be extremely rare to want any encoding other than utf-8).
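As an illustration of why the charset declaration matters, here is a small Python sketch. The helper names are hypothetical, and Python's latin-1 codec stands in for "some other page encoding"; real browsers have their own legacy-encoding rules, so this only models the general failure mode, not any specific browser.

```python
def embed_as_code_points(data: bytes) -> bytes:
    """Byte value N becomes code point U+00NN, then the text is UTF-8 encoded."""
    return data.decode("latin-1").encode("utf-8")


def decode_page(payload: bytes, charset: str) -> bytes:
    """Decode the payload under `charset`, then map code points back to bytes."""
    text = payload.decode(charset)
    # Keep only the low 8 bits of each code point, like a naive decoder would.
    return bytes(ord(ch) & 0xFF for ch in text)


# The 8-byte wasm preamble followed by two hypothetical high-valued bytes.
original = b"\x00asm\x01\x00\x00\x00\xe9\xff"
payload = embed_as_code_points(original)

ok = decode_page(payload, "utf-8")        # page declared as utf-8
broken = decode_page(payload, "latin-1")  # page read with the wrong charset

print(ok == original)      # True
print(broken == original)  # False
```

Under the wrong charset each two-byte UTF-8 sequence is read as two separate characters, so the recovered data is longer than the original and no longer a valid module.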

This PR still has the feature behind an extra setting. Would we want to remove that setting, and always have this on? Or maybe an extra setting might be good, so people have a way to opt out in case odd situations arise at first pass (such as https://github.com/sublimehq/sublime_text/issues/6320 or https://github.com/google/closure-compiler/issues/4159)

juj, Mar 06 '24 20:03

If you've found 2 bugs with tools/IDEs within the first day, then we should expect there to be more. It seems quite risky.

curiousdannii, Mar 06 '24 23:03

Yeah, having an opt-out setting for a while would be good from that perspective.

juj, Mar 07 '24 14:03

Is this working? Can I try it?

msqr1, Mar 15 '24 15:03

> Is this working? Can I try it?

Yeah, looks like the CI now passes (there were some intermittent CI failures last week), so this should be good to land.

juj, Mar 21 '24 20:03

Updated this PR to latest.

Added a new -sSINGLE_FILE code size test in binary-encoded mode. The improvement from base64-encoded SINGLE_FILE to this binary-encoded SINGLE_FILE in that code size test is:

size of a.html == 17586, expected 18839, delta=-1253 (-6.65%)
size of a.html.gz == 10152, expected 10973, delta=-821 (-7.48%)
Total output size=17586 bytes, expected total size=18839, delta=-1253 (-6.65%)
Total output size gzipped=10152 bytes, expected total size gzipped=10973, delta=-821 (-7.48%)
Hey amazing, overall generated code size was improved by 1253 bytes!

so the new test shows that the size improvement carries through post-gzip.
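The percentages in that output can be checked by hand. The following Python snippet is only a sanity check of the arithmetic, using the sizes quoted above; the `delta` helper is hypothetical and is not part of the emscripten test suite.

```python
from typing import Tuple


def delta(actual: int, expected: int) -> Tuple[int, float]:
    """Return the absolute size change and the change as a percentage of expected."""
    diff = actual - expected
    return diff, 100.0 * diff / expected


# (file name, new size, previous base64-based size) as reported by the test.
results = [("a.html", 17586, 18839), ("a.html.gz", 10152, 10973)]

for name, actual, expected in results:
    diff, pct = delta(actual, expected)
    print(f"{name}: delta={diff} ({pct:.2f}%)")
# a.html: delta=-1253 (-6.65%)
# a.html.gz: delta=-821 (-7.48%)
```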

It's been a while since we discussed this. I still think this would be a good feature to land, enabled by default, with an opt-out setting that lets users revert to the old base64 form.

juj, Aug 27 '24 18:08

CI looks good and green now. This is good for another round of review.

juj, Aug 27 '24 21:08

Closing old stale PR.

juj, Aug 16 '25 06:08