wabt icon indicating copy to clipboard operation
wabt copied to clipboard

options skip_function_bodies can not work with ReadBinaryIr

Open primstar-cool opened this issue 10 months ago • 18 comments

wabt::ReadBinaryOptions options;
    options.skip_function_bodies = true;

wabt::Result errorCode = wabt::ReadBinaryIr("", wasmData, fileSize, options, &errors, &module);

will got an assert at function [Result BinaryReaderIR::EndFunctionBody(Index index)] of file [binary-reader-ir.cc]

the reason is: PushLabel at BinaryReaderIR::BeginFunctionBody, but not consume it when 'skip_function_bodies' turns on.

I think it should add a Step names SkipExpr or trans options on BeginFunctionBody.

BTW, i got another problem, when parse a big wasm in Android, it will crash randomly when parse function code, but same code work well in MacOS. when i skip_function_bodies, it works well. i will provide detail later.

primstar-cool avatar Jan 25 '25 12:01 primstar-cool

the bug of run in Android, I'v got the error "Can't populate more pages for size class 224", seems allocate mem error for too many string

primstar-cool avatar Jan 25 '25 14:01 primstar-cool

i think the reason is allocating too mush times. a wasm of a unity demo game, will new 14,767,201 times.

here is my solution:

extends BinaryReaderIR, replace [std::make_unique] refer [AppendExpr], to [spec_make_unique] then override new operate in [spec_make_unique] alloc all unique_ptr in a big buffer;

before ~Module clear expr in module.funcs.exprs call expr.clear will call delete, so we should use extract_back() and release, until exprs empty

then it will run well in Android.

i hopes there is an official version with override memery alloc

primstar-cool avatar Jan 25 '25 16:01 primstar-cool

with using mem pool(release big buffer at end, and each opCode save some bytes and time for avoiding manage free memory), but it cost more than 1.6g mem, it still runs poorly on low-performance phones.

I will make a branch for saving memory.

primstar-cool avatar Jan 25 '25 18:01 primstar-cool

i'v make a pr, solve the error of ReadBinaryIr when options.skip_function_bodies=true and make a virtual op code, can output original byte code when skip_function_bodies=true.

https://github.com/WebAssembly/wabt/pull/2535

primstar-cool avatar Jan 26 '25 15:01 primstar-cool

do you need the IR? why not use binary reader directly?

SoniEx2 avatar Jan 26 '25 18:01 SoniEx2

A typical wasm size of an Unity game, is about 50-70M with 15,000,000+ Func exprs. in my case, i will load wasm and modify it in phone(android / ios). parse an 57M wasm will cost more than 1.5G mem. App will crash in Android on mem alloc ("Can't populate more pages for size class 224").

first step, i override operate new, for func expr will not delete at decode wasm, so they can be allocate solid in a big buffer. at the end, i will call destructor of each exprs and free the big buffer manually before Module destructor.

but, memory is still insufficient on low-end iOS devices. so i impl the options skip_function_bodies. it can hold raw bytecode. and then, we can extract it one by one.

primstar-cool avatar Jan 27 '25 14:01 primstar-cool

yeah, that happens when using the IR. why not use ReadBinary instead of ReadBinaryIr? that one gives you a stream of instructions you can modify on the fly.

SoniEx2 avatar Jan 27 '25 15:01 SoniEx2

A typical wasm size of an Unity game, is about 50-70M with 15,000,000+ Func exprs. in my case, i will load wasm and modify it in phone(android / ios).

Can you explain what you are trying to do here? For example, it sounds like you are trying to run a unity game on an phone using the wabt interpreter? Is that right?

sbc100 avatar Jan 27 '25 17:01 sbc100

load local wasm , and mod or inject sth in byte code. when we inject a new import function, each func-call index will be changed, so make a diff-file is a little big. so decode wasm, then merge patch in phone is effective.

libbinaryen can process the task in low mem, but libbinaryen is a bit slowly. libwabt decode 250% faster than libbinaryen. (libbinaryen 5 second vs libwabt 2s)

what i did: use a struct to save raw byte code when skip_function_bodies turns on. if we need patch functions, extract the struct, modify, and pack the struct one by one; whatever extracted or not, the new struct can output byte code correctly.

primstar-cool avatar Jan 27 '25 18:01 primstar-cool

that's what we figured you were doing, and we would still recommend using ReadBinary instead. it's a streaming parser so you only need enough RAM for roughly two copies of the wasm (the original and the modded), plus you still get decoded bytecode instead of raw bytecode.

SoniEx2 avatar Jan 27 '25 18:01 SoniEx2

does it means that, i impl a new BinaryReaderDelegate? but if i save the decoded Expr, it will cost same mem as BinaryReaderIr (57M wasm file cost 1.6G mem). the code in BinaryReaderIr just save it into a vector, I can't do it better. if i defined a small struct to save decoded data, it seems a too complex proj. i need modify the wasm after whole wasm decoded, then find dist function, decoded function body, modify , and re-pack it to binary code to save memory.

so the point is, i can't decode function bodies at first step, but needs byte code of funcs for second-pass decode. skip_function_bodies is an read options, when turn it on, BinaryReaderIr will assert on reading, and module will not output again. i think, if we turn the option skip_function_bodies on, it will at least output the same file.

here is my demo Pseudocode

void addImportFunc() {

  module.AppendField(std::make_unique<wabt::ImportModuleField>(
      std::move(newImport), wabt::Location()));
  // update elem id...
  // update export id...
  // update func binding id...
  // update call Expr as follows:
  class CallIdModifier : public wabt::ExprVisitor::DelegateNop {
  public:
    int importStart;
    wabt::Module *module;

    wabt::Result OnCallExpr(wabt::CallExpr *call_expr) override {
      // return wabt::Result::Ok;
      if (call_expr->var.is_index() && call_expr->var.index() >= importStart) {
        call_expr->var.set_index(call_expr->var.index() + 1);
      }
      return wabt::Result::Ok;
    }

    wabt::Result
    OnOpcodeRawExpr(wabt::OpcodeRawExpr *opcode_raw_expr) override {
        wabt::ReadBinaryOptions read_options;
        wabt::ExtractOpcodeRawExpr(*opcode_raw_expr, *module, read_options);
        return wabt::Result::Ok;
    }
  };

  for (auto func : module.funcs) {
    ev.VisitExprList(func->exprs);
   // repackaging opcode here...
  }
}

in pc , we turned skip_function_bodies off, and in phone we skip_function_bodies skip_function_bodies on, use ExtractOpcodeRawExpr when visit or by manual, and PackOpcodeRawExpr after modify. extracted or unextracted Expr will output correct byte code as follows

...
case ExprType::OpCodeRaw: {
  auto* opcode_raw_expr = cast<OpcodeRawExpr>(expr);
  if (opcode_raw_expr->is_extracted) {
    WriteExprList(func, opcode_raw_expr->extracted_exprs);
  } else {
    auto& opcode_buffer = opcode_raw_expr->opcode_buffer;
    // Opcode::End 0x0b
    assert(opcode_buffer[opcode_buffer.size() - 1] == 0x0b);
    stream_->WriteData(opcode_buffer.data(), opcode_buffer.size() - 1);
  }
}

primstar-cool avatar Jan 27 '25 20:01 primstar-cool

the webassembly binary format is mostly designed for single-pass processing, this is why imports come before anything else.

so the first thing that would be decoded would be the number of imports. you'd then immediately start emitting a new webassembly binary, with the extra imports added, and then go on to process the elem id, export id, etc. no need to save the decoded forms.

SoniEx2 avatar Jan 27 '25 20:01 SoniEx2

i think the read option skip_function_bodies is designed for quick analysis other blocks, but i think it should retain the ability to output again. I just enhanced the compatibility of this option, even though it won't be used in most cases. the new commit provided two static functions for extracting and packing raw data.

when we add a new import function, the func index must insert at the import block, this means that, every call cmd which id >= num_func_imports should add an offset (+1). the data in [function table] which func index >= num_func_imports, should add an offset also. same as exports_segment; update elem_segment and exports_segment is just 10 lines in total, it's not important. visit all func exprs is not difficult too, I spent half a day debugging on the PC and got it working. but in mobile device, it always crash for low memory.

steaming decode and encode can save memory, but it seems a big project, does writeModule accept an incomplete module? when On****Expr, encode and put it in an output stream, and release the Expr?

so the first commit is just save original bytes when option skip_function_bodies turns on. then i can modify it outside. in some case i just need to patch 1~2 funcs, so i need not decode all function bodies. the new feature is wrapped by a new macro WABT_KEEP_OPCODE_WHEN_SKIP_FUNC_BODY. extract and pack need use private func ReadInstructions and WriteExprList. so in new commit, i made 2 static funcs for user.

Modifying WASM on low-memory devices may be a rare requirement, but fortunately, the amount of modification needed is not significant.

primstar-cool avatar Jan 27 '25 21:01 primstar-cool

oh wait, we don't really have a streaming module writer. hmm...

(is it needed? how hard would it be to write out a module as it's being read? maybe this is something that could be improved.)

SoniEx2 avatar Jan 27 '25 22:01 SoniEx2

in my case, modify wasm in stream is a bit hard for I must add all import function before i scaned the function section, and i need to save lots of context info for modifying.

I wonder what dose the option skip_function_bodies usually used for ? Maybe using virtual ExprType is an urgly way to implement it, but i think retaining the original function_bodies is necessary virtual opcode OpCodeRaw can implment [loop], [visit], [write], [log], [valid] with only a little code. virtual ExprTypeis compatible to [join] [insert] to other exprsList. and when the expr extracted, it is similar to a BlockExpr without a Label. when we do a coverage testing, we just insert a callExpr at the beginning of this function, I don't even need to unpack this struct, it will be connected with other instructions and outputted as-is.

I also think it's a bit ugly for add an virtual ExprType, maybe a spec struct is a better choice. but when I noticed that there is a lot of code that switches on (eptr->type()), creating a dedicated struct might lead to a large number of modifications. so I chose the current solution.

btw, i handle wabt_expr_make_unique with a solid mem pool, it significantly improve the decoding performance of expr. in my test, decode time reduce from 1.51s to 0.88s, and reduce some mem usage(avoid create headers of memory block headers nor memory alignment, no memory fragmentation). i think libwabt can make a official version of expr mem alloc.

primstar-cool avatar Jan 28 '25 00:01 primstar-cool

we believe skip_function_bodies was never designed for use with ReadBinaryIr. it should probably have rejected it. (in fact, it seems to only be used by objdump.)

SoniEx2 avatar Jan 28 '25 00:01 SoniEx2

Yes, skip_function_bodies was designed for by used by objdump se you can do things like objdump -h to see the all the file headers in O(1) time without decoding the instruction stream.

sbc100 avatar Jan 28 '25 00:01 sbc100

I didn't know what this option was for at first, so i wrapped the feature in a macro, although a simple memcpy will cost only little time in DelegateNop. it can work with ReadBinaryIr when turned the macro on. when this macro is disabled, it has no side effects. if someone else need modify wasm in low-mem device, the feature is useful for that (maybe some people also have this requirement (low-memory environments), after finding it unworkable, they turned to libbinaryen, although it is larger and slower. So it may not be a spurious need. In fact, we use libbinaryen at the beginning, but it parses 2.5 times slower than wabt, so we switched to this library. If there will be an official low-peak-memory mode in the future, we would be very willing to switch to official mode (eg. streaming decode and encode).

primstar-cool avatar Jan 28 '25 08:01 primstar-cool