Adding APIs for in-memory operation without fs access

Open brson opened this issue 1 year ago • 8 comments

I've received a request to make it possible for wasm-opt-rs to perform optimization entirely in memory without accessing the filesystem. This seems like a reasonable feature, and relatively easy to add. Is there any appetite for this in tree?

brson avatar May 19 '23 18:05 brson

The C API does support running the optimizer,

https://github.com/WebAssembly/binaryen/blob/e42a58696059fd1cadcf25e10223b979214984b3/src/binaryen-c.h#LL2974C19-L2974C41

And also arbitrary passes can be run,

https://github.com/WebAssembly/binaryen/blob/e42a58696059fd1cadcf25e10223b979214984b3/src/binaryen-c.h#L3075-L3077

The only other thing wasm-opt does is to provide a commandline API, that it translates into calls to the C++ APIs that the C API calls, basically. Is that not good enough? It might not be, if we're missing something, like I'm not sure if all the commandline flags have C APIs - maybe recent ones like --closed-world don't, and I see source maps mentioned in the issue you linked, which I'm not sure of either.

Perhaps it would be nice to have a C/C++ API that gets commandline flags and handles them, and wasm-opt would use that - that would keep things in sync. Is that what you're thinking of?

kripken avatar May 19 '23 18:05 kripken

(Actually, are you using the C API or C++ API?)

kripken avatar May 19 '23 18:05 kripken

@kripken I am using the C++ API.

My original issue I think was not worded correctly. It's not that I want to run the optimizations in memory, it's that I want to read and write the modules without touching the file system, so that I can run the optimizations.

brson avatar May 25 '23 20:05 brson

This issue may be moot for me for now since the requester of this feature also wants to run on the wasm32-unknown-unknown target and I suspect binaryen cannot be compiled to that target, but instead needs to compile to wasm32-unknown-emscripten.

brson avatar May 25 '23 20:05 brson

Hmm, what's missing in the C++ API then? You can convert bytes into a Module and then optimize the Module, and convert it back into bytes. Sorry, not sure I understand yet.

I don't know if anyone's tried to compile binaryen with wasm32-unknown-unknown, but it might just work, or it might need an ifdef or two I guess to avoid things like threads for now.

kripken avatar May 25 '23 22:05 kripken

> Hmm, what's missing in the C++ API then? You can convert bytes into a Module and then optimize the Module, and convert it back into bytes. Sorry, not sure I understand yet.

It's definitely possible there are APIs I'm not finding. So far I have been using ModuleReader and ModuleWriter, and those deal in files. I don't see how to do what they are doing with in-memory input and output in an obvious way without copying the logic in these two types.

So to handle loading the modules I would need to do something like readTextData and readBinaryData:

static void readTextData(std::string& input, Module& wasm, IRProfile profile) {
  if (useNewWATParser) {
    std::string_view in(input.c_str());
    if (auto parsed = WATParser::parseModule(wasm, in);
        auto err = parsed.getErr()) {
      Fatal() << err->msg;
    }
  } else {
    SExpressionParser parser(const_cast<char*>(input.c_str()));
    Element& root = *parser.root;
    SExpressionWasmBuilder builder(wasm, *root[0], profile);
  }
}

void ModuleReader::readBinaryData(std::vector<char>& input,
                                  Module& wasm,
                                  std::string sourceMapFilename) {
  std::unique_ptr<std::ifstream> sourceMapStream;
  // Assume that the wasm has had its initial features applied, and use those
  // while parsing.
  WasmBinaryBuilder parser(wasm, wasm.features, input);
  parser.setDebugInfo(debugInfo);
  parser.setDWARF(DWARF);
  parser.setSkipFunctionBodies(skipFunctionBodies);
  if (sourceMapFilename.size()) {
    sourceMapStream = make_unique<std::ifstream>();
    sourceMapStream->open(sourceMapFilename);
    parser.setDebugLocations(sourceMapStream.get());
  }
  parser.read();
  if (sourceMapStream) {
    sourceMapStream->close();
  }
}

where readBinaryData would instead need to use an in-memory source map.

and to serialize the modules again, do the same as in writeText and writeBinary:

void ModuleWriter::writeText(Module& wasm, Output& output) {
  output.getStream() << wasm;
}

void ModuleWriter::writeBinary(Module& wasm, Output& output) {
  BufferWithRandomAccess buffer;
  WasmBinaryWriter writer(&wasm, buffer);
  // if debug info is used, then we want to emit the names section
  writer.setNamesSection(debugInfo);
  if (emitModuleName) {
    writer.setEmitModuleName(true);
  }
  std::unique_ptr<std::ofstream> sourceMapStream;
  if (sourceMapFilename.size()) {
    sourceMapStream = make_unique<std::ofstream>();
    sourceMapStream->open(sourceMapFilename);
    writer.setSourceMap(sourceMapStream.get(), sourceMapUrl);
  }
  if (symbolMap.size() > 0) {
    writer.setSymbolMap(symbolMap);
  }
  writer.write();
  buffer.writeTo(output);
  if (sourceMapStream) {
    sourceMapStream->close();
  }
}

The text cases look obvious, but for the binary cases there is some important logic here that needs to be repeated if I want to do what ModuleReader and ModuleWriter are doing, particularly wrt debug info and source maps.

brson avatar May 26 '23 18:05 brson
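
The source map concern in the comment above comes down to the file streams. Here is a minimal sketch, assuming the parser only needs a std::istream* (as the setDebugLocations call in the quoted code suggests, though that signature is an assumption here). It shows an in-memory std::istringstream standing in for std::ifstream. consumeSourceMap and readSourceMapFromMemory are hypothetical names for illustration, not binaryen functions.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a parser that accepts a source map stream.
// It depends only on std::istream, so it cannot tell a file from memory.
std::string consumeSourceMap(std::istream* sourceMap) {
  if (!sourceMap) {
    return "";
  }
  std::ostringstream contents;
  contents << sourceMap->rdbuf(); // copy everything the stream provides
  return contents.str();
}

// Wraps in-memory source map text in a stream and hands it to the consumer,
// with no filesystem access involved.
std::string readSourceMapFromMemory(const std::string& sourceMapText) {
  std::istringstream stream(sourceMapText);
  return consumeSourceMap(&stream);
}
```

The same consumer would accept a std::ifstream unchanged, so a file-based caller could keep working alongside an in-memory one.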

It's not a lot of code obviously, so if that's the way to do it, I can definitely do it, but I'm happy to have any tips.

brson avatar May 26 '23 18:05 brson

Ah, yes, that looks right. So readBinary/writeBinary is almost what you want, but it assumes source maps are actual files. I think that would make sense to generalize, and doing it in-tree makes sense to me. That is, the low-level functions should work entirely on bytes in memory, and higher-level ones would handle loading source map data from disk etc. as needed.

kripken avatar May 26 '23 19:05 kripken