binaryen
Adding APIs for in-memory operation without fs access
I've received a request to make it possible for wasm-opt-rs to perform optimization entirely in memory without accessing the filesystem. This seems like a reasonable feature, and relatively easy to add. Is there any appetite for this in-tree?
The C API does support running the optimizer,
https://github.com/WebAssembly/binaryen/blob/e42a58696059fd1cadcf25e10223b979214984b3/src/binaryen-c.h#LL2974C19-L2974C41
And also arbitrary passes can be run,
https://github.com/WebAssembly/binaryen/blob/e42a58696059fd1cadcf25e10223b979214984b3/src/binaryen-c.h#L3075-L3077
The only other thing wasm-opt does is to provide a commandline API, that it translates into calls to the C++ APIs that the C API calls, basically. Is that not good enough? It might not be, if we're missing something, like I'm not sure if all the commandline flags have C APIs - maybe recent ones like --closed-world don't, and I see source maps mentioned in the issue you linked, which I'm not sure of either.
Perhaps it would be nice to have a C/C++ API that gets commandline flags and handles them, and wasm-opt would use that - that would keep things in sync. Is that what you're thinking of?
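To make the "shared flag handler" idea concrete, here is a minimal sketch of what such an entry point could look like. Everything in it is hypothetical: `OptOptions` and `applyFlags` are illustrative names, not existing binaryen APIs, and the flag handling covers only a tiny subset (`-O0` to `-O4`, `--closed-world`, and treating any other `--name` as a pass name).

```cpp
#include <string>
#include <vector>

// Hypothetical options struct for illustration only; this is not
// binaryen's real options type.
struct OptOptions {
  int optimizeLevel = 0;
  bool closedWorld = false;
  std::vector<std::string> passes;
};

// Hypothetical shared entry point: translates wasm-opt-style flags into
// options. Both a commandline tool and an embedder could call it, which
// is what would keep the two in sync.
// Returns false on the first flag it does not recognize.
bool applyFlags(const std::vector<std::string>& flags, OptOptions& out) {
  for (const std::string& flag : flags) {
    if (flag == "--closed-world") {
      out.closedWorld = true;
    } else if (flag.size() == 3 && flag.compare(0, 2, "-O") == 0 &&
               flag[2] >= '0' && flag[2] <= '4') {
      // "-O0" through "-O4" set the optimization level.
      out.optimizeLevel = flag[2] - '0';
    } else if (flag.size() > 2 && flag.compare(0, 2, "--") == 0) {
      // Assumption for this sketch: any other "--name" is a pass name.
      out.passes.push_back(flag.substr(2));
    } else {
      return false;
    }
  }
  return true;
}
```

With something like this, wasm-opt itself would just forward its argv to the handler, and an embedder such as wasm-opt-rs could pass the same strings without going through a process boundary.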
(Actually, are you using the C API or C++ API?)
@kripken I am using the C++ API.
My original issue I think was not worded correctly. It's not that I want to run the optimizations in memory, it's that I want to read and write the modules without touching the file system, so that I can run the optimizations.
This issue may be moot for me for now, since the requester of this feature also wants to run on the wasm32-unknown-unknown target, and I suspect binaryen cannot be compiled to that target but instead needs to compile to wasm32-unknown-emscripten.
Hmm, what's missing in the C++ API then? You can convert bytes into a Module and then optimize the Module, and convert it back into bytes. Sorry, not sure I understand yet.
I don't know if anyone's tried to compile binaryen with wasm32-unknown-unknown, but it might just work, or it might need an ifdef or two I guess to avoid things like threads for now.
> Hmm, what's missing in the C++ API then? You can convert bytes into a Module and then optimize the Module, and convert it back into bytes. Sorry, not sure I understand yet.
It's definitely possible there are APIs I'm not finding. So far I have been using ModuleReader and ModuleWriter, and those deal in files. I don't see how to do what they are doing with in-memory input and output in an obvious way without copying the logic in these two types.
So to handle loading the modules I would need to do something like readTextData and readBinaryData:
static void readTextData(std::string& input, Module& wasm, IRProfile profile) {
  if (useNewWATParser) {
    std::string_view in(input.c_str());
    if (auto parsed = WATParser::parseModule(wasm, in);
        auto err = parsed.getErr()) {
      Fatal() << err->msg;
    }
  } else {
    SExpressionParser parser(const_cast<char*>(input.c_str()));
    Element& root = *parser.root;
    SExpressionWasmBuilder builder(wasm, *root[0], profile);
  }
}

void ModuleReader::readBinaryData(std::vector<char>& input,
                                  Module& wasm,
                                  std::string sourceMapFilename) {
  std::unique_ptr<std::ifstream> sourceMapStream;
  // Assume that the wasm has had its initial features applied, and use those
  // while parsing.
  WasmBinaryBuilder parser(wasm, wasm.features, input);
  parser.setDebugInfo(debugInfo);
  parser.setDWARF(DWARF);
  parser.setSkipFunctionBodies(skipFunctionBodies);
  if (sourceMapFilename.size()) {
    sourceMapStream = make_unique<std::ifstream>();
    sourceMapStream->open(sourceMapFilename);
    parser.setDebugLocations(sourceMapStream.get());
  }
  parser.read();
  if (sourceMapStream) {
    sourceMapStream->close();
  }
}
where readBinaryData would instead need to use an in-memory source map.
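As a rough illustration of the in-memory source map idea: if the parser's setter accepts a generic `std::istream*` (an assumption here; the snippet above only shows it being handed a `std::ifstream`), then a `std::istringstream` over the source map text can stand in for the file. The sketch below uses a hypothetical `FakeParser` that merely counts the bytes it is given, so it compiles without binaryen.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a binary parser. It only counts the bytes of
// the source map stream so the example stays self-contained.
struct FakeParser {
  std::istream* sourceMap = nullptr;

  // Assumption: the setter takes the base stream type, so file-backed and
  // memory-backed streams are interchangeable.
  void setDebugLocations(std::istream* s) { sourceMap = s; }

  std::size_t read() {
    std::size_t n = 0;
    if (sourceMap) {
      char c;
      while (sourceMap->get(c)) {
        ++n;
      }
    }
    return n;
  }
};

// In-memory variant: the source map arrives as text, not as a filename.
std::size_t readWithInMemorySourceMap(const std::string& sourceMapText) {
  FakeParser parser;
  std::istringstream sourceMapStream(sourceMapText);
  if (!sourceMapText.empty()) {
    parser.setDebugLocations(&sourceMapStream);
  }
  return parser.read();
}
```

The only change relative to the file-based shape is where the stream comes from; the parser-facing calls stay the same.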
And to serialize the modules again, I would need to do the same as in writeText and writeBinary:
void ModuleWriter::writeText(Module& wasm, Output& output) {
  output.getStream() << wasm;
}

void ModuleWriter::writeBinary(Module& wasm, Output& output) {
  BufferWithRandomAccess buffer;
  WasmBinaryWriter writer(&wasm, buffer);
  // if debug info is used, then we want to emit the names section
  writer.setNamesSection(debugInfo);
  if (emitModuleName) {
    writer.setEmitModuleName(true);
  }
  std::unique_ptr<std::ofstream> sourceMapStream;
  if (sourceMapFilename.size()) {
    sourceMapStream = make_unique<std::ofstream>();
    sourceMapStream->open(sourceMapFilename);
    writer.setSourceMap(sourceMapStream.get(), sourceMapUrl);
  }
  if (symbolMap.size() > 0) {
    writer.setSymbolMap(symbolMap);
  }
  writer.write();
  buffer.writeTo(output);
  if (sourceMapStream) {
    sourceMapStream->close();
  }
}
The text cases look obvious, but for the binary cases there is some important logic here that needs to be repeated if I want to do what ModuleReader and ModuleWriter are doing, particularly wrt debug info and source maps.
It's not a lot of code obviously, so if that's the way to do it, I can definitely do it, but I'm happy to have any tips.
Ah, yes, that looks right. So readBinary/writeBinary is almost what you want, but it assumes source maps are actual files. I think that would make sense to generalize, and doing it in-tree makes sense to me. That is, the low-level functions should work entirely on bytes in memory, and higher-level ones would handle loading source map data from disk etc. as needed.