wasm-micro-runtime
wasm-micro-runtime copied to clipboard
Allow not copying the wasm binary into the module when using WASM C API
The wasm_module_new function copies the WASM binary passed as an argument into the module.
This PR allows passing an additional flag to wasm_module_new_ex, to avoid that copy. In that way, the binary can be manually released after module instantiation (instead of having to wait for the store to be destroyed).
not very sure about it. the binary content(in wasm_module_t) will be used during execution. It's on purpose to release the binary when destroying store.
not very sure about it. the binary content(in wasm_module_t) will be used during execution. It's on purpose to release the binary when destroying store.
I think we can have a try, it really increases the memory usage if wasm-c-api clones another copy, and if it doesn't do that, the content of input buffer from developer may be modified and the buffer will be referred by runtime after loading, the difference is that developer can not use the input buffer to new a module again.
not very sure about it. the binary content(in wasm_module_t) will be used during execution. It's on purpose to release the binary when destroying store.
Is that the case? When opening this PR, I had it tested only in interpreter mode and I thought it wouldn't work with AOT (that's why I created it as a draft and didn't bother fixing the CI errors). But now I tried it in AOT mode and it seems to work as well. Is there anything I'm missing?
be manually released after module instantiation
if this means wasm_byte_vec_delete(&binary) after wasm_instance_new(), I suggest repeat the AOT example with memset(binary.data, 0, ...) instead of wasm_byte_vec_delete.
IIUC, the big concern is function bytecodes. Classic-interp always needs the binary content. fast-interp doesn't because of recompilation. xxx_jit needs the content only during bytecodes to IRs translation. aot is there ⬆️. XIP always depends on the binary content. Then, next problem will be const strings. Luck for us, some function, like wasm_const_str_list_insert(), aot_const_str_set_insert(), have a flag to control copying. All we need is to go through.
Luck for us, some function, like
wasm_const_str_list_insert(),aot_const_str_set_insert(), have a flag to control copying. All we need is to go through.
iirc, it isn't by luck. it's for app-manager, which performs stream-loading, which is basically same as what this PR seems trying. i don't know how complete it is these days though. (eg. does it cover fast-jit?)
if this means wasm_byte_vec_delete(&binary) after wasm_instance_new(), I suggest repeat the AOT example with memset(binary.data, 0, ...) instead of wasm_byte_vec_delete.
Done that, it works fine. I wasn't precise there, you can't release right after module instantiation, you have to wait until wasm execution (e.g. wasm_func_call) because other APIs may use the binary buffer.
IIUC, the big concern is function bytecodes. Classic-interp always needs the binary content. fast-interp doesn't because of recompilation. xxx_jit needs the content only during bytecodes to IRs translation. aot is there ⬆️. XIP always depends on the binary content. Then, next problem will be const strings. Luck for us, some function, like wasm_const_str_list_insert(), aot_const_str_set_insert(), have a flag to control copying. All we need is to go through.
Just tried with classic interpreter (I had only tried fast interpreter and AOT successfully) and as you say this PR doesn't work there.
Then, next problem will be const strings. Luck for us, some function, like wasm_const_str_list_insert(), aot_const_str_set_insert(), have a flag to control copying. All we need is to go through.
What about those functions? Why do they need special treatment? During testing I didn't run into any problems with those.
So changes in this PR only work for fast interpreter and AOT (without XIP). Possibly JIT too, but I haven't tested it yet.
I think it makes sense to move forward with this PR and have this feature (to save memory) only for those modes that are compatible, what are your thoughts?
I initially raised https://github.com/bytecodealliance/wasm-micro-runtime/pull/3377, but, when I was using those new APIs with the Wasm C API, I had to replace wasm_module_new with wasm_runtime_read_to_sections, that doesn't use the store and did't seem a good fit for the Wasm C API.
which is basically same as what this PR seems trying.
This PR is not about stream-loading. It tries to save memory when using WAMR by allowing to release the wasm binary buffer before starting the wasm execution.
which is basically same as what this PR seems trying.
This PR is not about stream-loading. It tries to save memory when using WAMR by allowing to release the wasm binary buffer before starting the wasm execution.
the "make a copy instead of keeping references" aspect is basically same.
iirc, one of implications of is_load_from_file_buf=false is to make a copy. i guess you can somehow use it.
iirc, one of implications of is_load_from_file_buf=false is to make a copy. i guess you can somehow use it.
I see that a copy is done anyway regardless of that flag https://github.com/bytecodealliance/wasm-micro-runtime/blob/ca61184ced7d885395bc2425c87f0d063935dd1e/core/iwasm/interpreter/wasm_runtime.c#L4625
That flag seems to be used to shift the string https://github.com/bytecodealliance/wasm-micro-runtime/blob/ca61184ced7d885395bc2425c87f0d063935dd1e/core/iwasm/interpreter/wasm_runtime.c#L4596
But again, if we avoid freeing the module buffer until the wasm execution that shouldn't be needed.
iirc, one of implications of is_load_from_file_buf=false is to make a copy. i guess you can somehow use it.
I see that a copy is done anyway regardless of that flag
https://github.com/bytecodealliance/wasm-micro-runtime/blob/ca61184ced7d885395bc2425c87f0d063935dd1e/core/iwasm/interpreter/wasm_runtime.c#L4625
That flag seems to be used to shift the string
https://github.com/bytecodealliance/wasm-micro-runtime/blob/ca61184ced7d885395bc2425c87f0d063935dd1e/core/iwasm/interpreter/wasm_runtime.c#L4596
in the latter case, the user-given buffer is used.
But again, if we avoid freeing the module buffer until the wasm execution that shouldn't be needed.
i don't understand what you mean. can you explain a bit?
i don't understand what you mean. can you explain a bit?
I was wrong, I wasn't considering cases like this one https://github.com/bytecodealliance/wasm-micro-runtime/pull/3389#discussion_r1592493314
What is supposed to break with classic interpreter? Because I gave it a quick try and seemed to run fine.
Please correct me if I am wrong. Your examples will free/release/pollute binary content after module loading (before instantiation). And it works well in classic-interp, fast-interp, jit and aot.
my impression is that this requires users too much implementation knowledge.
how about providing an api to query if it's safe to free the buffer?
eg.
bool has_reference_to_underlying_byte_vec(const wasm_module_t *module);
my impression is that this requires users too much implementation knowledge.
how about providing an api to query if it's safe to free the buffer?
eg.
bool has_reference_to_underlying_byte_vec(const wasm_module_t *module);
It is good, but the API name is a not so consistent with others, how about bool wasm_module_refer_to_underlying_binary or others?
Please correct me if I am wrong. Your examples will free/release/pollute binary content after module loading (before instantiation). And it works well in classic-interp, fast-interp, jit and aot.
I tried classic interpreter, fast interpreter and AOT. Actually, I was confused by the terminology, in my tests I always free the wasm binary buffer after module instantiation, not after module creation. Internally we use the term loading for module creation + instantiation and that created the confusion for me.
how about providing an api to query if it's safe to free the buffer?
What would the API do? Just return is_binary_cloned from wasm_module_ex_t in case of wasm-c-api?
Or you mean we would need to add a new boolean to wasm_module_ex_t and set it to true/false after module loading (or instantiation) is completed, and that boolean is what's returned by the new API. That wouldn't simplify much for the user.
What about adding a new example (as I was doing in https://github.com/bytecodealliance/wasm-micro-runtime/pull/3377)? That would make things easier for the user, as they can just follow what is done there.
how about providing an api to query if it's safe to free the buffer?
What would the API do? Just return
is_binary_clonedfromwasm_module_ex_tin case of wasm-c-api? Or you mean we would need to add a new boolean towasm_module_ex_tand set it to true/false after module loading (or instantiation) is completed, and that boolean is what's returned by the new API. That wouldn't simplify much for the user.What about adding a new example (as I was doing in #3377)? That would make things easier for the user, as they can just follow what is done there.
@eloparco just my suggestion: maybe we can add two APIs, one is for WAMR api and is added in wasm_export.h, the other is for wasm-c-api and is added inc wasm_c_api.h, the former:
bool wasm_runtime_is_underlying_binary_freeable(const wasm_module_t module);
And we can just add a field like is_binary_freeable in WASMModule/AOTModule, set it in when wasm_runtime_load/wasm_runtime_load_ex, and return it in the new API.
the latter:
bool wasm_module_is_underlying_binary_cloned(wasm_module_t* module);
In the implementation, it is somewhat like: return ((wasm_module_ex_t *)module)->is_binary_cloned
@eloparco just my suggestion: maybe we can add two APIs, one is for WAMR api and is added in wasm_export.h, the other is for wasm-c-api and is added inc wasm_c_api.h
Added those (separate commit), not sure how useful they are. Why would the developer use wasm_module_is_underlying_binary_cloned? They already know if they passed clone_wasm_binary=true as LoadArgs.
Same for wasm_runtime_is_underlying_binary_freeable, it won't lift any implementation knowledge from the developer in my opinion. To do something like that, an API like
bool
wasm_module_free_wasm_binary_buffer(wasm_byte_vec_t *binary,
wasm_module_t *module)
{
if (!((wasm_module_ex_t *)module)->is_binary_cloned
&& !wasm_runtime_is_underlying_binary_freeable(module))
return false;
wasm_byte_vec_delete(binary);
return true;
}
would be more useful. And something equivalent in the WAMR API.
But I'm fine with the suggestion since it won't make a huge difference for me.
please add an sample, which works for all module types and modes.
Yes, I think that would be useful (I was suggesting it already in https://github.com/bytecodealliance/wasm-micro-runtime/pull/3389#issuecomment-2102225098)
@eloparco just my suggestion: maybe we can add two APIs, one is for WAMR api and is added in wasm_export.h, the other is for wasm-c-api and is added inc wasm_c_api.h
Added those (separate commit), not sure how useful they are. Why would the developer use
wasm_module_is_underlying_binary_cloned? They already know if they passedclone_wasm_binary=trueasLoadArgs.Same for
wasm_runtime_is_underlying_binary_freeable, it won't lift any implementation knowledge from the developer in my opinion. To do something like that, an API likebool wasm_module_free_wasm_binary_buffer(wasm_byte_vec_t *binary, wasm_module_t *module) { if (!((wasm_module_ex_t *)module)->is_binary_cloned && !wasm_runtime_is_underlying_binary_freeable(module)) return false; wasm_byte_vec_delete(binary); return true; }would be more useful. And something equivalent in the WAMR API.
But I'm fine with the suggestion since it won't make a huge difference for me.
when i suggested a query api, i meant to make it check all necessary conditions including is_binary_cloned, module type, etc. so it would be just:
if (wasm_runtime_is_underlying_binary_freeable(...)) {
wasm_byte_vec_delete(binary);
have_binary = false;
}
@eloparco just my suggestion: maybe we can add two APIs, one is for WAMR api and is added in wasm_export.h, the other is for wasm-c-api and is added inc wasm_c_api.h
Added those (separate commit), not sure how useful they are. Why would the developer use
wasm_module_is_underlying_binary_cloned? They already know if they passedclone_wasm_binary=trueasLoadArgs. Same forwasm_runtime_is_underlying_binary_freeable, it won't lift any implementation knowledge from the developer in my opinion. To do something like that, an API likebool wasm_module_free_wasm_binary_buffer(wasm_byte_vec_t *binary, wasm_module_t *module) { if (!((wasm_module_ex_t *)module)->is_binary_cloned && !wasm_runtime_is_underlying_binary_freeable(module)) return false; wasm_byte_vec_delete(binary); return true; }would be more useful. And something equivalent in the WAMR API. But I'm fine with the suggestion since it won't make a huge difference for me.
when i suggested a query api, i meant to make it check all necessary conditions including is_binary_cloned, module type, etc. so it would be just:
if (wasm_runtime_is_underlying_binary_freeable(...)) { wasm_byte_vec_delete(binary); have_binary = false; }
The wasm-c-api wasm_module_ex_t's is_binary_cloned and wasm/aot wasm_module_t's is_binary_freeable are two options and the behaviors are different, and they are stored in different structures, I think we cannot just provide one xxx_is_underly_binary_freeable like API for both of them, and also the input binary for wasm_module_new/wasm_module_new_ex/wasm_runtime_load/wasm_runtime_load_ex is provided by developer, it should not be freed by runtime or we had better not use a runtime API to free it. And the cloned binary of wasm-c-api (if the input binary is cloned) will be freed by runtime, developer should not free it also.
So, per my understanding, developer can use the xxx_is_underly_binary_freeable like API to check whether he can free the input binary and free it if yes, for wasm-c-api, his code is somewhat like:
if (wasm_module_is_underlying_binary_cloned(module)) { //or if (wasm_module_is_underlying_binary_freeable(module))
wasm_byte_vec_delete(binary);
wasm_runtime_free(binary);
}
for wasm/aot loader, his code is somewhat like:
if (wasm_runtime_is_underlying_binary_freeable(module)) {
wasm_runtime_free(buf);
}
@eloparco @yamt not sure whether that is ok for you? Or how about providing wasm_module_is_underlying_binary_freeable instead of wasm_module_is_underlying_binary_cloned for wasm-c-api, and wasm_runtime_is_underlying_binary_freeable for wasm/aot loader? Or is there other suggestion from you? Thanks.
Or how about providing wasm_module_is_underlying_binary_freeable instead of wasm_module_is_underlying_binary_cloned for wasm-c-api, and wasm_runtime_is_underlying_binary_freeable for wasm/aot loader?
Do we need two separate APIs? One (i.e. wasm_runtime_is_underlying_binary_freeable, that checks wasm_binary_freeable) should be enough since we set args->wasm_binary_freeable = !args->clone_wasm_binary;
And wasm_runtime_is_underlying_binary_freeable would look like what I already added to this PR.
including is_binary_cloned, module type, etc.
It's not module type but rather execution mode. I wouldn't add a check on that but rather write in the documentation that, with fast interpreter and AOT, the module can be freed after loading; while, in classic interpreter mode, after instantiation.
I wouldn't add a check on that
why not?
while, in classic interpreter mode, after instantiation.
i believe you can't free the buffer for classic interpreter.
Or how about providing wasm_module_is_underlying_binary_freeable instead of wasm_module_is_underlying_binary_cloned for wasm-c-api, and wasm_runtime_is_underlying_binary_freeable for wasm/aot loader?
Do we need two separate APIs? One (i.e.
wasm_runtime_is_underlying_binary_freeable, that checkswasm_binary_freeable) should be enough since we setargs->wasm_binary_freeable = !args->clone_wasm_binary;And
wasm_runtime_is_underlying_binary_freeablewould look like what I already added to this PR.
Yeah, I mean one API for wasm-c-api and one for wamr runtime api, and I found you had done that in the PR, it LGTM.
while, in classic interpreter mode, after instantiation.
i believe you can't free the buffer for classic interpreter.
Agree, how about we merge this PR first and then submit another PR to clone the wasm file's bytecode for classic interpreter mode to avoid referring to the underlying buffer?
while, in classic interpreter mode, after instantiation.
i believe you can't free the buffer for classic interpreter.
Agree, how about we merge this PR first and then submit another PR to clone the wasm file's bytecode for classic interpreter mode to avoid referring to the underlying buffer?
i'm not sure if it's a good idea. i suspect usually code and data segments occupy the major part of wasm module. if so, for classic interpreter, it might be simpler to do the opposite to what this PR does. that is, keep references to the underlying buffer for data segments as well.
i feel it's simpler to make wasm_runtime_is_underlying_binary_freeable return false for now, until someone evaluates pros/cons.
while, in classic interpreter mode, after instantiation.
i believe you can't free the buffer for classic interpreter.
Agree, how about we merge this PR first and then submit another PR to clone the wasm file's bytecode for classic interpreter mode to avoid referring to the underlying buffer?
i'm not sure if it's a good idea. i suspect usually code and data segments occupy the major part of wasm module. if so, for classic interpreter, it might be simpler to do the opposite to what this PR does. that is, keep references to the underlying buffer for data segments as well.
i feel it's simpler to make wasm_runtime_is_underlying_binary_freeable return false for now, until someone evaluates pros/cons.
OK, cloning bytecode for classic interpreter may save the memory of data segments, but as you said, we can just let wasm_runtime_is_underlying_binary_freeable return false now.
@yamt @wenyongh I applied the changes after the discussion, let me know if I missed anything