
[WIP] feat: support graph encoding backend auto-detection in wasi-nn

Open · TE-7000026184 opened this issue on Nov 13 '25 · 7 comments

In the current implementation of this PR, we assume that loading will fail if the format does not match. However, after checking some code, it seems the result of the "load" function cannot indicate whether the file is in the correct format; for some back-ends, the "load" function only does some memory-copy work.

I am wondering whether there is an additional API for each back-end to check whether the model file is encoded in its format. Plan B is to inspect the content of the model file to be loaded: for the "tflite" and "h5" formats, the file header contains a magic number that verifies the format, but "onnx" and some other formats have no such marker.

TE-7000026184 · Nov 28 '25 09:11

How to check the model file format:

openvino: the "load" function checks the "builder" count, which should be two: one XML file and one binary weights file. Partially reliable.

onnx: the "load" function checks the status of ctx->ort_api->CreateSessionFromArray. Reliable.

tensorflow: Is it really supported? It should be a directory containing a .pb or .pbtxt file.

pytorch: Is it really supported? It should be a package.

tensorflowlite: flatbuffer identifier "TFL3" (hex 54 46 4C 33) at bytes 5-8; a typical file starts "1C 00 00 00 54 46 4C 33". Need to add a check.

ggml: "GGUF" magic (hex 47 47 55 46) at bytes 1-4. Need to add a check (see the probe sketch below). But the llama backend does not support the "load" function? Its "load" is just a stub:

```c
/* llama (ggml) backend: load is a stub that always fails */
__attribute__((visibility("default"))) wasi_nn_error
load(void *ctx, graph_builder_array *builder, graph_encoding encoding,
     execution_target target, graph *g)
{
    return unsupported_operation;
}
```
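
Based on the notes above, a magic-number probe could look like the following minimal sketch. The helper name guess_encoding is hypothetical, and the graph_encoding values follow the names used in this thread (unknown_backend is assumed as a fallback value):

```c
#include <stdint.h>
#include <string.h>
#include "wasi_nn_types.h" /* assumed to define graph_encoding */

/* Hypothetical helper: guess the graph encoding from the first bytes of a
 * model buffer. Only the formats with a reliable magic are probed; onnx and
 * openvino IR have no magic and cannot be identified this way. */
static graph_encoding
guess_encoding(const uint8_t *buf, uint32_t size)
{
    /* GGUF: ASCII "GGUF" (47 47 55 46) at offset 0 */
    if (size >= 4 && memcmp(buf, "GGUF", 4) == 0)
        return ggml;
    /* TFLite: flatbuffer identifier "TFL3" (54 46 4C 33) at offset 4 */
    if (size >= 8 && memcmp(buf + 4, "TFL3", 4) == 0)
        return tensorflowlite;
    return unknown_backend; /* caller decides how to handle this */
}
```

This matches the reliability split above: only tensorflowlite and ggml can be probed by magic number.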

TE-7000026184 · Dec 02 '25 08:12

i guess you should explain the motivation of auto-detection.

yamt · Dec 02 '25 09:12

> i guess you should explain the motivation of auto-detection.

The motivation is to implement an auto-detect option for the encoding when loading a model file, for cases where a binary model file without an extension is provided, or where different types of model files are used by the application user.

TE-7000026184 · Dec 02 '25 10:12

> i guess you should explain the motivation of auto-detection.

> The motivation is to implement an auto-detect option for the encoding when loading a model file, for cases where a binary model file without an extension is provided, or where different types of model files are used by the application user.

i feel it's too much for a wasm runtime to implement file format detection. although it might be possible to make it work for many cases, "work for many cases" is not reliable enough, IMO. after all, the user should know the format for sure. what's the point of making a guess?

yamt · Dec 05 '25 08:12

> i feel it's too much for a wasm runtime to implement file format detection. although it might be possible to make it work for many cases, "work for many cases" is not reliable enough, IMO. after all, the user should know the format for sure. what's the point of making a guess?

I believe this follows directly from wasi-nn’s design goal.

“Another design goal is to make the API framework- and model-agnostic; this allows for implementing the API with multiple ML frameworks and model formats. The load method will return an error message when an unsupported model encoding scheme is passed in. This approach is similar to how a browser deals with image or video encoding.”

In other words, wasi-nn is intentionally trying to be model-agnostic, which is why the API does not allow specifying the backend either in load or load_by_name. Because of this, the current behavior requires us to recompile the runtime depending on the model we want to target.

So I think the intention is not to “guess” the format, but to stay consistent with the model-agnostic design and wasi-nn interface, where the runtime determines whether the provided model is supported or not.

ayakoakasaka · Dec 05 '25 10:12

> i feel it's too much for a wasm runtime to implement file format detection. although it might be possible to make it work for many cases, "work for many cases" is not reliable enough, IMO. after all, the user should know the format for sure. what's the point of making a guess?

> I believe this follows directly from wasi-nn’s design goal.

> “Another design goal is to make the API framework- and model-agnostic; this allows for implementing the API with multiple ML frameworks and model formats. The load method will return an error message when an unsupported model encoding scheme is passed in. This approach is similar to how a browser deals with image or video encoding.”

> In other words, wasi-nn is intentionally trying to be model-agnostic, which is why the API does not allow specifying the backend either in load or load_by_name. Because of this, the current behavior requires us to recompile the runtime depending on the model we want to target.

> So I think the intention is not to “guess” the format, but to stay consistent with the model-agnostic design and wasi-nn interface, where the runtime determines whether the provided model is supported or not.

sorry, i don't understand your logic.

being model-agnostic doesn't imply auto-detection at all. load actually takes the encoding explicitly. for load-by-name, the encoding is usually specified by a runtime-specific api. (wamr is an exception here, and i consider it a bug.)
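
For reference, a minimal guest-side sketch of a load call with the encoding stated explicitly; the function name, the graph_builder field names, and the model_bytes/model_size parameters are illustrative assumptions rather than the exact WAMR binding:

```c
/* sketch: the caller states the model format; the runtime never guesses */
static wasi_nn_error
load_tflite_model(uint8_t *model_bytes, uint32_t model_size, graph *g)
{
    graph_builder gb = { .buf = model_bytes, .size = model_size };
    graph_builder_array builders = { .buf = &gb, .size = 1 };
    /* encoding (tensorflowlite) and target (cpu) are explicit arguments */
    return load(&builders, tensorflowlite, cpu, g);
}
```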

you are right that you need to enable the backend for your model. but this PR doesn't change that at all.

i vaguely remember there was a discussion about standardizing the model format. but you are not talking about that, are you?

yamt · Dec 08 '25 04:12

> i feel it's too much for a wasm runtime to implement file format detection. although it might be possible to make it work for many cases, "work for many cases" is not reliable enough, IMO. after all, the user should know the format for sure. what's the point of making a guess?

> I believe this follows directly from wasi-nn’s design goal.

> “Another design goal is to make the API framework- and model-agnostic; this allows for implementing the API with multiple ML frameworks and model formats. The load method will return an error message when an unsupported model encoding scheme is passed in. This approach is similar to how a browser deals with image or video encoding.”

> In other words, wasi-nn is intentionally trying to be model-agnostic, which is why the API does not allow specifying the backend either in load or load_by_name. Because of this, the current behavior requires us to recompile the runtime depending on the model we want to target. So I think the intention is not to “guess” the format, but to stay consistent with the model-agnostic design and wasi-nn interface, where the runtime determines whether the provided model is supported or not.

> sorry, i don't understand your logic.

> being model-agnostic doesn't imply auto-detection at all. load actually takes the encoding explicitly. for load-by-name, the encoding is usually specified by a runtime-specific api. (wamr is an exception here, and i consider it a bug.)

> you are right that you need to enable the backend for your model. but this PR doesn't change that at all.

> i vaguely remember there was a discussion about standardizing the model format. but you are not talking about that, are you?

That makes sense. I didn’t look very closely at load because it consumes a significant amount of Wasm memory for the model, so I decided not to use it. I noticed that the encoding parameter exists there, but since it is not available in load_by_name, I mistakenly assumed that the inability to specify the encoding meant the API was model-agnostic.

However, I still don’t have a clear understanding of how load_by_name is supposed to be used. Ideally, I would like to be able to specify the same parameters as load, but because the behavior is runtime-specific, this becomes difficult in practice.

For now, @dongsheng28849455’s team is considering extending load_by_name so that the target can be specified explicitly, and following the same approach for the encoding as well. Auto-detection does not seem to be a reliable or robust solution for this use case.
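
A hypothetical sketch of such an extended entry point; the name load_by_name_ext and its parameter list are invented here for illustration and are not an agreed API:

```c
/* hypothetical, not an agreed API: let the caller state the encoding and
 * execution target for a named (preloaded) model explicitly */
wasi_nn_error
load_by_name_ext(const char *name, uint32_t name_len,
                 graph_encoding encoding, execution_target target,
                 graph *g);
```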

ayakoakasaka · Dec 12 '25 10:12

> i feel it's too much for a wasm runtime to implement file format detection. although it might be possible to make it work for many cases, "work for many cases" is not reliable enough, IMO. after all, the user should know the format for sure. what's the point of making a guess?

> I believe this follows directly from wasi-nn’s design goal.

> “Another design goal is to make the API framework- and model-agnostic; this allows for implementing the API with multiple ML frameworks and model formats. The load method will return an error message when an unsupported model encoding scheme is passed in. This approach is similar to how a browser deals with image or video encoding.”

> In other words, wasi-nn is intentionally trying to be model-agnostic, which is why the API does not allow specifying the backend either in load or load_by_name. Because of this, the current behavior requires us to recompile the runtime depending on the model we want to target. So I think the intention is not to “guess” the format, but to stay consistent with the model-agnostic design and wasi-nn interface, where the runtime determines whether the provided model is supported or not.

> sorry, i don't understand your logic. being model-agnostic doesn't imply auto-detection at all. load actually takes the encoding explicitly. for load-by-name, the encoding is usually specified by a runtime-specific api. (wamr is an exception here, and i consider it a bug.) you are right that you need to enable the backend for your model. but this PR doesn't change that at all. i vaguely remember there was a discussion about standardizing the model format. but you are not talking about that, are you?

> That makes sense. I didn’t look very closely at load because it consumes a significant amount of Wasm memory for the model, so I decided not to use it. I noticed that the encoding parameter exists there, but since it is not available in load_by_name, I mistakenly assumed that the inability to specify the encoding meant the API was model-agnostic.

> However, I still don’t have a clear understanding of how load_by_name is supposed to be used. Ideally, I would like to be able to specify the same parameters as load, but because the behavior is runtime-specific, this becomes difficult in practice.

iirc, the original motivation of load-by-name was to reduce overhead by using preloaded models. consider a server plugin instantiated on each request: it isn't desirable to execute expensive operations like load on each request.
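
A sketch of that flow from the guest side, assuming the embedder has preloaded a model under a runtime-specific name; the name "default" and the load_by_name signature shown are illustrative:

```c
#include <string.h>

/* per-request sketch: look up a model the embedder preloaded under a
 * runtime-specific name, instead of paying for a full load every time */
static wasi_nn_error
get_preloaded_graph(graph *g)
{
    const char *name = "default"; /* example name, runtime-specific */
    return load_by_name(name, (uint32_t)strlen(name), g);
}
```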

> For now, @dongsheng28849455’s team is considering extending load_by_name so that the target can be specified explicitly, and following the same approach for the encoding as well. Auto-detection does not seem to be a reliable or robust solution for this use case.

yamt · Dec 17 '25 04:12